36

Due to the recent controversies regarding StackExchange, a couple of other users and I were discussing the legality of scraping SE's content to create a copy of the site.

Suppose we did not copy SE's actual code, only the content that users put on the site, created another public site that was completely nonprofit, and attributed all copied content to StackExchange. Would that be legal? Do we need permission from every single user on SE? Do we need SE's permission?

Some relevant portions of the ToS:

You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC-BY-SA), and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content


Any other downloading, copying, or storing of any public Network Content (other than Subscriber Content or content made available via the Stack Overflow API) for other than personal, noncommercial use is expressly prohibited without prior written permission from Stack Overflow or from the copyright holder identified in the copyright notice per the Creative Commons License. In the event you download software from the public Network (other than Subscriber Content or content made available by the Stack Overflow API) the software including any files, images incorporated in or generated by the software, the data accompanying the software (collectively, the “Software”) is licensed to you by Stack Overflow or third party licensors for your personal, noncommercial use, and no title to the Software shall transfer to you. Stack Overflow or third party licensors retain full and complete title to the Software and all intellectual property rights therein.

user7886229

2 Answers

26

Stack Exchange has already covered this in a couple of places. From MSE's "A site (or scraper) is copying content from Stack Exchange. What should I do?":

When should I not report these sites?

  • They follow all the attribution requirements. As mentioned before, there is nothing wrong with copying our content elsewhere on the web, so long as they are following all the attribution requirements given. There is no action we can take against a scraper who follows all the rules.

And the old "Attribution Required" blog post lists the actual requirements:

  1. Visually indicate that the content is from Stack Overflow or the Stack Exchange network in some way. It doesn’t have to be obnoxious; a discreet text blurb is fine.
  2. Hyperlink directly to the original question on the source site (e.g., http://stackoverflow.com/questions/12345)
  3. Show the author names for every question and answer
  4. Hyperlink each author name directly back to their user profile page on the source site (e.g., http://stackoverflow.com/users/1234567890/username)

By “directly”, I mean each hyperlink must point directly to our domain in standard HTML visible even with JavaScript disabled, and not use a tinyurl or any other form of obfuscation or redirection. Furthermore, the links must not be nofollowed
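To make the four requirements concrete, here is a minimal sketch of how a mirror site might render the attribution for one post. The function name `attribution_html`, its parameters, and the `attribution` CSS class are hypothetical and not taken from the blog post; the URLs are the example URLs quoted above. It emits plain HTML links (so they work with JavaScript disabled), points straight at the source domain, and adds no `rel="nofollow"`.

```python
from html import escape


def attribution_html(question_url, authors, source_name="Stack Overflow"):
    """Build a plain-HTML attribution blurb for one copied post.

    question_url: direct URL of the original question on the source site.
    authors: list of (display_name, profile_url) tuples, one per author shown.
    source_name: visible name of the source site (hypothetical default).
    """
    # One direct, followable link per author; no redirects, no rel="nofollow".
    author_links = ", ".join(
        f'<a href="{escape(url, quote=True)}">{escape(name)}</a>'
        for name, url in authors
    )
    # Visible blurb naming the source, linked directly to the original question.
    return (
        '<p class="attribution">Source: '
        f'<a href="{escape(question_url, quote=True)}">{escape(source_name)}</a>'
        f" by {author_links}</p>"
    )


if __name__ == "__main__":
    print(
        attribution_html(
            "http://stackoverflow.com/questions/12345",
            [("username", "http://stackoverflow.com/users/1234567890/username")],
        )
    )
```

This only covers the attribution markup. Whether a given page satisfies the license as a whole is a separate question that the sketch does not answer.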

-2

Intended to complement '947's excellent answer, a direct response:

Yes. No. You already have it, per CC-BY-SA. (In response to "... would it be legal? Do we need permission from every single user on SE? Do we need SE's permission?")

All of the Subscriber Content is available under a CC-BY-SA license. Also, because it was provided to SE only "pursuant to Creative Commons licensing terms", you don't have to scrape it. Because of the "-SA", SE can't use "Effective Technological Measures" (see this clause et seq.), and I would consider rate limiting of HTTP or the API to be one if SE kept it in place despite a request to lift it for this purpose. So if you want all the Subscriber Content, SE would be unwise not to give it to you on request, in a mutually convenient form such as a compressed database dump (like the ones Wikipedia/the WMF provides).

P.S. Err, it looks like this info was largely already provided in a comment by @Ángel - "You need permission from every single user on SE [that posted something you are copying] and you have that permission by way of their releasing the content under CC-BY-SA. Note you should better attribute the users themselves as authors, not StackExchange itself (e.g. attribute to Law StackExchange user JBis (22305), not attribute it as if authored by SE, which doesn't)" (edited for clarity)

IANAL,BIPOOTI.