31

I am reading through these notes, trying to piece together a picture of what the rules/laws are regarding Wikipedia content:

I don't have any desire or intention of doing this myself, but I am wondering: if I create a site like Wikipedia, which has "freely licensed/usable content", will someone else clone my project, put it under a new domain, change some fonts and colors, add a subscription fee and maybe some ads, and try to rank higher on Google Search so that they become the dominant provider of my underlying content?

Wikipedia is a parallel. How does Wikipedia prevent someone from downloading a bulk dump of the whole site, putting it under their own custom domain like freeknowledgefoo.com, placing some ads around the pages, and adding a subscription fee? Then they maybe redesign a few things slightly (changing fonts and colors), and then for whatever reason they end up ranking higher than Wikipedia itself on Google and end up being used by default instead of Wikipedia. That would be a horrible scenario: all the work is put into Wikipedia, but all the money is made by some copycat site.

How does Wikipedia prevent that?

How could I prevent that if I am offering free data dumps of various kinds, but also have my own UI to view the data (like NYC's open data site)? You could download the dump and run the site yourself, or use my main website, which hosts and shows the data. How do I prevent users from just deploying my project on their own domain, slightly changing things, and then running off with the future?

Is this what the "ShareAlike" license is for?

Creative Commons Deed

This is a human-readable summary of the full license below.

You are free:

  • to Share—to copy, distribute and transmit the work, and
  • to Remix—to adapt the work

for any purpose, even commercially.

Under the following conditions:

  • Attribution—You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work.)
  • Share Alike—If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

With the understanding that:

  • Waiver—Any of the above conditions can be waived if you get permission from the copyright holder.

  • Other Rights—In no way are any of the following rights affected by the license:

  • your fair dealing or fair use rights;

  • the author's moral rights; and

  • rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.

  • Notice—For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do that is with a link to https://creativecommons.org/licenses/by-sa/3.0/

I don't see anything saying you can't do what I describe. But maybe the fact that you have to have "attribution" is the key? I don't quite know how best to approach this situation. On one hand I would like to release data which is free and open source. On the other hand I don't want someone else to then run off with all of it and try and outcompete me with the same product I guess.

David Siegel
Lance Pollard

5 Answers

62

It's allowed by the Creative Commons Attribution-ShareAlike license, and intentionally so. The Wikimedia Foundation wants things like this to be possible; that is part of the goal of open content. (This license is also used on Stack Exchange content, so the same applies to e.g. this answer.)

However, it is important to remember that this is not a public-domain equivalent license. If you copy from a Wikipedia article (or an SE post), you must comply with the "Attribution" and "ShareAlike" requirements.

  • Attribution: You must give credit to the author. For Wikipedia articles, which typically have many authors, a link to the page is sufficient; editors agree to this in addition to the license when they save their edits. For Stack Exchange content, a link to the post itself should be enough. (To get a direct URL for an answer, click the "share" link below the post.) To be entirely safe, and as a courtesy, it would also be a good idea to include the poster's username.
  • ShareAlike: If you modify the content, you must release your modified version under the same or a compatible license. You can't copy this answer, add more information (or translate it into another language, or make any other change), and keep an all-rights-reserved copyright on it, or release it into the public domain; your version must also be released under CC BY-SA.

As long as you follow these requirements, copying is allowed and encouraged.

Someone
53

"they end up ranking higher than Wikipedia itself on Google". That's a highly unlikely scenario, unless you could somehow convince all current users of Wikipedia that switching to a paid page is better (for the same content!). Also, Google's high ranking of Wikipedia articles (together with often embedding parts of an article in its results) is hard-coded and unlikely to change without the explicit intervention of a programmer at Google. As mentioned in this article, Google is a large donor to the Wikimedia Foundation, paying several million dollars a year. This was, according to the article, done to "help reduce the pagerank of widespread, uneditable Wikipedia clones that were ostensibly ad farms."

So indeed, such pages did exist and many still do. The List of Forks known to Wikipedia is almost endless. Some just clone a particular set of articles (for a particular topic); others really just wrap a new layout around the same content, adding poor navigation and advertisements. Many of the clones fail to follow the CC license requirements, giving the impression that the content is their own. This is a clear copyright violation.

Neinstein
PMF
16

Wikipedia does not make any attempt to stop people from creating and placing online copies of Wikipedia, or parts of it. Indeed, it encourages people to do so. Not only does the CC-BY-SA 3.0 license explicitly allow this, but Wikipedia also provides free "dumps" on a regular basis. Using these, anyone can copy the data behind any or all of Wikipedia's articles. It also provides free access to the software on which Wikipedia runs (MediaWiki) and instructions on how to set it up and how to load it with data from a dump.

The legal requirements imposed are to provide attribution and to offer the content under the same (or a compatible) license, that is, the CC-BY-SA 3.0 license.

The license requirement is simple: just include a note that the site is under the CC-BY-SA 3.0 license and a link to the text of the license. Attribution is harder, because Wikipedia articles often have many authors. But it is generally agreed that one valid way to provide attribution is to link to the history page for the source article on Wikipedia. One could also copy the list of editors from the history and post that for each article.
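As a rough illustration of the two requirements just described, a reuser's site could generate a footer for each copied article that links to the article's Wikipedia history page and to the license text. The following is only a sketch: the function names `history_url` and `attribution_notice` and the wording of the notice are made up for this example, and it assumes the standard `index.php?title=...&action=history` form of Wikipedia history links. Whether a given notice actually satisfies the license in a particular case is a separate question.

```python
from urllib.parse import urlencode

# License name and URL as given in the Creative Commons deed quoted in the question.
LICENSE_NAME = "CC BY-SA 3.0"
LICENSE_URL = "https://creativecommons.org/licenses/by-sa/3.0/"


def history_url(title: str, lang: str = "en") -> str:
    """Return the revision-history URL for a Wikipedia article title.

    Hypothetical helper: assumes the usual index.php history link format.
    """
    query = urlencode({"title": title.replace(" ", "_"), "action": "history"})
    return f"https://{lang}.wikipedia.org/w/index.php?{query}"


def attribution_notice(title: str, lang: str = "en") -> str:
    """Build a plain-text footer naming the source, its authors, and the license.

    Hypothetical helper: the wording is illustrative, not an official template.
    """
    return (
        f'This page reuses text from the Wikipedia article "{title}". '
        f"Authors: {history_url(title, lang)}. "
        f"Text is available under {LICENSE_NAME}: {LICENSE_URL}"
    )


if __name__ == "__main__":
    print(attribution_notice("Creative Commons"))
```

A site that copies the editor list instead of linking to the history would replace the `Authors:` link with that list.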

Many online copies of Wikipedia do not fully comply with this license, and are infringements of copyright. But the copyright is held by the individual contributors, not the Wikimedia Foundation, and few if any of them care to sue over such infringements.

"I don't quite know how best to approach this situation. On one hand I would like to release data which is free and open source. On the other hand I don't want someone else to then run off with all of it and try and outcompete me with the same product I guess."

Those desires are essentially contradictory. If you post a work under a permissive license, such as the CC-BY-SA 3.0 license (or the very similar version 4.0 license, under which this post and all of Stack Exchange (SE) are offered), you are saying that anyone in the world is free to copy the work for any purpose, including to sell access to it. One could use the CC-BY-NC-SA license, which forbids commercial reuse, or some other license that imposes restrictions that seem good to the person doing the posting. But attempting to check up on copies, and to get commercial ones taken down or their operators sued, would be a full-time job for a team of people if the site becomes at all popular. Wikipedia (and SE) have chosen not to try.

David Siegel
3

As well as forks, there are also "mirrors". These are like forks, but update regularly to match changes in Wikipedia, so they don't end up as nothing more than a one-off snapshot of the site. Again, there is nothing to stop you from charging for it. You may not even get your capital investment back, though, and the ongoing expenses for a mirror would be higher too, due to higher energy and data usage from continually updating to keep up with Wikipedia. Even more so if you are also saving the intermediate revisions. "Sure you can download Wikipedia, got several multi-terabyte hard drives sitting around?"

3

Yes, it can be done legally. However, creating a mirror is not that simple nowadays, as the complexity of the pages has increased through the years (images, Wikidata, Lua…). You would also need to update it frequently to be competitive.

And, most importantly, you must correctly include the license of the content (CC-BY-SA for the text, varied for the images) and credit the authors. The author is not "Wikipedia", but each of the contributors to that page, so you need a list of the authors. Linking to the Wikipedia history might be enough, or you might need your own copy (e.g. they could delete the page you are showing!).

Then, getting to the main issue: you are too expensive. You are providing a service (a website with Wikipedia content) for a fee that is provided for free by Wikipedia itself. And I would bet that Wikipedia does it better than you (faster to load, more up to date…).

If you are offered a paid option and exactly the same one (and perhaps even better) for free, which one would you choose? :)

You would need to provide some additional value that people are willing to pay for. Perhaps your website is available in a local community with no access to the internet. Or you printed this encyclopedia in paper form. Or you certify that your version does not contain any wrong facts. Or you made a deal with Chinese authorities so your website is available from China while Wikipedia is not.

Or a nuclear war ensued and yours is the only remaining copy of Wikipedia.

This is the same question that is sometimes raised with Free Software. You could repackage and resell for a hefty sum an image program, a web browser, or even a full operating system with little more than a change of name. But you must acknowledge who the authors are (you cannot pass it off as made by you), and respect the right of anyone who receives it from you to give it to others for free.

So in the end, the options for making money from that are generally:

  • Get paid a nominal fee for the disks with the media (hardly anybody would pay more)
  • Get paid for providing support (helping people with issues, keep servers running…)
  • Get paid for developing it (there is a payment for the development of a feature, but from then on it is available to everyone for free)
David Siegel
Ángel