27

I am interested in doing some research that involves running various measurements and algorithms on the most common words in the English language. I have found a few suitable word lists online, but I am concerned that they may be subject to copyright, and I would like to be sure of their status before I use them. Each list is a simple text file with one word per line.

I understand that collections of words such as dictionaries are subject to copyright because they contain a large number of words together with their definitions, which can be considered to require original work and creativity. But what about just words, without any definitions or additional information?

I have seen this other Law Stack Exchange question, which mentions lists of words, but the author seems to have been interested in also using some short definitions, and I am not certain the answer refers to plain word lists.

Is a list of common words copyrightable? If so, would it still be considered fair use to use the word list to generate data not related to the words themselves?

NK1406

2 Answers

41

Depending on your jurisdiction, such lists may be protected, but not by copyright.

For example, in Germany there was a court decision that scanning all the country’s phone books and selling them on CD constituted “unfair competition” and was illegal, while hiring 1000 typists who would manually type in all this information would not be.

Databases are protected in many jurisdictions, and a list of the 1000 most commonly used English words could reasonably be called a database.

gnasher729
18

The words themselves are not protected by copyright, because they are "facts" of the English language (and the list-maker didn't create the words). Lists of words created by an algorithm are also "facts", and lack the speck of creativity that makes web pages protected. The corpora that underlie the lists are protected, as is the program that filters them to give token counts, but the resulting table of information is not; see Feist v. Rural Telephone.
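To illustrate how mechanical that last step can be, here is a minimal sketch of a program that reduces a text to its most frequent tokens. The function name `most_common_words`, the regular expression used for tokenizing, and the sample sentence are all assumptions made for illustration; real frequency lists are built from much larger corpora with more careful tokenization.

```python
import re
from collections import Counter


def most_common_words(text, n):
    """Return the n most frequent lowercase word tokens in text."""
    # Hypothetical tokenizer: runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Counter.most_common sorts by count, highest first.
    return [word for word, _count in Counter(tokens).most_common(n)]


# Made-up sample text standing in for a corpus.
sample = "the cat sat on the mat and the dog sat too"
print(most_common_words(sample, 2))
```

The output is determined by the input text and the counting rule alone, which is the sense in which such a list is produced by an algorithm and not by creative selection.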

user6726