Where to draw the line when it comes to data Copyright law

Question

I am currently developing a fitness utility application and have reached the point where I need to populate it with a real data-set.

After quite a lot of research, I found no companies that allow their data-sets to be used openly. Although this is understandable, I don't see how they can copyright their data when it comes to things like a fitness exercise.

For example, if we take a record from bodybuilding.com like:

{
    name: "Barbell bench press",
    type: "strength",
    main_muscle: "chest",
    other_muscles: ["shoulders", "triceps"],
    equipment: "Barbell",
    mechanics_type: "Compound",
    level: "Beginner",
    sport: "No",
    force: "Push",
    reputation: 9.2,
    description: "Lie back on a flat bench. Using a medium width grip (a grip that creates
                  a 90-degree angle in the middle of the movement between the forearms
                  and the upper arms), lift the bar from the rack and hold it straight over
                  you with your arms locked.
                  ........
                  ........
                  When you are done, place the bar back in the rack."

}

How can this data be copyrighted if just recreating it would yield the same result, given that the values are not ambiguous/subjective?

The only thing I feel could be copyrighted is the sequence of words (phrasing) in the description – please correct me if I am wrong.

If this is the case, would it be legal for someone to use the exact same data-set while rephrasing the description and, for example, replacing "beginner" with "amateur"?

Edit 1:

Apart from the description and the reputation, all the other fields, and the data structure itself, are completely standardized. The exercise names were not invented by the author of the data-set; they are standard terms, and the same is true for equipment names, force, level, etc. So if one were to remove the description and reputation fields, could using the rest still be considered copyright infringement, given that everything else is not subjective and requires no creativity to add?

score 2 · Answer 1 · edited May 21 '22 at 15:39

In the United States, you can't copyright something like a phone book that is just a collection of records sorted in an obvious manner.

According to the U.S. Copyright Office, you can't copyright a recipe either:

Copyright law does not protect recipes that are mere listings of ingredients. Nor does it protect other mere listings of ingredients such as those found in formulas, compounds, or prescriptions. Copyright protection may, however, extend to substantial literary expression—a description, explanation, or illustration, for example—that accompanies a recipe or formula or to a combination of recipes, as in a cookbook.

Your data-set seems to fall along those lines. On the other hand, it might not be so clear-cut, as much of this "data" is really opinion (like the level and reputation).

As you say, the "description" field is problematic. But just changing a few words will probably not cut it; that would be a derivative work. You'd probably want to rewrite it from scratch.

score 1 · Answer 2 · answered Oct 22 '16 at 22:19

"Data" is a slippery concept, especially if you're trying to distinguish it from a "program". A simple fact such as the length of a stick or a phone number is not protected by copyright. What you've provided is a structured record, which is closer to being a program, and these days, having the right data structure is most of the work behind making a program. Also, the values are not automatically assigned by an automaton: some element of creativity goes into creating the data structure and the specific records. This is the "modicum of creativity" that is required for copyright protection.

score 1 · Answer 3 · edited Jan 28 '20 at 06:38

The data is probably subject to copyright – it is a literary work stored in tangible form – it is not just a collection of facts.

On the other hand, your proposed use is probably fair use: if you are using it only in-house to test your app, then it cannot affect their market, and this is one of the criteria that you need to show for a fair use defence. Also, how would they ever know?

However, there is an easier solution that avoids the issue altogether: Write a program to create a random but legitimate data set and test on that.
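
A minimal sketch of such a generator follows, assuming the record layout from the question. The vocabularies (`TYPES`, `MUSCLES`, and so on) and the function names `make_record` and `make_dataset` are made up for illustration and are not taken from any real data-set; the names and descriptions are deliberately synthetic placeholders.

```python
import random

# Illustrative vocabularies (assumptions, not copied from any real data-set).
TYPES = ["strength", "cardio", "stretching"]
MUSCLES = ["chest", "shoulders", "triceps", "biceps", "back", "quadriceps"]
EQUIPMENT = ["Barbell", "Dumbbell", "Cable", "Body only"]
MECHANICS = ["Compound", "Isolation"]
LEVELS = ["Beginner", "Intermediate", "Expert"]
FORCES = ["Push", "Pull", "Static"]


def make_record(rng, index):
    """Build one synthetic record with the same fields as the question's example."""
    main = rng.choice(MUSCLES)
    # Secondary muscles never repeat the main muscle.
    others = [m for m in MUSCLES if m != main]
    return {
        "name": f"Test exercise {index}",
        "type": rng.choice(TYPES),
        "main_muscle": main,
        "other_muscles": rng.sample(others, k=rng.randint(0, 2)),
        "equipment": rng.choice(EQUIPMENT),
        "mechanics_type": rng.choice(MECHANICS),
        "level": rng.choice(LEVELS),
        "sport": rng.choice(["Yes", "No"]),
        "force": rng.choice(FORCES),
        "reputation": round(rng.uniform(0.0, 10.0), 1),
        "description": f"Placeholder instructions for test exercise {index}.",
    }


def make_dataset(size, seed=0):
    """Return `size` records; the same seed always yields the same data."""
    rng = random.Random(seed)
    return [make_record(rng, i) for i in range(1, size + 1)]


if __name__ == "__main__":
    for record in make_dataset(3):
        print(record)
```

Seeding the generator keeps test runs reproducible, and because every value is either drawn from a list you wrote yourself or generated, the test data contains nothing copied from a third party.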
