
I have software that parses and renders LaTeX formulas, and I want to test its accuracy on a large dataset of equations extracted from arXiv papers. arXiv offers an API to download papers in bulk, so it is technically feasible to extract the source code of every equation from all arXiv papers (or at least a very large share of them).
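For concreteness, the extraction step could look roughly like the sketch below. It is a minimal illustration, not a full LaTeX parser: the list of environments and the names `MATH_ENVS`, `EQUATION_RE` and `extract_equations` are assumptions made for this example, and it ignores inline `$...$` math, commented-out lines and user-defined macros.

```python
import re
from typing import List

# Environments treated as "equations" here; this list is an assumption.
MATH_ENVS = ("equation", "align", "gather", "multline", "eqnarray")

EQUATION_RE = re.compile(
    # \begin{env} ... \end{env}, with an optional star on the environment name
    r"\\begin\{(?P<env>(?:%s)\*?)\}(?P<body>.*?)\\end\{(?P=env)\}"
    # display math written as \[ ... \]
    r"|\\\[(?P<display>.*?)\\\]" % "|".join(MATH_ENVS),
    re.DOTALL,
)


def extract_equations(tex: str) -> List[str]:
    """Return the bodies of display equations found in a LaTeX source string."""
    equations = []
    for match in EQUATION_RE.finditer(tex):
        body = match.group("body")
        if body is None:  # the \[ ... \] branch matched instead
            body = match.group("display")
        equations.append(body.strip())
    return equations


sample = (
    r"Text \begin{equation} E = mc^2 \end{equation} more "
    r"\[ a^2 + b^2 = c^2 \] and \begin{align*} x &= 1 \end{align*}"
)
print(extract_equations(sample))
```

Each extracted string is a verbatim copy of the author's LaTeX, which is why the question of who holds the rights to that specific presentation matters.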

Is it legal to make a dataset of LaTeX equations generated in this way publicly available as part of the test suite of my MIT-licensed software? Although it would be a major inconvenience, I could include attributions to the many authors in some form. In case the jurisdiction matters, I'm based in France, but the papers of course come from all over the globe.

What I (think I) learned from browsing the Internet and Legal SE:

  • arXiv holds a distribution license, not the copyright.
  • Equations are not copyrightable, but their specific presentation may be. That would probably include the LaTeX code for the equation.
  • There is a notion of fair use, although I'm not sure whether it applies here.

ChatK

2 Answers

The problem you will run into is with your use of the API, not with copyright infringement.

Let’s assume all of the equations are in the public domain, that is, there are no copyright issues at all. Your use of the API is still subject to its terms and conditions, because they form a valid contract: arXiv provides consideration (the service), and you provide consideration (a promise to use it only in certain ways).

It doesn’t matter that you could legally copy out all the equations by hand; using the API means complying with the API’s terms. I don’t know what those terms are or whether they allow such use, but you need to find out.

Dale M

France doesn't have a "fair use" exception to copyright.

And even in countries with fair use doctrines similar to that of the US, I don't think it would apply. Extracting a small number of equations might satisfy the "amount and substantiality of the portion used" criterion, but extracting all the equations from a collection probably wouldn't.

Barmar