Is there an SPDX or other widely accepted license header syntax/format that prohibits machine learning?

My company's GitHub license statement includes:

"Licensee is not granted the right to, and Licensee shall not ...[use company's open source code to] ... train Artificial Intelligence (AI) models, including language models, programming models, or other type of model, or any other automated or manual training of deep / multi-layer neural networks ... the intent of this License is for human collaboration and teamwork, and not for any AI or machine training use"

But I have no doubt LLM training crawlers ignore this. I'm looking for a simple, one-line header, in SPDX or another syntax, that they would observe, or that would at least make it clear to reasonable, impartial observers that they should have.

I found one other relevant thread, but it does not ask the same question.

Edit: regarding fair use, for every link saying LLM training may be fair use (e.g. the one in @DaleM's answer), there is another saying it may not. So I would like to clarify my question: hypothetically, setting aside the issue of fair use and assuming that at some point source code licenses must be factored in (as they are for humans), is there a tentative or emerging simple header line that can be added to source code files that effectively says "training prohibited", even if it might later be shot down in the courts?

Jeff Brower

1 Answer

No

It’s quite likely there never will be.

It is an open question whether copying for the purpose of training an AI model is fair use. If it is, then any prohibition by the copyright holder will have no legal effect, because fair use is not copyright infringement.

There are a number of ongoing cases. When they’re decided, we’ll know. For what it’s worth, I think the courts will decide that it is fair use.

Dale M