8

Suppose there is some class of mathematical problems and a piece of paid software that solves problems of this class. Now, let's say someone uses this paid software to generate training sets for a neural network to learn on, and then eventually trains the neuronal net to effectively reverse engineer what the paid software does.

Would it be a crime then to publish this software?

Trish
Clemens Bartholdy

5 Answers

24

Copyright law is concerned with making actual copies of the software and making changes to it (like de-compiling and then giving meaningful names to functions).

Copyright law is not concerned with observing what outputs the software gives for various inputs and deducing from that how the software must work. Creating new software based on that deduction is also not a copyright violation, as you didn't make an actual copy.

Two things might prevent you from publishing the new software:

  • Any patents that protect what the original software is doing. You can violate a patent even if you independently invent the same thing.
  • The contract you entered into when obtaining the original software, which might contain a clause that prohibits figuring out how the software works. What you did to create your own software would then be a breach of that contract.
7

Reverse engineering is not illegal

Unless the process is protected by a patent.

Dale M
5

I would argue that you're not reverse-engineering the software. You are just monitoring its behaviour on some data and learning from it.

Imagine the expensive software you bought is just returning y=|x|⋅x. Let's assume that your "business" involves monitoring the duration of some processes, so your x is a positive number in some range. You collect a bunch of observations, then consider the pairs (xᵢ,yᵢ) and train a model on these data (it could be a neural network or any other model). Assuming you have enough data and that your model is "complex" enough, your fitted model might fit your training data perfectly. However, that doesn't mean that you have reverse engineered ("identified") the original system. Your model could infer the rule y=x² or y=max(x,0)² or pretty much any other function that fits your data. In other words, unlike in traditional reverse-engineering, your "replica" of the system cannot be expected to mimic the original system outside the range of data that you trained it on.
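To make this concrete, here is a minimal Python sketch of the example above. The function names, the training range (0.5 to 10) and the test point x=-3 are my own choices for illustration; the two "candidate" functions stand in for whatever rule a fitted model might end up encoding.

```python
def original(x):
    # The rule of the hypothetical paid software from the example: y = |x| * x
    return abs(x) * x

def candidate_square(x):
    # One rule a fitted model might end up encoding: y = x^2
    return x ** 2

def candidate_clipped(x):
    # Another rule that fits the same data: y = max(x, 0)^2
    return max(x, 0.0) ** 2

# Observations cover positive inputs only (assumed range: 0.5, 1.0, ..., 10.0)
train = [(0.5 * k, original(0.5 * k)) for k in range(1, 21)]

def max_error(model, data):
    # Largest absolute deviation between the model and the observed outputs
    return max(abs(model(x) - y) for x, y in data)

err_square = max_error(candidate_square, train)
err_clipped = max_error(candidate_clipped, train)

# An input outside the observed range, where the three rules disagree
x_new = -3.0
outside = {
    "original": original(x_new),
    "square": candidate_square(x_new),
    "clipped": candidate_clipped(x_new),
}

print(err_square, err_clipped)
print(outside)
```

Both candidates reproduce every observation, yet at x=-3 the original returns -9 while the candidates return 9 and 0. Nothing in the observed data can tell the three apart.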

Let's consider another example where we now assume your data spans the whole range of admissible inputs. Imagine your device is computing and returning the standard deviation of its nine inputs. Your fitted model might be able to learn the correct mapping between the inputs and the output, but it wouldn't generally know if your device computes y=sqrt(Σᵢ(xᵢ-(Σⱼxⱼ)/9)²/9) or y=sqrt((Σᵢxᵢ²)/9 -((Σᵢxᵢ)/9)²). By contrast, reverse engineering would identify exactly the procedure followed, not just the behaviour as observed from the outside. This is not a subtle distinction: in practice, "how something is computed" is often more important than "what is computed", because different methods of computing the same quantity might have different speeds (and different ways the speed scales with the size of the input), different accuracies (and different ways the accuracy degrades for particular inputs), etc. See for example the many different ways to solve a linear system Ax=b (LU, QR, Jacobi, ...). It is not unlikely that the value (and trade secrets) of the software you are trying to mimic lies exactly in the way something is computed rather than in what is computed.
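As a rough illustration, the two formulas can be written as two different procedures in Python. The function names and the sample inputs are hypothetical, and the `max(..., 0.0)` guard in the second function is my addition so that rounding can never push the value under the square root below zero.

```python
import math

def std_two_pass(xs):
    # y = sqrt( sum_i (x_i - mean)^2 / n ): compute the mean first, then the deviations
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def std_one_pass(xs):
    # y = sqrt( mean(x^2) - mean(x)^2 ): needs only running sums of x and x^2
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    # Guard (my addition): rounding error can make the difference slightly negative
    return math.sqrt(max(mean_sq - mean ** 2, 0.0))

# Nine hypothetical inputs
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
a = std_two_pass(sample)
b = std_one_pass(sample)

# The same inputs shifted by a large constant: the true standard deviation is unchanged
shifted = [1e9 + x for x in sample]
a_shifted = std_two_pass(shifted)
b_shifted = std_one_pass(shifted)

print(a, b)
print(a_shifted, b_shifted)
```

On the small inputs the two procedures agree to within rounding, so a model trained on input/output pairs has no way of knowing which one is inside the box. On the shifted inputs the two-pass version still returns the same value, while the one-pass version can lose accuracy because it subtracts two very large, nearly equal numbers. That is exactly the kind of "how" that observation of ordinary inputs does not reveal.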

Having established that learning from behaviour is distinct from reverse-engineering, the question remains as to whether it is permissible to publish software that mimics the behaviour of another system. I think this will depend on various factors, including the terms of any licenses, laws, or agreements that govern the use of the original software. In general, if your software is developed through a process of learning from behaviour, without accessing or using any proprietary information, trade secrets, or copyrighted materials, I would think that it should not infringe on the intellectual property rights of the original software. However, IANAL ;-)

Luca Citi
2

I don't think that would be reverse engineering, exactly. Either way, I don't think reverse engineering itself can get you into trouble, since you would be working with legally available information (even if it's obscured or difficult to process, such as machine code, it's still legally available if you have access rights). You could get into trouble for doing something illegal with the knowledge you've gained, such as circumventing security measures or creating a competing product of a patented design, but then it's those additional actions that would get you into trouble. Reverse engineering on its own is a process of deduction and honest investigation. I don't think that using any specific technology (as long as you have proper rights, access, etc.) changes that.

Here's a caveat, though. If you were to use a machine learning model, then most, if not all, of the process would be obscured from you. Depending on how it's trained, it could potentially interact with or execute code in a way that would be considered illegal. A neural network is a form of machine learning and it's practically a black box, but it's more of a mathematical structure than a programmatic one. It performs a series of matrix computations to get output values from input values, and the numerical values are completely relative. A machine learning model could include a neural net and use the outputs for code execution, but as I understand it, a neural net itself doesn't generally 'do' anything besides algebra. I know you didn't mention machine learning; I just wanted to make it clear that there is an important distinction between that and neural nets specifically. It still depends on the specific technology being used, but this is the most general answer I can give.
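To show what I mean by "nothing besides algebra", here is a tiny sketch of a two-layer network's forward pass in plain Python. The weights, biases and input are made-up numbers, not taken from any real model, and ReLU is just one common choice of activation.

```python
def matvec(W, x):
    # Matrix-vector product: one dot product per row of W
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    # Element-wise max(0, a), a common activation function
    return [max(0.0, a) for a in v]

def forward(x, W1, b1, W2, b2):
    # Hidden layer: h = relu(W1 x + b1)
    h = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    # Output layer: y = W2 h + b2
    return [a + b for a, b in zip(matvec(W2, h), b2)]

# Made-up parameters, for illustration only
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[2.0, 1.0]]
b2 = [0.5]

y = forward([3.0, 1.0], W1, b1, W2, b2)
print(y)  # [5.5]
```

Everything here is multiplication, addition and a comparison against zero. Whether anything gets executed based on the output is decided by the code around the network, not by the network itself.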

Keep in mind that this is all just my personal opinion/knowledge and based more on reason than law, which are very separate things. What I've read seems to support what I've said. However, if this is a practical question, best to consult with a legal expert :)

nano
2

let's say someone uses this paid software to generate training sets for a neural network to learn on

Generally, data generated by software is not covered by copyright, as it is not the result of a creative process. An exception is if the output data includes e.g. images or texts that came with the software.

You may still be bound by the terms and conditions of the application. T&Cs often prohibit reverse engineering and can (try to) place many kinds of conditions on the use of the software. How well such terms would hold up in court is often hard to determine beforehand. Violating them wouldn't be directly against the law; instead, the manufacturer would have to bring a case for breach of contract.

and then eventually trains the neuronal net to effectively reverse engineer what the paid software does.

Often reverse engineering is performed by looking at the innards of the paid application. But the dictionary definition and most terms and conditions include any method of figuring out how it works.

For example, Adobe's terms and conditions explicitly forbid the approach suggested in the question:

you must not .... reverse engineer (including but not limited to monitoring or tracking the inputs and outputs flowing through a system or an application in order to recreate that system)

Such terms would be limited by applicable law. For example, EU Directive 2009/24/EC states:

The person having a right to use a copy of a computer program shall be entitled, without the authorisation of the rightholder, to observe, study or test the functioning of the program in order to determine the ideas and principles which underlie any element of the program if he does so while performing any of the acts of loading, displaying, running, transmitting or storing the program which he is entitled to do.

and

any contractual provisions contrary to the provisions of this Directive laid down in respect of decompilation or to the exceptions provided for by this Directive with regard to the making of a back-up copy or to observation, study or testing of the functioning of a program should be null and void.

jpa