
What happens after you have used machine learning to train your model? What happens to the training data?

Let's pretend it predicted correctly 99.99999% of the time, you were happy with it, and you wanted to share it with the world. If you put in 10GB of training data, is the file you share with the world 10GB? If it was all trained on AWS, can people only use your service if they connect to AWS through an API?

What happens to all the old training data? Does the model still need all of it to make new predictions?

nbro
icYou520

1 Answer


In many cases, a production-ready model has everything it needs to make predictions without retaining training data. For example: a linear model might only need the coefficients, a decision tree just needs rules/splits, and a neural network needs architecture and weights. The training data isn't required as all the information needed to make a prediction is incorporated into the model.
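As a rough illustration of the linear-model case, here is a minimal sketch in plain Python. The dataset, the `fit_line` and `predict` helpers, and the choice of JSON as the storage format are all assumptions made for the example, not part of any particular library. It fits a line, serializes only the two coefficients, deletes the training data, and still makes a prediction.

```python
import json

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}

def predict(model, x):
    """Predict using only the stored coefficients."""
    return model["slope"] * x + model["intercept"]

# Made-up training set: 10,000 points lying on y = 2x + 1.
xs = [float(i) for i in range(10_000)]
ys = [2.0 * x + 1.0 for x in xs]

model = fit_line(xs, ys)
serialized = json.dumps(model)  # this short string is all that needs to be shared

del xs, ys  # the training data is no longer available

restored = json.loads(serialized)
print(len(serialized), "characters in the shared model")
print(predict(restored, 10.0))
```

The shared artifact here is a few dozen characters regardless of how many training points were used, which is the sense in which the information is "incorporated into the model".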

However, some algorithms retain some or all of the training data. A support vector machine stores the points ('support vectors') that lie on or inside the margin around the separating hyperplane, so that portion of the training data is stored with the model. Further, k-nearest neighbours compares each new input against the stored training points every time a prediction is made, so the model incorporates the entire training set.
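To make the contrast concrete, here is a minimal sketch of a 1-nearest-neighbour classifier in plain Python. The `NearestNeighbour` class and the toy points are hypothetical, written only for this illustration. Note that "fitting" does nothing except store the data, so the model cannot be shared without it.

```python
import math

class NearestNeighbour:
    """Toy 1-nearest-neighbour classifier (illustrative only)."""

    def fit(self, points, labels):
        # "Training" just memorizes the data; nothing is summarized or compressed.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Brute-force search: measure the distance from the query to every stored point.
        best = min(
            range(len(self.points)),
            key=lambda i: math.dist(self.points[i], query),
        )
        return self.labels[best]

# Made-up training set of three labelled 2D points.
model = NearestNeighbour().fit(
    [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)],
    ["a", "a", "b"],
)

print(model.predict((4.0, 4.5)))  # label of the closest stored point
print(len(model.points))          # the model still carries every training point
```

Unlike the coefficient-only models above, the size of this model grows in step with the training set.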

Having said that, the training data should be retained where possible. If additional data is received, a new model can be trained on the enlarged dataset. If it is decided that a different approach is required, or if there are concerns about concept drift, then it's good to have the original data still on hand. In many cases, the training data might contain personal data or constitute a company's competitive advantage, so the model and the data should be kept separate.

If you'd like to see how this can work, this Keras blog post has some information (note: no training data required to make predictions once a model is re-instantiated).

redhqs