
I am trying to predict pseudo-random numbers from the past numbers with a multilayer perceptron. The error while training is very low. However, as soon as I evaluate it on a test set, the model turns out to have overfit and returns very poor results: both the correlation coefficient and the error metrics are bad.

What would be some of the ways to solve this issue?

For example, if I train it with 5000 rows of data and test it with 1000, I get:

Correlation coefficient                  0.0742
Mean absolute error                      0.742 
Root mean squared error                  0.9407
Relative absolute error                146.2462 %
Root relative squared error            160.1116 %
Total Number of Instances             1000     

As mentioned, I can train it with as many training samples as I want and the model still overfits. If anyone is interested, I can provide/generate some data and post it online.
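A dataset like the one described can be sketched as follows (this is only an illustrative assumption about how the data might be generated: a sliding window over a pseudo-random stream, with the previous 5 values as features and the next value as the target; the window size and split sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
stream = rng.random(6006)  # enough values for 5000 training rows + 1000 test rows

window = 5  # arbitrary choice: predict the next value from the previous 5
X = np.lib.stride_tricks.sliding_window_view(stream[:-1], window)
y = stream[window:]

# 5000 rows for training, 1000 for testing, as in the question
X_train, y_train = X[:5000], y[:5000]
X_test, y_test = X[5000:6000], y[5000:6000]
print(X_train.shape, X_test.shape)  # (5000, 5) (1000, 5)
```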

1 Answer


Simply put, predicting pseudo-random numbers is just not feasible for now. Modern pseudo-random number generators produce output with high enough "randomness" that it cannot be predicted from past values alone; pseudo-random numbers are the basis of modern cryptography, which is used throughout the world wide web and beyond. Prediction may become possible in the future through faster computers and stronger AI, but for now it is not. If you train a model to fit pseudo-random numbers, the model will simply overfit, creating exactly the scenario shown in the question: the training loss is very low while the test loss is extremely high. The model merely "remembers" the training data instead of generalising to unseen pseudo-random numbers, hence the high test loss.
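You can see this for yourself with a small sketch (assumptions: scikit-learn's `MLPRegressor` as a stand-in for the questioner's MLP, a 5-value sliding window, and an arbitrary network shape; with more capacity and training the train/test gap only widens):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
values = rng.random(6000)  # a pseudo-random stream in [0, 1)

window = 5  # predict the next value from the previous 5
X = np.lib.stride_tricks.sliding_window_view(values[:-1], window)
y = values[window:]

X_train, y_train = X[:5000], y[:5000]
X_test, y_test = X[5000:], y[5000:]

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)
model.fit(X_train, y_train)

# Any apparent skill on the training set does not carry over to the test set:
# test R^2 hovers around zero, because there is no pattern to generalise to.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
corr = np.corrcoef(model.predict(X_test), y_test)[0, 1]
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}, corr = {corr:.3f}")
```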

Also, as a side note, loss is not usually reported as a percentage; it is just a raw numeric value.

See this stack exchange answer for details.
