
I have a neural network with 2 inputs and one output, like so:

   input     | output
 a    | b    |  c
------+------+---------
 5.15 | 3.17 | 0.0607
 4.61 | 2.91 | 0.1551

etc.

I have 75 samples and I am using 50 for training and 25 for testing.

However, I feel that 50 training samples are not enough. Because I can't collect more real samples (due to time limitations), I would like to train the network on fake data:

For example, I know that the range of parameter a is 3 to 14, and that b is roughly 65% of a. I also know that c is a number between 0 and 1 and that it increases when a and b increase.

So, what I would like to do is generate about 20 samples using the above restrictions (for example, a = 13, b = 8, c = 0.95) and train the network on them before training it on the real samples.
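A minimal sketch of what such a generator could look like. The ranges for a and b come from the constraints above; the mapping to c is a pure guess (any monotone function of a and b squashed into (0, 1) would fit the stated restrictions):

```python
import random

def make_fake_sample():
    """Generate one (a, b, c) sample under the assumed constraints."""
    a = random.uniform(3.0, 14.0)        # a is in [3, 14]
    b = a * random.uniform(0.60, 0.70)   # b is roughly 65% of a
    # c must lie in (0, 1] and increase with a and b; this linear
    # rescaling is purely illustrative, not the real relationship.
    c = (a + b) / (14.0 + 14.0 * 0.70)
    return a, b, c

fake_samples = [make_fake_sample() for _ in range(20)]
```

Note that this generator encodes only the constraints I listed, not the true a/b-to-c relationship, which is exactly the concern raised in the answers below.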

Has anybody studied the effect of doing this on a neural network? Is it possible to know in advance whether the effect on the network will be positive or negative? Are there any recommendations/guidelines if I want to do this?

nbro
Mohammad

2 Answers


This is not advisable. If you train your model on random data, it learns nothing useful, because there is no information to be gained from those examples. Even worse, it may be (and likely is) trying to generalize from your incorrect examples, which weakens the effect of your real examples. Essentially, you are just diluting your training set with noise.

You are moving in the right direction, though. 75 examples will not be enough if your problem has any complexity at all. But unless you know the actual correlation between the inputs a, b and the output c, you should not generate data (and even if you did know it, generating data is not always advisable). If it is impossible to get more data, you might want to consider a statistical model rather than a neural network.

Andrew Butler

If you add fake samples to the training set, your neural network learns the new dataset that you just made, not the real one. Since your fake samples are only estimates, you are adding noise to your training set.

Instead, you can use leave-one-out cross-validation to evaluate your model, which makes better use of a small dataset than a fixed 50/25 split.
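A sketch of leave-one-out cross-validation, assuming scikit-learn is available; random placeholder arrays stand in for the 75 real (a, b) → c samples:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPRegressor

np.random.seed(0)
X = np.random.rand(75, 2)  # placeholder for the 75 real (a, b) pairs
y = np.random.rand(75)     # placeholder for the 75 real c values

# Train 75 times, each time holding out a single sample for testing.
errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])[0]
    errors.append((pred - y[test_idx][0]) ** 2)

print("LOO mean squared error:", np.mean(errors))
```

Every sample is used for both training and testing (in different folds), so no data is wasted on a held-out set, at the cost of fitting the model once per sample.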

Michael Mior
CVDE