
When training a DNN on infinite samples, do ADAM or other popular optimization algorithms still work as intended?

I have a DNN training on an infinite stream of samples that most likely won't repeat, so there is no real notion of an "epoch".

Now I wonder whether the math behind ADAM or other popular optimizers expects data to be repeated over epochs.

If so, should I collect a limited number of those samples and use them as training and validation data, or would it be better to use all of the available data (even though the training data then never repeats)?


2 Answers


In general, these methods still work with an infinite amount of data, as long as there are common/recurring patterns that a neural network can learn to identify. For example, if you had infinitely many images of dogs and cats, there are features that discriminate the two animals and are mostly consistent, like the shape of the nose. Having infinitely many samples is generally desirable in such cases, because it can benefit the model's ability to generalize.

In contrast, there are cases, depending on the data, where this is not true: if your data exhibits (concept) drift, meaning that the data distribution changes over time, the model you train might not be able to learn a consistently performing function and instead chases a moving objective. For images, this can happen if the labels depend on lighting conditions that constantly change (concept drift), or if the objects you want to classify continuously change shape (drift).
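
As a toy illustration of such a moving objective (the labelling function and drift rate below are made up, not taken from the question's setting), here is a sketch of a stream whose label function slowly rotates over time, so that no fixed set of weights stays optimal:

```python
import math
import torch

def drifting_batch(step, batch_size=64, drift_rate=1e-4):
    # Toy concept drift: the "true" weight vector rotates slowly as training
    # progresses, so the function that generates the labels keeps changing.
    angle = drift_rate * step
    w = torch.tensor([math.cos(angle), math.sin(angle)])
    x = torch.randn(batch_size, 2)
    y = x @ w.unsqueeze(1)  # labels depend on the current, drifting weights
    return x, y
```

A model trained online on such a stream has to keep adapting, which is exactly the "chasing a moving objective" behaviour described above.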

Chillston

Subdividing datasets into batches is only done for computational reasons (RAM, computation time, etc.). Optimizers do not care about repeated data: they receive a batch of data, calculate the gradient for that batch, and update the model accordingly. Nothing in this process requires any batch to reappear at all.
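
As a minimal sketch of this (the model, stream, and hyperparameters below are made up for illustration, not part of the question), a PyTorch training loop with Adam can simply draw a fresh batch at every step; the optimizer only ever sees the current batch's gradient plus its own running moment estimates:

```python
import torch
from torch import nn

# Hypothetical model; stands in for whatever DNN you are training.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def sample_batch(batch_size=64):
    # Placeholder for your real stream; every call returns fresh samples.
    x = torch.randn(batch_size, 10)
    y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(batch_size, 1)
    return x, y

for step in range(100_000):      # no epochs, just a stream of update steps
    x, y = sample_batch()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()             # uses only this batch's gradient
```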

Robin van Hoorn