
I am currently running a program with a batch size of 17 instead of a batch size of 32. The benchmark results were obtained with a batch size of 32 and 700 epochs.

Now I am running with a batch size of 17 and an unchanged number of epochs, so I am interested to know whether there is any relationship between the batch size and the number of epochs in general.

Do I need to increase the number of epochs? Or is it entirely dependent on the program?


2 Answers


The smaller the batch size, the larger the number of batches processed per epoch.

On the one hand, since one takes more gradient steps per epoch, one could think that fewer epochs are required to achieve the same level of accuracy.

On the other hand, a smaller batch size leads to noisier, more stochastic estimates of the gradient, so convergence is likely to be less steady.
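To illustrate the first point, here is a minimal sketch (the dataset size of 10,000 examples is an assumption, chosen purely for illustration) of how the batch size changes the number of gradient updates per epoch and over a fixed number of epochs:

```python
import math

dataset_size = 10_000   # hypothetical dataset size, purely illustrative
epochs = 700            # same number of epochs as in the question

for batch_size in (32, 17):
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    total_updates = steps_per_epoch * epochs
    print(f"batch_size={batch_size}: "
          f"{steps_per_epoch} updates/epoch, "
          f"{total_updates} updates over {epochs} epochs")

# batch_size=32: 313 updates/epoch, 219100 updates over 700 epochs
# batch_size=17: 589 updates/epoch, 412300 updates over 700 epochs
```

So with the same 700 epochs, the smaller batch size already performs roughly twice as many parameter updates, which is why fewer epochs might suffice.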

I think it is difficult to give a definite answer about the exact relation to the number of epochs. For example, to reach a certain level of accuracy, a small batch may be more beneficial, since it allows for more exploration and is more likely to escape local minima and saddle points; but once one reaches the approximation limit of the network and is in the vicinity of a good optimum, a large batch will descend to the extremum more steadily.

A good strategy is usually to start with smaller batches to find wide, flat minima, which are better from the generalization point of view, and then increase the batch size for steadier convergence.
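If it helps, here is a minimal PyTorch sketch of that strategy. The toy model, toy data, and the stage boundaries are assumptions chosen only to show how one might switch the batch size between training stages:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# toy regression data, purely for illustration
X = torch.randn(1024, 10)
y = torch.randn(1024, 1)
dataset = TensorDataset(X, y)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# (batch_size, epochs) per stage: explore with small, noisy batches first,
# then increase the batch size for steadier convergence near the optimum
stages = [(16, 50), (64, 25), (256, 25)]

for batch_size, epochs in stages:
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for epoch in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
```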


A smaller batch size means the model is updated more often, so it takes longer to complete each epoch. Also, if the batch size is too small, each update is made without "seeing" all the data: the batch itself might not be a good representative of the dataset. So there might be too much "wiggling", which makes it harder to reach the real minimum. Larger batches will get you "near" the minimum more quickly, as they take larger, steadier steps.

Since you only asked about the relation between batch size and epochs, the above is the answer. In practice, however, one rarely uses very large batches, because they don't get very close to the minimum for the same reason they reach its neighbourhood faster: once you are there, you want smaller steps to get very close to the minimum. Smaller batches might take longer, but once they are near the minimum, their wiggly nature becomes a strength and gets them closer to it.

For a more in-depth discussion and some references, see this post.

Also, batch sizes are usually picked to be a power of 2, so I would go for 16 instead of 17 if I were you. See this discussion for the reasons.
