
SGD is able to jump out of local minima that would otherwise trap BGD

I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic Gradient Descent) is able to escape local minima, while BGD (Batch Gradient Descent) can't?

P.S.

While searching online, I read that it has something to do with "oscillations" while taking steps towards the global minimum. What does that mean?


1 Answer


SGD has greater mobility precisely because of its stochastic nature. Since the gradient can change drastically (or oscillate) depending on which training items are randomly picked at each step, it can occasionally produce a step large enough to jump out of a local minimum.
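To make that a bit more concrete, here is a standard sketch of the argument (the notation below is mine, not part of the original statement). Write the full-batch loss as an average of per-example losses, $L(\theta) = \frac{1}{N}\sum_{i=1}^{N} L_i(\theta)$. Then the two updates are

$$\text{BGD:}\;\; \theta_{t+1} = \theta_t - \frac{\eta}{N}\sum_{i=1}^{N}\nabla L_i(\theta_t), \qquad \text{SGD:}\;\; \theta_{t+1} = \theta_t - \eta\,\nabla L_{i_t}(\theta_t), \quad i_t \sim \text{Uniform}\{1,\dots,N\}.$$

Writing the sampled gradient as $\nabla L_{i_t}(\theta_t) = \nabla L(\theta_t) + \xi_t$ with $\mathbb{E}[\xi_t] = 0$, the SGD step equals the BGD step plus a zero-mean noise term $-\eta\,\xi_t$. At a local minimum $\theta^*$ of the full-batch loss, $\nabla L(\theta^*) = 0$, so BGD stops moving entirely, while SGD still takes the step $-\eta\,\xi_t$, which is generally nonzero and can carry $\theta$ over a shallow barrier. Those noise-driven kicks are also the "oscillations" the question mentions.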

Take a look at the purple line in the illustration below (source):

Look how chaotic it is compared to the non-stochastic ones. It is harder to keep a naughty child in place than a well-behaved one.
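If you prefer to see this numerically, here is a minimal Python sketch (my own toy setup, not taken from the figure above): plain gradient descent on a 1-D loss with a shallow local minimum stays stuck, while the same descent with zero-mean noise added to each gradient evaluation, standing in for minibatch sampling variance, typically hops over the barrier and reaches the global basin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D loss (my own construction): a wide quadratic bowl with a
# sinusoidal ripple, giving a shallow local minimum near x ~ 4.3 and
# the global minimum near x ~ -1.4.
def loss(x):
    return x**2 / 20 + np.sin(x)

def grad(x):
    return x / 10 + np.cos(x)

def descend(x0, lr=0.3, steps=2000, noise_std=0.0):
    """Gradient descent from x0. noise_std > 0 adds zero-mean Gaussian
    noise to every gradient evaluation, standing in for the minibatch
    sampling variance of SGD."""
    x, best = x0, x0
    for _ in range(steps):
        g = grad(x) + noise_std * rng.standard_normal()
        x -= lr * g
        if loss(x) < loss(best):
            best = x
    return x, best

x0 = 4.5  # start inside the basin of the *local* minimum

gd_x, gd_best = descend(x0, noise_std=0.0)    # exact (batch) gradient
sgd_x, sgd_best = descend(x0, noise_std=3.0)  # noisy (SGD-like) gradient

print(f"plain GD : final x = {gd_x:6.2f}, lowest loss visited = {loss(gd_best):6.2f}")
print(f"noisy GD : final x = {sgd_x:6.2f}, lowest loss visited = {loss(sgd_best):6.2f}")
```

With the noise turned off, the run converges to the local minimum near x ≈ 4.3 and never leaves it; with noise it usually visits the global basin near x ≈ -1.4 at some point. Larger noise (think smaller batches) makes the escape easier, at the cost of never settling exactly at the bottom of any minimum.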
