
SGD is able to jump out of local minima that would otherwise trap BGD

I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic Gradient Descent) is able to escape local minima, while BGD (Batch Gradient Descent) can't?

P.S.

While searching online, I read that it has something to do with "oscillations" while taking steps towards the global minimum. What does that mean?


1 Answer


SGD has greater mobility precisely because of its stochastic nature. Since the gradient can change drastically (or oscillate) depending on which training items are randomly picked at each step, it can occasionally produce a step large enough to jump out of a local minimum.
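To make that a bit more concrete, here is a standard sketch of the argument (the notation below is mine, not part of the original statement). Write the full-batch loss as an average of per-example losses, $L(\theta) = \frac{1}{N}\sum_{i=1}^{N} L_i(\theta)$. Then the two updates are

$$\text{BGD:}\;\; \theta_{t+1} = \theta_t - \frac{\eta}{N}\sum_{i=1}^{N}\nabla L_i(\theta_t), \qquad \text{SGD:}\;\; \theta_{t+1} = \theta_t - \eta\,\nabla L_{i_t}(\theta_t), \quad i_t \sim \text{Uniform}\{1,\dots,N\}.$$

Writing the sampled gradient as $\nabla L_{i_t}(\theta_t) = \nabla L(\theta_t) + \xi_t$ with $\mathbb{E}[\xi_t] = 0$, the SGD step equals the BGD step plus a zero-mean noise term $-\eta\,\xi_t$. At a local minimum $\theta^*$ of the full-batch loss, $\nabla L(\theta^*) = 0$, so BGD stops moving entirely, while SGD still takes the step $-\eta\,\xi_t$, which is generally nonzero and can carry $\theta$ over a shallow barrier. Those noise-driven kicks are also the "oscillations" the question mentions.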

Take a look at the purple line in the illustration below (source):

Look how chaotic it is compared to the non-stochastic ones. It is harder to keep a naughty child in place than a well-behaved one.
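If you prefer to see this numerically, here is a minimal Python sketch (my own toy setup, not taken from the figure above): plain gradient descent on a 1-D loss with a shallow local minimum stays stuck, while the same descent with zero-mean noise added to each gradient evaluation, standing in for minibatch sampling variance, typically hops over the barrier and reaches the global basin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D loss (my own construction): a wide quadratic bowl with a
# sinusoidal ripple, giving a shallow local minimum near x ~ 4.3 and
# the global minimum near x ~ -1.4.
def loss(x):
    return x**2 / 20 + np.sin(x)

def grad(x):
    return x / 10 + np.cos(x)

def descend(x0, lr=0.3, steps=2000, noise_std=0.0):
    """Gradient descent from x0. noise_std > 0 adds zero-mean Gaussian
    noise to every gradient evaluation, standing in for the minibatch
    sampling variance of SGD."""
    x, best = x0, x0
    for _ in range(steps):
        g = grad(x) + noise_std * rng.standard_normal()
        x -= lr * g
        if loss(x) < loss(best):
            best = x
    return x, best

x0 = 4.5  # start inside the basin of the *local* minimum

gd_x, gd_best = descend(x0, noise_std=0.0)    # exact (batch) gradient
sgd_x, sgd_best = descend(x0, noise_std=3.0)  # noisy (SGD-like) gradient

print(f"plain GD : final x = {gd_x:6.2f}, lowest loss visited = {loss(gd_best):6.2f}")
print(f"noisy GD : final x = {sgd_x:6.2f}, lowest loss visited = {loss(sgd_best):6.2f}")
```

With the noise turned off, the run converges to the local minimum near x ≈ 4.3 and never leaves it; with noise it usually visits the global basin near x ≈ -1.4 at some point. Larger noise (think smaller batches) makes the escape easier, at the cost of never settling exactly at the bottom of any minimum.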
