
Reading about learning rates, I came across cyclic learning rates (cyclic LRs) and thought they could be interesting.

It seems one could argue either way:

  • that the jumps would kick you out of a good minimum, and that saddle points are already handled by momentum-based optimisers, making cyclic LRs unnecessary;

  • that the sporadic, abrupt changes or 'jumps' in the learning rate could help when a model has been stuck for a while.

I'm aware that there is a 2015 paper about it (Smith, "Cyclical Learning Rates for Training Neural Networks"), that an implementation is available in PyTorch, and that the TensorFlow one appears to be abandoned.
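
For concreteness, here is a minimal sketch of the PyTorch scheduler I mean (`torch.optim.lr_scheduler.CyclicLR`); the model, bounds, and step size below are placeholder values, not a recommendation:

```python
import torch

# Placeholder model and optimiser, just to show how CyclicLR is wired up.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=1e-4,       # lower bound of the cycle
    max_lr=1e-2,        # upper bound of the cycle
    step_size_up=2000,  # batches to climb from base_lr to max_lr
    mode="triangular",
)

for batch in range(10):               # stand-in for the real data loop
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()                  # cyclic LR is stepped per batch, not per epoch
```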

My questions are:

  • Are there studies showing problems with this approach, or validating it more broadly? Equivalently: are there known caveats for specific architectures?
  • Are there large models that have shown substantial improvements with it?
