In the blog post http://www.argmin.net/2016/04/18/bottoming-out/, Prof. Recht shows two plots:

[Plot 1 from the blog post (image not available)]

[Plot 2 from the blog post (image not available)]

He says one reason the second plot shows a smaller train-test gap is that its model was trained with a lower learning rate (he also manually drops the learning rate at epoch 120).
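For concreteness, the manual drop described above is a step (piecewise-constant) learning-rate schedule. The sketch below is only an illustration: the base rate of 0.1 and the drop factor of 0.1 are assumptions, since the post as quoted here only says the rate is lowered at epoch 120, and epochs are assumed to be counted from 0.

```python
def step_lr(epoch: int, base_lr: float = 0.1, drop_epoch: int = 120, factor: float = 0.1) -> float:
    """Return the learning rate for a given epoch under a single step drop.

    Assumed values (not from the post): base_lr=0.1 and factor=0.1.
    Epochs are 0-indexed, so the lower rate applies from epoch 120 onward.
    """
    if epoch < drop_epoch:
        return base_lr  # constant rate before the drop
    return base_lr * factor  # reduced rate after the manual drop


if __name__ == "__main__":
    # Print the rate just before, at, and after the drop epoch.
    for epoch in (0, 119, 120, 150):
        print(epoch, step_lr(epoch))
```

In PyTorch the same shape can be obtained with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[120], gamma=0.1)`, again with `gamma=0.1` as an assumed factor.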

Why would a lower learning rate reduce overfitting?
