Why does a lower learning rate reduce the train-test generalization gap?

Asked Jul 14 '20 at 07:21


In this blog post: http://www.argmin.net/2016/04/18/bottoming-out/

Prof Recht shows two plots:

He says that one reason the model in the lower plot has a smaller train-test gap is that it was trained with a lower learning rate (he also manually drops the learning rate at epoch 120).
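To make the schedule concrete, here is a minimal sketch of a step learning-rate schedule with a single manual drop. It is not the code from the blog post: the function name `step_lr`, the base rate of 0.01, and the drop factor of 0.1 are assumed values for illustration, and only the drop at epoch 120 comes from the post.

```python
def step_lr(epoch, base_lr=0.01, drop_epoch=120, drop_factor=0.1):
    """Return the learning rate to use at a given epoch.

    base_lr and drop_factor are assumed values, not taken from the post;
    only drop_epoch=120 reflects the schedule described there.
    """
    if epoch >= drop_epoch:
        # After the manual drop, train with a smaller step size.
        return base_lr * drop_factor
    # Before the drop, use the constant base rate.
    return base_lr


# Print the rate just before and just after the drop.
for epoch in (0, 119, 120, 150):
    print(epoch, step_lr(epoch))
```

Whether the drop takes effect at the start or the end of epoch 120 is not stated in the post; the sketch assumes it applies from epoch 120 onward.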

Why would a lower learning rate reduce overfitting?

edited Jul 14 '20 at 11:59

nbro


asked Jul 14 '20 at 07:21

user3180


0 Answers