
From Berkeley CS182, SP22: https://cs182sp22.github.io/assets/lecture_slides/2022.01.26-ml-review-pt2.pdf.

Can someone help me interpret this diagram? I understand the graph on the left, but I don't understand why, in the right graph, the test risk starts going back down. I'm unfamiliar with the "interpolating regime", so maybe that would explain some things.

1 Answer

The right plot is about Deep Double Descent, a phenomenon observed in deep learning that challenges the classical view of statistical learning theory (left plot).

  • The first half of the right plot depicts the classical empirical risk minimization setting, in which you seek the optimal model capacity: the one that balances bias and variance, achieving low training error and good generalization.
  • It has been observed that if you keep increasing the capacity of the model (with proper regularization), the training error goes to zero (the interpolation threshold) and then, unexpectedly, the generalization (test) error starts decreasing again as capacity grows further!
  • This modern interpolation regime, where over-parameterized models actually generalize much better than models with the "just-right" capacity, contradicts the classical view and the bias-variance trade-off. The sketch after this list reproduces the curve numerically.
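Here is a minimal numpy sketch of the effect using random Fourier features and minimum-norm least squares (the setup, constants, and names below are illustrative, not from the slides). The test error typically spikes where the number of features reaches the number of training points (the interpolation threshold) and descends a second time beyond it:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2.0 * np.pi * x)

n_train, n_test = 20, 200
x_train = rng.uniform(-1.0, 1.0, n_train)
x_test = np.linspace(-1.0, 1.0, n_test)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
y_test = target(x_test)

# Fixed pool of random Fourier features; capacity = number of features used.
max_feats = 200
freq = rng.normal(0.0, 8.0, max_feats)
phase = rng.uniform(0.0, 2.0 * np.pi, max_feats)

def features(x, p):
    return np.cos(np.outer(x, freq[:p]) + phase[:p]) / np.sqrt(p)

for p in [2, 5, 10, 15, 20, 25, 50, 100, 200]:
    Phi_train, Phi_test = features(x_train, p), features(x_test, p)
    # pinv gives the least-squares fit while p < n_train, and the
    # minimum-norm interpolating fit once p >= n_train.
    w = np.linalg.pinv(Phi_train) @ y_train
    train_mse = np.mean((Phi_train @ w - y_train) ** 2)
    test_mse = np.mean((Phi_test @ w - y_test) ** 2)
    print(f"p={p:4d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

The exact numbers depend on the noise level and feature distribution, but the peak in test error near p = n_train followed by a second descent is the signature of the right plot.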

One explanation for this is that, for deep learning models, variance comes not only from the sampling of the training data (as classical ML assumes), but also from weight initialization, optimization, and the training procedure itself. Over-parameterization seems to reduce this variance, allowing for better generalization; the sketch below illustrates the idea.
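A rough numpy sketch of that second point (a hypothetical setup made up for illustration, not an experiment from the slides): train the same two-layer ReLU network from several random initializations and measure how much the learned predictions disagree across seeds. In runs of this kind the spread across initializations typically shrinks as width grows, though the exact numbers depend on the data and hyperparameters:

```python
import numpy as np

def train_mlp(width, seed, X, y, steps=2000):
    """Two-layer ReLU net fit by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], width))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(width), (width, 1))
    lr = 1.0 / width  # scale the step size so training is stable at every width
    for _ in range(steps):
        H = np.maximum(X @ W1, 0.0)   # hidden ReLU activations
        err = H @ W2 - y              # residuals, shape (n, 1)
        gW2 = H.T @ err / len(X)
        gH = err @ W2.T
        gH[H <= 0.0] = 0.0            # ReLU gradient mask
        gW1 = X.T @ gH / len(X)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, (20, 1))
X = np.column_stack([x, np.ones(len(x))])   # input plus a bias feature
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal((20, 1))

x_grid = np.linspace(-1.0, 1.0, 100)[:, None]
X_grid = np.column_stack([x_grid, np.ones(len(x_grid))])

for width in [5, 500]:
    preds = []
    for seed in range(10):  # 10 independent random initializations
        W1, W2 = train_mlp(width, seed, X, y)
        preds.append(np.maximum(X_grid @ W1, 0.0) @ W2)
    # Average (over the grid) standard deviation of predictions across seeds:
    spread = np.stack(preds).std(axis=0).mean()
    print(f"width={width:4d}  prediction std across inits: {spread:.4f}")
```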

Luca Anzalone