I have been learning about agnostic PAC learning and uniform convergence for finite hypothesis classes. If we use a neural network with $d$ parameters stored at $64$-bit precision, the hypothesis class is finite with $|\mathcal{H}| \le 2^{64d}$, so
$$ \frac{r^2\left(128\,d\ln 2 + 2\ln\frac{2}{\delta}\right)}{\epsilon^2} $$
should be an upper bound on the number of examples $m$ needed to guarantee, with probability at least $1-\delta$, that the empirical error of every hypothesis is within $\epsilon/2$ of its generalization error, and hence that ERM returns a hypothesis whose generalization error is within $\epsilon$ of the best in the class. Here $r$ is the length of the interval in which the loss takes its values.
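
For reference, here is the derivation I have in mind, writing $L_S$ and $L_D$ for the empirical and true risk, $h^*$ for the best hypothesis in the class, and assuming only that the loss takes values in an interval of length $r$, so that Hoeffding's inequality and a union bound over the finite class apply:

$$ \Pr\!\left[\exists\, h\in\mathcal{H}:\ |L_S(h)-L_D(h)| > \tfrac{\epsilon}{2}\right] \;\le\; 2\,|\mathcal{H}|\,\exp\!\left(-\frac{m\epsilon^2}{2r^2}\right), $$

and requiring the right-hand side to be at most $\delta$ is equivalent to

$$ m \;\ge\; \frac{2r^2\ln\frac{2|\mathcal{H}|}{\delta}}{\epsilon^2} \;=\; \frac{r^2\left(128\,d\ln 2 + 2\ln\frac{2}{\delta}\right)}{\epsilon^2} \quad\text{for } |\mathcal{H}| = 2^{64d}. $$

On the complementary event, $L_D(h_{\mathrm{ERM}}) \le L_S(h_{\mathrm{ERM}}) + \epsilon/2 \le L_S(h^*) + \epsilon/2 \le L_D(h^*) + \epsilon$.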
The theory therefore suggests that, once the training set is this large, overfitting shouldn't occur, but in practice it obviously does, so what am I missing?
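
For concreteness, here is a quick back-of-the-envelope evaluation of the bound; the parameter count, loss range, $\epsilon$, and $\delta$ below are made-up illustrative values, not taken from any specific model:

```python
import math

# Illustrative, made-up values (not from any real setup): a 1M-parameter network,
# a loss bounded in [0, 1], and fairly loose accuracy/confidence targets.
d = 1_000_000   # number of 64-bit parameters
r = 1.0         # length of the interval the loss takes values in
eps = 0.05      # target accuracy epsilon
delta = 0.01    # failure probability delta

# Evaluate m >= r^2 * (128 * d * ln(2) + 2 * ln(2 / delta)) / eps^2
m = r**2 * (128 * d * math.log(2) + 2 * math.log(2 / delta)) / eps**2
print(f"sample-size bound: {m:.2e}")  # roughly 3.5e10 examples
```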