I am currently studying "Understanding Machine Learning: From Theory to Algorithms" by Shai Shalev-Shwartz and Shai Ben-David. I want to understand how I can use the definitions and results of the theory they describe in practice.
Consider the problem of fitting a one-dimensional polynomial to data; that is, our goal is to learn a function $h : \mathbb{R} \to \mathbb{R}$, and as prior knowledge we take the hypothesis class $H$ of polynomials of degree at most 10. This class has $\mathrm{VCdim}(H) = 11$, so by the fundamental theorem of statistical learning it is agnostically PAC learnable with ERM. If I fix $\epsilon, \delta \in (0,1)$, then as soon as my sample size is at least \begin{equation} m \geq C_2\,\frac{11+\log(1/\delta)}{\epsilon^2}, \end{equation} I can be sure that with probability at least $1-\delta$, ERM outputs a hypothesis with \begin{equation} L_D(h_S)\leq \min_{h\in H}L_D(h)+\epsilon. \end{equation} Here $C_2$ is a constant and $h_S$ is the hypothesis returned by the ERM algorithm.

Now in practical terms, this alone does not tell me anything about the quality of my model: since we do not know the underlying distribution $D$, we cannot compute $L_D(h)$ for any $h\in H$. But I can compute $L_S(h)$ for every $h\in H$, in particular $L_S(h_S)$. My intuition is that we can then use the uniform-convergence property of our class $H$ (which holds because $\mathrm{VCdim}(H)$ is finite) to bound how much $L_D(h_S)$ and $L_S(h_S)$ differ. With probability at least $1-\delta$, \begin{equation} L_D(h_S)\leq \min_{h\in H}L_D(h)+\epsilon \leq \Bigl(\min_{h\in H}L_S(h)+\epsilon\Bigr)+\epsilon = L_S(h_S)+2\epsilon, \end{equation} where the second inequality uses uniform convergence (every $h\in H$ satisfies $|L_D(h)-L_S(h)|\leq\epsilon$) and the final equality uses that ERM minimizes the empirical risk, so $\min_{h\in H}L_S(h)=L_S(h_S)$.

Since I know the value of $\epsilon$, I can evaluate this expression. For example, with $\epsilon=0.01$ and $L_S(h_S)=0.01$, I can guarantee that the true error of my hypothesis $h_S$ is at most $0.03$ with probability at least $1-\delta$.
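To make this concrete for myself, here is a minimal Python sketch of how I would plug numbers into these two formulas. The constant `C2 = 4.0` and the choice `delta = 0.05` are placeholder assumptions of mine (the theorem only asserts that such a constant exists), not values from the book:

```python
import numpy as np

def required_sample_size(vc_dim, eps, delta, C2=4.0):
    """Sample size from the agnostic-PAC bound m >= C2 * (d + log(1/delta)) / eps^2.

    C2 is the unspecified constant from the fundamental theorem; 4.0 is only a
    placeholder assumption, not a value given in the book.
    """
    return int(np.ceil(C2 * (vc_dim + np.log(1.0 / delta)) / eps ** 2))

def true_risk_upper_bound(empirical_risk, eps):
    """Bound L_D(h_S) <= L_S(h_S) + 2*eps, valid with probability >= 1 - delta
    provided the sample is at least as large as required_sample_size(...)."""
    return empirical_risk + 2.0 * eps

eps, delta, vc_dim = 0.01, 0.05, 11          # degree-10 polynomials: VCdim = 11
m = required_sample_size(vc_dim, eps, delta)
print(f"required sample size: m >= {m}")

# Suppose ERM over H returned h_S with empirical risk L_S(h_S) = 0.01 on such a sample.
print(f"certified bound: L_D(h_S) <= {true_risk_upper_bound(0.01, eps):.2f}")
```

With these assumed values the script prints the required sample size and the bound $L_S(h_S)+2\epsilon = 0.03$ from the example above.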