I'm quoting from Understanding Machine Learning: From Theory to Algorithms (Shalev-Shwartz & Ben-David, Cambridge University Press, 2014):
Definition 2.1 (The Realizability Assumption). There exists $h^{\star} \in \mathcal{H}$ s.t. $L_{(\mathcal{D}, f)}(h^{\star}) = 0$. Note that this assumption implies that with probability 1 over random samples, $S$, where the instances of $S$ are sampled according to $\mathcal{D}$ and are labeled by $f$, we have $L_{S}(h^{\star})=0$.
My understanding of the second sentence in this definition is: because $h^{\star}$ satisfies $L_{(\mathcal{D}, f)}(h^{\star}) = 0$, every prediction made by $h^{\star}$ on an example $x$ drawn from the domain set $\mathcal{X}$ according to $\mathcal{D}$ is correct with probability 1 (otherwise the loss $L_{(\mathcal{D}, f)}(h^{\star})$ would not equal 0). In other words, $h^{\star}$ agrees with the true labeling function $f$ on essentially every point it can be asked about. Therefore, for any sample $S$ whose instances are drawn according to $\mathcal{D}$ and labeled by $f$, we have $L_{S}(h^{\star})=0$.
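Spelling my reasoning out (assuming I am reproducing the book's definitions of the true error and the empirical error correctly):

$$
L_{(\mathcal{D}, f)}(h^{\star}) \;=\; \underset{x \sim \mathcal{D}}{\mathbb{P}}\bigl[h^{\star}(x) \neq f(x)\bigr] \;=\; 0
\quad\Longrightarrow\quad
\text{with probability } 1,\ h^{\star}(x_i) = f(x_i) \text{ for every } i \in [m]
\quad\Longrightarrow\quad
L_{S}(h^{\star}) \;=\; \frac{\bigl|\{\, i \in [m] : h^{\star}(x_i) \neq y_i \,\}\bigr|}{m} \;=\; 0.
$$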
However, what I stumble upon is the author's further elaboration on this definition:
The realizability assumption implies that for every ERM hypothesis we have that $L_{S}(h_{S})=0$.
I don't quite get what the author means here, since every ERM hypothesis $h_{S}$ is found by some particular minimization algorithm, which in turn depends on a number of other factors, such as the choice of the loss function, the sample size, and the complexity of the algorithm, and thus may not always converge to $h^{\star}$. So why is $L_{S}(h_{S})=0$ guaranteed?
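To make my confusion concrete, here is a minimal toy sketch of how I currently picture an ERM rule over a finite hypothesis class (the uniform distribution, the labeling function, and the three-hypothesis class below are my own invented example, not anything from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Domain X = {0, ..., 9}; distribution D is uniform over X; true labels f(x) = x mod 2.
X = np.arange(10)
f = lambda x: x % 2

# A small finite hypothesis class; it contains h_star = f, so realizability holds.
hypotheses = {
    "h_star":  lambda x: x % 2,              # agrees with f everywhere
    "const_0": lambda x: np.zeros_like(x),   # predicts 0 everywhere
    "const_1": lambda x: np.ones_like(x),    # predicts 1 everywhere
}

# Draw a sample S of size m i.i.d. from D and label it by f.
m = 6
S_x = rng.choice(X, size=m)
S_y = f(S_x)

# Empirical risk L_S(h): fraction of sample points on which h disagrees with the labels.
def empirical_risk(h):
    return np.mean(h(S_x) != S_y)

# The ERM rule, as I understand it, returns any hypothesis minimizing L_S over the class.
risks = {name: empirical_risk(h) for name, h in hypotheses.items()}
h_S_name = min(risks, key=risks.get)
print(risks)
print("ERM picks:", h_S_name, "with empirical risk", risks[h_S_name])
```

In this toy case the exhaustive argmin obviously lands on a zero-risk hypothesis, but my question is about the general claim: is $L_{S}(h_{S})=0$ supposed to hold for any ERM output, regardless of which minimization procedure produced it?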