In Sutton & Barto (Section 3.6 - Optimal Policies and Optimal Value Functions), they say that:
Value functions define a partial ordering over policies. A policy $\pi$ is defined to be better than or equal to a policy $\pi'$ if its expected return is greater than or equal to that of $\pi'$ for all states. In other words, $\pi \ge \pi'$ if and only if $v_\pi(s) \ge v_{\pi'}(s)$ for all $s \in \mathcal{S}$.
My question is: why is a better policy defined as one whose value is greater at every single state, rather than by some combined metric (e.g. a sum or weighted average) over all the state values of a policy?
If there is a policy that achieves a higher value in 99 out of 100 states, but a lower value in the remaining state than a second policy (which performs poorly in the other 99 states), would this first policy then not be considered better, or optimal, according to the definition above?
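To make what I mean concrete, here is a minimal sketch (not from the book; the value numbers are made up) showing that, under the definition above, neither of the two policies in my scenario dominates the other, so they end up incomparable under the partial ordering:

```python
import numpy as np

# Hypothetical state values for illustration only: 100 states, two policies.
# Policy A has the higher value in states 0-98; policy B is higher only in state 99.
v_a = np.full(100, 10.0)
v_a[99] = 1.0
v_b = np.full(100, 2.0)
v_b[99] = 5.0

a_geq_b = np.all(v_a >= v_b)  # does A dominate B at every state?
b_geq_a = np.all(v_b >= v_a)  # does B dominate A at every state?

print(a_geq_b, b_geq_a)       # False, False -> A and B are incomparable
```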