6

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad \text{for all } s\in S .$$

However, I could not find a proof of this result anywhere. Given that this statement is fundamental for dynamic programming (I think), I am interested in a rigorous proof. (I hope that I am not missing anything trivial here)

nbro
  • 42,615
  • 12
  • 119
  • 217
MMM
  • 185
  • 3

1 Answers1

0

I think the statement above is incorrect, it should be $\forall \; \pi \in \Pi $ and not $\forall \; s \in S $. Then the proof is trivial and we can show this by contradiction. Assume $V_\pi(s) < V_\hat\pi(s),\; \forall s \in S$ is always the case, i.e. $\nexists \pi \;\forall \;\hat\pi, s.t. V_\pi(s) \geq V_\hat\pi(s)$. This statement is wrong beacause for $\pi = \hat\pi$ the equality condition holds. For the $>$ case, we can use the definition of $\pi^*$, optimal policy.

kaiser
  • 101
  • 1