How does one prove the uniqueness of the value function obtained from value iteration in the case of bounded and undiscounted rewards? I know that this can be proven for the discounted case pretty easily using the Banach fixed point theorem.
In the undiscounted case there is no guarantee of convergence or uniqueness, because the value-iteration backup is no longer a strict contraction: it is merely nonexpansive, so the Banach fixed-point theorem does not apply.
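The standard contraction bound makes this explicit. For the Bellman backup operator $T$ and any two bounded value functions $V, W$,

$$\|TV - TW\|_\infty \le \gamma \, \|V - W\|_\infty,$$

so for $\gamma < 1$ Banach's theorem gives a unique fixed point and geometric convergence, but at $\gamma = 1$ the inequality only says $T$ is nonexpansive, which guarantees neither a unique fixed point nor convergence of the iterates.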
A simple counterexample: take a running reward R of 0 to keep things easy, and an MDP with two states a and b whose transition matrix has 0's on the diagonal and 1's off the diagonal (a deterministic swap a → b → a). If you initialize with V(a) ≠ V(b), each sweep of value iteration flips the values of V(a) and V(b), so the iterates oscillate forever and never converge. Uniqueness fails too: any constant value function V(a) = V(b) = c is a fixed point of the backup.
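The two-state example above can be checked numerically. A minimal sketch (the variable names and the choice of initial values are mine, not part of the original answer):

```python
import numpy as np

# Two-state MDP with deterministic swap transitions a -> b -> a
# and zero running reward, evaluated without discounting.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # 0's on the diagonal, 1's off the diagonal
R = np.zeros(2)              # running reward is 0
gamma = 1.0                  # undiscounted

V = np.array([1.0, 0.0])     # initialize with V(a) != V(b)
for k in range(5):
    print(k, V)              # the two entries flip on every sweep
    V = R + gamma * (P @ V)  # value-iteration backup

# Uniqueness also fails: any constant vector is a fixed point.
c = np.full(2, 5.0)
print(np.allclose(R + gamma * (P @ c), c))  # True
```

Running this prints `[1. 0.]`, `[0. 1.]`, `[1. 0.]`, ... indefinitely, confirming the oscillation, while the final check shows a whole family of fixed points.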
me47