5

How does one prove the uniqueness of the value function obtained from value iteration in the case of bounded and undiscounted rewards? I know that this can be proven for the discounted case pretty easily using the Banach fixed point theorem.

Neil Slater
  • 33,739
  • 3
  • 47
  • 66

1 Answers1

1

In the case where the reward is undiscounted, there is no guarantee of convergence as the iteration procedure is not a strict contraction.

Unfortunately I can't find the math mode on the ai stackexchange so my answer can't be very precise.

But an easy example is the following: Take a 'running' reward R of 0 to make things simpler, and a MDP with two states a and b. Take a transition matrix with 0's on the diagonal and 1 off diagonal. You will see that the algorithm will always flip the values of V(a) and V(b), and hence no convergence.

me47
  • 111
  • 1