How to modify the Bellman operator for in-place iterative policy evaluation?

Asked Mar 13 '24 at 11:17

Active Mar 13 '24 at 11:18

Viewed 39 times

The iterative update rule for policy evaluation (that is, for approximating the value function of a given policy) is: $$v^{k+1} = r_{\pi} + \gamma P_{\pi}v^{k}$$ This is the simultaneous update rule: every entry of the new value vector is computed using only the old estimates.
An in-place (Gauss-Seidel type) update would instead use the new entries of the value vector as soon as they are calculated within the same sweep. I know how to do this with for loops, but how can the matrix equation above be modified so that the update can be done in a vectorized way in code? That is, how should the vector $r_{\pi}$ and the matrix $P_{\pi}$ be changed to produce the in-place result?
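To make the two forms concrete, here is a minimal sketch, not a definitive answer. It assumes states are swept in index order $0, 1, \dots, n-1$, so entry $i$ sees new values for $j < i$ and old values for $j \ge i$. Under that assumption, splitting $P_{\pi} = L + U$ with $L$ strictly lower triangular and $U$ upper triangular (diagonal included) gives $$v^{k+1} = r_{\pi} + \gamma L v^{k+1} + \gamma U v^{k} \quad\Longrightarrow\quad v^{k+1} = (I - \gamma L)^{-1}\left(r_{\pi} + \gamma U v^{k}\right)$$ The function names and the random test problem below are hypothetical, chosen only for illustration.

```python
import numpy as np

def sweep_in_place(P, r, v, gamma):
    """One in-place sweep with an explicit loop over states 0..n-1."""
    v = v.copy()
    for i in range(len(v)):
        # v[:i] already holds new values, v[i:] still holds old ones.
        v[i] = r[i] + gamma * P[i] @ v
    return v

def sweep_matrix_form(P, r, v, gamma):
    """The same sweep written as one lower-triangular linear solve."""
    n = len(v)
    L = np.tril(P, k=-1)  # strictly lower part: multiplies new values
    U = np.triu(P, k=0)   # diagonal and upper part: multiplies old values
    # np.linalg.solve is a general solver; a dedicated triangular solver
    # could exploit the structure of (I - gamma * L).
    return np.linalg.solve(np.eye(n) - gamma * L, r + gamma * U @ v)

# Hypothetical random problem: row-stochastic P, arbitrary rewards.
rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)
gamma = 0.9
v0 = np.zeros(n)

v_loop = sweep_in_place(P, r, v0, gamma)
v_mat = sweep_matrix_form(P, r, v0, gamma)
print(np.allclose(v_loop, v_mat))
```

Note that the sweep is still sequential in nature: the matrix form replaces the Python loop with a triangular solve (forward substitution) rather than with a single matrix-vector product.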

edited Mar 13 '24 at 11:18

asked Mar 13 '24 at 11:17

Atharva


0 Answers