The iterative update rule for policy evaluation (that is, approximating the value function for a given policy) is:
$$v^{k+1} = r_{\pi} + \gamma P_{\pi}v^{k}$$
This is the simultaneous update rule: every entry of the new value vector $v^{k+1}$ is calculated using only the old estimates $v^{k}$.
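For concreteness, here is a minimal NumPy sketch of the simultaneous update. The 3-state transition matrix, reward vector, discount factor, and the function name `simultaneous_policy_evaluation` are made up for illustration and are not part of the original problem.

```python
import numpy as np

def simultaneous_policy_evaluation(r_pi, P_pi, gamma, num_sweeps=500):
    """Iterate v <- r_pi + gamma * P_pi @ v, using only old estimates in each sweep."""
    v = np.zeros_like(r_pi, dtype=float)
    for _ in range(num_sweeps):
        # The whole right-hand side is evaluated from the old v before v is rebound.
        v = r_pi + gamma * P_pi @ v
    return v

# Hypothetical 3-state example (each row of P_pi sums to 1).
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
r_pi = np.array([1.0, 0.0, 2.0])
gamma = 0.9

v = simultaneous_policy_evaluation(r_pi, P_pi, gamma)
```

The result can be checked against the direct solution of $(I - \gamma P_{\pi})v = r_{\pi}$, for example with `np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)`.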
An in-place (Gauss-Seidel type) update would instead use the new entries of the value vector as soon as they are calculated. I know how to do this using for loops, but how can we modify the matrix equation above to perform this update in a vectorized way in code? That is, how should the vector $r_{\pi}$ and the matrix $P_{\pi}$ be changed to produce the required result?
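To make the question concrete, here is a sketch of the for-loop sweep I have in mind, next to one possible matrix form. The matrix form assumes states are swept in index order and splits $P_{\pi} = L + U$, with $L$ strictly lower triangular and $U$ upper triangular including the diagonal. One in-place sweep then satisfies $(I - \gamma L)v^{k+1} = r_{\pi} + \gamma U v^{k}$, which would correspond to replacing $r_{\pi}$ by $(I - \gamma L)^{-1} r_{\pi}$ and $P_{\pi}$ by $(I - \gamma L)^{-1} U$. The function names and the example data are hypothetical.

```python
import numpy as np

def in_place_sweep_loop(v, r_pi, P_pi, gamma):
    """One Gauss-Seidel sweep: each state uses entries already updated in this sweep."""
    v = v.copy()
    for s in range(len(v)):
        # v[:s] already holds new values; v[s:] still holds old ones.
        v[s] = r_pi[s] + gamma * P_pi[s] @ v
    return v

def in_place_sweep_matrix(v, r_pi, P_pi, gamma):
    """Same sweep written as a linear solve with the splitting P_pi = L + U."""
    L = np.tril(P_pi, k=-1)  # strictly lower part: multiplies new values
    U = np.triu(P_pi, k=0)   # diagonal and upper part: multiplies old values
    A = np.eye(len(v)) - gamma * L
    # Solves (I - gamma L) v_new = r_pi + gamma U v_old.
    return np.linalg.solve(A, r_pi + gamma * U @ v)

# Hypothetical 3-state example (each row of P_pi sums to 1).
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
r_pi = np.array([1.0, 0.0, 2.0])
gamma = 0.9
v_old = np.array([0.3, -1.0, 2.5])

v_loop = in_place_sweep_loop(v_old, r_pi, P_pi, gamma)
v_matrix = in_place_sweep_matrix(v_old, r_pi, P_pi, gamma)
```

Note that solving the lower-triangular system is itself a forward substitution, so this form removes the explicit Python loop but the computation stays sequential inside the solver.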
Atharva