
I don't have much intuition for this equation. I have this Bellman update rule:

$$v_{\pi}(s) =\sum_a \pi(a|s)\sum_{s',r} p(s',r|s,a)[r+ \gamma v_{k}(s')]$$

But where are the parentheses? Is the second sum using the index $a$ from the first sum? Or is it independent, and can I move the $[r+ \gamma v_{k}(s')]$ term out of the sum?

nbro
nammerkage

1 Answer


Here is your equation with an additional pair of parentheses that emphasizes the order of the operations. Note that I also replaced $v_{k}(s')$ with $v_\pi(s')$ on the right-hand side, because your original equation mixed the two.

$$v_{\pi}(s) =\sum_a \pi(a \mid s) \left(\sum_{s',r} p(s',r \mid s,a)[r+ \gamma v_\pi(s')] \right)$$

Now, let me answer your other questions.

Is the second sum using the index $a$ from the first sum?

Yes.
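
To see why, here is a sketch of the outer sum written out, assuming purely for illustration that there are only two actions, $a_1$ and $a_2$ (these names are not in your question):

$$v_{\pi}(s) = \pi(a_1 \mid s) \sum_{s',r} p(s',r \mid s,a_1)[r+ \gamma v_\pi(s')] + \pi(a_2 \mid s) \sum_{s',r} p(s',r \mid s,a_2)[r+ \gamma v_\pi(s')]$$

The inner sum appears once per action, each time with that action plugged into $p(s',r \mid s,a)$.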

Or is it independent, and can I move the $[r+ \gamma v_\pi(s')]$ term out of the sum?

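No. The second sum is not independent of the first, because $p(s',r \mid s,a)$ depends on $a$. You also cannot move the term $[r+ \gamma v_\pi(s')]$ out of the second sum, because that sum runs over $s'$ and $r$, and $r+ \gamma v_\pi(s')$ depends on both of them.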

Note that $v_{\pi}(s)$ is defined as an expectation and that $\pi(a \mid s)$ (the policy) and $p(s',r \mid s,a)$ (the model) are probability distributions.
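
If it helps, the nesting can be sketched in code. This is only an illustration on a made-up toy problem: the arrays `pi`, `p`, `rewards`, `v`, the discount `gamma`, and the function names `backup` and `backup_vectorized` are all assumptions of mine, not something from your question. The outer loop is the sum over $a$, and the inner loops are the sum over $s'$ and $r$.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions, 2 possible reward values.
gamma = 0.9
rewards = np.array([0.0, 1.0])      # possible values of r
pi = np.array([[0.5, 0.5],          # pi[s, a] = pi(a | s)
               [0.2, 0.8]])
# p[s, a, s2, r_idx] = p(s', r | s, a); each p[s, a] sums to 1.
p = np.array([
    [[[0.7, 0.0], [0.0, 0.3]],
     [[0.1, 0.0], [0.4, 0.5]]],
    [[[0.0, 0.5], [0.5, 0.0]],
     [[0.25, 0.25], [0.25, 0.25]]],
])
v = np.array([1.0, 2.0])            # value estimate used on the right-hand side


def backup(s):
    """Right-hand side of the equation, written as explicit nested sums."""
    total = 0.0
    for a in range(pi.shape[1]):                  # outer sum over a
        inner = 0.0
        for s2 in range(len(v)):                  # inner sum over s' ...
            for r_idx, r in enumerate(rewards):   # ... and over r
                # The bracketed term depends on s' and r, so it stays inside.
                inner += p[s, a, s2, r_idx] * (r + gamma * v[s2])
        total += pi[s, a] * inner                 # inner sum used the current a
    return total


def backup_vectorized(s):
    """Same quantity, with the bracketed term precomputed per (s', r) pair."""
    target = rewards[None, :] + gamma * v[:, None]   # target[s2, r_idx]
    return np.einsum("a,axr,xr->", pi[s], p[s], target)
```

Both functions compute the same number for a given state. The loop version mirrors the parenthesized equation directly, while the vectorized version makes it visible that $r+ \gamma v_\pi(s')$ is a table indexed by $s'$ and $r$.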

nbro