Exercise 3.5: The equations in Section 3.1 are for the continuing case and need to be modified (very slightly) to apply to episodic tasks. Show that you know the modifications needed by giving the modified version of (3.3).

$\displaystyle\sum_{s^{\prime} \in S} \displaystyle\sum_{r \in R} p(s^{\prime}, r | s,a) = 1$, for all $s \in S, a \in A(s)$ (3.3)

Is it just about final states? So for $s \in S$ when $s$ is not final?

Jakub Bielan

2 Answers

> Is it just about final states? So for $s \in S$ when $s$ is not final?

You are thinking along the right lines, but to express what you mean you don't need to write out "when $s$ is not final". That would be fine (and is used in some places), but the book gives you a more concise way of saying it.

As this is a formal exercise from the book, I don't want to write out an answer that any student could copy and paste.

Instead, I suggest you take a look at the notation section at the beginning of the book and find how Sutton & Barto use different set labels for all states including terminal states and for all states excluding terminal states. Also, check carefully which of those sets needs to be summed over.

Neil Slater

I found myself going in circles for a while, so to clarify Neil Slater's answer:

In the beginning of the book, $S$ means "set of non-terminal states" and $S^+$ means "set of all states, including the terminal ones".

$$\sum_{s^{\prime} \in S} \sum_{r \in R} p(s^{\prime}, r | s,a) = 1, \forall s \in S, a \in A(s) \tag{3.3}$$

That said, by writing $\forall s \in S$ in eq. 3.3, we say that the formula does not apply once we are in a terminal state (which is obvious, because no action is ever available in a terminal state by definition).

It does not, however, constrain the probability of "getting" into a terminal state, and that is the key to answering the question.
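To make that last point concrete, here is a minimal numerical sketch. The toy MDP, its state names, and the helper `total_probability` are all hypothetical and not taken from the book. The sketch only compares how much probability mass $p(s', r | s, a)$ places on next states in $S$ versus next states in $S^+$ when a terminal state is reachable.

```python
# Hypothetical toy episodic MDP (an assumption for illustration, not from the book).
NON_TERMINAL = {"A", "B"}              # plays the role of S
TERMINAL = {"T"}                       # terminal state(s)
ALL_STATES = NON_TERMINAL | TERMINAL   # plays the role of S^+

# P[(s, a)] maps (next_state, reward) -> probability.
# Only non-terminal states appear as keys, since no action is taken in "T".
P = {
    ("A", "go"): {("B", 0.0): 0.75, ("T", 1.0): 0.25},
    ("B", "go"): {("A", 0.0): 0.5, ("T", 1.0): 0.5},
}


def total_probability(s, a, next_states):
    """Sum p(s', r | s, a) over all rewards r and over s' in next_states."""
    return sum(
        prob
        for (s_next, _reward), prob in P[(s, a)].items()
        if s_next in next_states
    )


for (s, a) in P:
    over_s = total_probability(s, a, NON_TERMINAL)
    over_s_plus = total_probability(s, a, ALL_STATES)
    print(f"s={s}, a={a}: sum over S = {over_s}, sum over S^+ = {over_s_plus}")
```

In this toy example, the sum restricted to next states in $S$ falls short of 1 by exactly the probability of terminating, while the sum over $S^+$ accounts for all of the mass.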

Philip Raeisghasem
Gigi