1

Summary

In David Silver's RL lecture slides, he defines the State $S_t$ formally as a function of the history:

formal definition of agent History and State

David then goes on to define the Markov state as any state $S_t$ such that the probability of the next timestep is conditionally independent of all other timesteps given $S_t$, including a second detail about how this also implies the Markov chain $H_{1:t} \to S_t \to H_{t:\infty}$:

definition of Markov state

Confusion

I am immediately thrown off by this definition. First of all, the state is defined as $f$, which can be any function. So is the constant function $f(H_t) = 1$ a state? The state $S_t = 1$, $t ∈ \mathbb{R}_+$ must also be a Markov state because it satisfies the definition of a Markov state:

$$\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \dots, S_t].$$

If we worry about $S_t$ not being a probability distribution (though it is), the same thing applies if we instead define $f$ as $f(H_t) \sim \mathcal{N}(0, 1)$, $t ∈ \mathbb{R}_+$.

$S_t = f(H_t) = 1$ cannot possibly imply the Markov chain $H_{1:t} \to S_t \to H_{t:\infty}$, as the history $H$ is quite informative, and a constant function that throws away all relevant information would definitely not make it a sufficient statistic of the history.

I hope somebody can rigorously explain to me what I am missing from this definition. Another thing I notice is that David did not define $H_t$ as a random variable, though the fact that $f(H_t)$ is a random variable would imply otherwise.

nbro
  • 42,615
  • 12
  • 119
  • 217
Andrew
  • 11
  • 2

2 Answers2

2

$f$ cannot be any function, although the slides say "It can be any function of history".

If $f$ was allowed to be any function, it could also be a function that discard the information in all states (like you are thinking), which clearly would not make the current state a Markov state, as per definition.

Moreover, a state can be a number, but it can also be anything more complicated. For example, in a 2d world (e.g. a board game), a state could be the specific configuration of the board, i.e. where the player is, etc. So, in this case, a single number might not be sufficient to represent/define a state.

So, I think $f$ here is any function that compresses all the past states (and actions) into one without losing relevant information.

nbro
  • 42,615
  • 12
  • 119
  • 217
0

Okay, for those of you who have a similar confusion about David's lecture, I suggest you ignore the first lecture in terms of its rigor. It's a bit of an intuition-based lecture, with many of the examples not being fully coherent with definitions given.

In the end, whether $S_t$ as $f(H_t)$ is a sufficient statistic, and the other things stated, don't end up mattering because the lecture material gets deeper in other, more relevant parts of the question. Either way, the class is mainly focused on MDPs with full observability.

Thanks for the comments everyone.

Andrew
  • 11
  • 2