
Suppose, for simplicity's sake, that we are in a discrete-time setting and that the action set is the same for all states $s \in \mathcal{S}$. In a finite Markov Decision Process, the sets $\mathcal{A}$, $\mathcal{S}$, and $\mathcal{R}$ each have a finite number of elements. We can then write the following:

$$p(s',r | s,a) = P\{S_t=s',R_t=r | S_{t-1}=s,A_{t-1}=a\} ~~~ \forall s',s \in \mathcal{S}, r \in \mathcal{R} \subset \mathbb{R}, a \in \mathcal{A}$$

where the function $p$ defines the dynamics of the finite MDP and $P$ denotes probability.
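To make the finite case concrete, here is a minimal sketch of such a dynamics function stored as a lookup table. The states, actions, rewards, and probabilities are all made up for illustration (they are not from any particular problem); the only point is that for each fixed pair $(s, a)$ the table defines a probability distribution over pairs $(s', r)$.

```python
from itertools import product

# Hypothetical toy MDP (all names and numbers are illustrative assumptions).
STATES = ("s0", "s1")
ACTIONS = ("a0", "a1")
REWARDS = (0.0, 1.0)

# DYNAMICS[(s, a)][(s_next, r)] stores p(s_next, r | s, a).
# Pairs (s_next, r) that are missing have probability zero.
DYNAMICS = {
    ("s0", "a0"): {("s0", 0.0): 0.7, ("s1", 1.0): 0.3},
    ("s0", "a1"): {("s0", 0.0): 0.2, ("s1", 0.0): 0.3, ("s1", 1.0): 0.5},
    ("s1", "a0"): {("s0", 1.0): 1.0},
    ("s1", "a1"): {("s0", 0.0): 0.5, ("s1", 1.0): 0.5},
}


def p(s_next, r, s, a):
    """Return p(s_next, r | s, a) for the toy MDP above."""
    return DYNAMICS[(s, a)].get((s_next, r), 0.0)


def total_mass(s, a):
    """Sum p(s_next, r | s, a) over every next state and reward."""
    return sum(p(s_next, r, s, a) for s_next, r in product(STATES, REWARDS))


if __name__ == "__main__":
    for s, a in product(STATES, ACTIONS):
        # Each conditional distribution should sum to 1 (up to rounding).
        print(s, a, total_mass(s, a))
```

Because every set is finite, `p` can be evaluated pointwise and each entry is itself a probability.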


How can I extend this to a general MDP, that is, an MDP where the sets $\mathcal{A}$, $\mathcal{S}$, and $\mathcal{R}$ do not have a finite number of elements? To be more precise, in my case $\mathcal{A} \subset \mathbb{R}^n$, $\mathcal{S} \subset \mathbb{R}^m$, and $\mathcal{R} \subset \mathbb{R}$. My thought is that the equation above still holds; however, the probability is zero for each tuple $(s',r,s,a)$.

Is it sufficient to say that for a finite MDP we have

$$\sum_{s'\in\mathcal{S}}\sum_{r\in\mathcal{R}}p(s',r|s,a)=1 ~~~ \forall s\in\mathcal{S},a\in\mathcal{A}$$

while for a non-finite MDP (supposing that the sets $\mathcal{S}$ and $\mathcal{A}$ are continuous) we have

$$\int_{s'\in\mathcal{S}}\int_{r\in\mathcal{R}}p(s',r|s,a)\,dr\,ds'=1 ~~~ \forall s\in\mathcal{S},a\in\mathcal{A}$$

or is it more complex than this?
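To illustrate what I have in mind for the continuous case, here is a rough numerical sketch. It assumes that $p(s',r|s,a)$ can be read as a joint probability density, and it uses a made-up one-dimensional model in which $s'$ and $r$ are independent Gaussians (the means, the standard deviations, and all function names are my own assumptions, chosen only so that the integral is easy to check). It does not cover cases where no such density exists, for example when the reward is a deterministic function of $s'$.

```python
import math

SIGMA_S = 1.0  # assumed standard deviation of the next-state noise
SIGMA_R = 0.5  # assumed standard deviation of the reward noise


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def density(s_next, r, s, a):
    """Hypothetical joint density p(s_next, r | s, a): independent Gaussians."""
    return normal_pdf(s_next, s + a, SIGMA_S) * normal_pdf(r, -abs(s), SIGMA_R)


def box_probability(s, a, s_lo, s_hi, r_lo, r_hi, n=200):
    """Midpoint-rule approximation of the density's integral over a box."""
    ds = (s_hi - s_lo) / n
    dr = (r_hi - r_lo) / n
    total = 0.0
    for i in range(n):
        s_next = s_lo + (i + 0.5) * ds
        for j in range(n):
            r = r_lo + (j + 0.5) * dr
            total += density(s_next, r, s, a)
    return total * ds * dr


def total_mass(s, a, width=8.0):
    """Integrate over a box wide enough to hold essentially all the mass."""
    s_mean, r_mean = s + a, -abs(s)
    return box_probability(
        s, a,
        s_mean - width * SIGMA_S, s_mean + width * SIGMA_S,
        r_mean - width * SIGMA_R, r_mean + width * SIGMA_R,
    )


def shrinking_box_probabilities(s, a, half_widths=(1.0, 0.1, 0.01)):
    """Probability of ever smaller boxes centred on the most likely point."""
    s_mean, r_mean = s + a, -abs(s)
    return [
        box_probability(s, a, s_mean - h, s_mean + h, r_mean - h, r_mean + h, n=50)
        for h in half_widths
    ]


if __name__ == "__main__":
    print("total mass:", total_mass(s=1.0, a=0.5))
    print("shrinking boxes:", shrinking_box_probabilities(s=1.0, a=0.5))
```

Under these assumptions the double integral comes out numerically close to 1, while the probability assigned to a box around any single point shrinks toward zero as the box shrinks. That matches my intuition that individual tuples have probability zero even though the density there is positive.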

nbro
gvgramazio
