
An MDP is ergodic if the Markov chain induced by any policy is ergodic, which means any state is reachable from any other state by following a suitable policy. [Source]

The part after "which means ..." is the definition of a communicating MDP. Why does "the Markov chain induced by any policy is ergodic" imply "any state is reachable from any other state by following a suitable policy"?

Other papers/books define an ergodic MDP just with "if the Markov chain induced by any policy is ergodic", but they all say that an ergodic MDP is also communicating. Why ergodic → communicating?

nbro
Simon

2 Answers


If all states in an irreducible Markov chain (MC) are ergodic, then the chain itself is said to be ergodic. So once an ergodic MC is induced by a policy, that chain is, by the above definition, irreducible. Irreducibility and communication are defined as follows:

A Markov chain is irreducible if it has exactly one communicating class: the whole state space.

Two states are said to communicate with each other if both are reachable from one another by a sequence of transitions that have positive probability. This is an equivalence relation which yields a set of communicating classes.

Since ergodicity implies irreducibility, and irreducibility means the whole state space forms a single communicating class, any state is reachable from any other state within that state space by following the inducing policy.
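The class structure above is easy to check numerically. The sketch below (with a hypothetical 3-state transition matrix) computes the reachability relation as a transitive closure of the positive-probability transitions, then tests whether every pair of states communicates, i.e. whether there is a single communicating class:

```python
import numpy as np

# Hypothetical transition matrix of the Markov chain induced by some
# fixed policy on a 3-state MDP.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.3, 0.7, 0.0],
])

n = len(P)
# Reachability: R[x, y] > 0 iff y is reachable from x in zero or more
# steps with positive probability (transitive closure of P > 0).
R = ((P > 0) | np.eye(n, dtype=bool)).astype(int)
for _ in range(n):
    R = ((R @ R) > 0).astype(int)

# x and y communicate iff each is reachable from the other; the chain
# is irreducible iff all pairs communicate, i.e. there is a single
# communicating class: the whole state space.
communicate = (R > 0) & (R > 0).T
irreducible = bool(communicate.all())
print(irreducible)  # True: every state reaches every other state
```

For a reducible chain, `communicate` would partition the states into several equivalence classes instead of one.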

cinch

There are inconsistencies in the definition of an ergodic Markov chain across the literature, while the concepts of irreducibility, aperiodicity and recurrence are well grounded.

Thus, I think it is better not to employ this term too much and keep the word "ergodic" for the well-known theorem.

According to the paper you cite,

An MDP is ergodic if the Markov chain induced by any policy π is both irreducible and aperiodic, which means any state is reachable from any other state by following a suitable policy.

So what can we say about that? Let $\mathcal{X}$ be the state space.

If the chain is irreducible, then either all states are recurrent or all states are transient. But if $|\mathcal{X}| < \infty$, then the second case is impossible.

Thus if the chain is irreducible and $|\mathcal{X}| < \infty$, then the chain is recurrent.

Now, if the chain is irreducible and recurrent, there exists a unique (up to a normalisation factor) stationary measure $\mu$.
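In the finite case this stationary measure can be normalised to a distribution and computed directly: $\mu P = \mu$ says $\mu$ is a left eigenvector of $P$ for eigenvalue $1$. A small sketch with a hypothetical two-state chain:

```python
import numpy as np

# Hypothetical irreducible two-state chain.
P = np.array([
    [0.1, 0.9],
    [0.4, 0.6],
])

# mu P = mu means mu is a left eigenvector of P with eigenvalue 1,
# i.e. an eigenvector of P.T; normalise it to sum to 1.
vals, vecs = np.linalg.eig(P.T)
mu = np.real(vecs[:, np.argmax(np.real(vals))])
mu = mu / mu.sum()

print(mu)           # the unique stationary distribution
print(mu @ P - mu)  # ~0: mu is invariant under P
```

Uniqueness (up to normalisation) is exactly what irreducibility plus recurrence buys: eigenvalue $1$ of $P^\top$ is simple.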

Also, if the chain is recurrent, irreducible and aperiodic, then there is an $n_0$ such that for any $n > n_0$, $P_n(x, y) > 0$ for all states $x, y \in \mathcal{X}$, where $P_n$ is the $n$-step transition probability kernel. This means any state $y \in \mathcal{X}$ is reachable from any other state $x \in \mathcal{X}$.
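This positivity of some matrix power is also easy to observe numerically. The sketch below uses a hypothetical 3-state chain that is irreducible (states cycle $0 \to 1 \to 2 \to 0$) and aperiodic (the self-loop at state 0 breaks any period), and searches for the first power of $P$ with all entries positive:

```python
import numpy as np

# Hypothetical irreducible chain; the self-loop at state 0 makes it
# aperiodic.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

Pn = np.eye(3)
for n in range(1, 20):
    Pn = Pn @ P          # Pn now holds the n-step kernel P^n
    if (Pn > 0).all():
        print(f"all entries of P^{n} are positive, so n0 <= {n}")
        break
```

For a periodic chain (e.g. the pure cycle without the self-loop), no power of $P$ is ever entrywise positive, which is why aperiodicity is needed for this step.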

Sources: [1]

[1]: J-F. Le Gall - Intégration, Probabilités et Processus Aléatoires (2006)