If the Markov chain induced by the policy is ergodic, then the state distribution under the policy converges to a unique stationary distribution.

Are there conditions/properties (sufficient or necessary, on the MDP or on the policy) to check whether a policy induces an ergodic Markov chain, besides stationary transitions and a stationary policy? That is, can we tell whether the Markov chain induced by the policy is ergodic by looking at the resulting transition matrix $P_{ss'}^\pi = \sum_a P(s'|s,a)\pi(a|s)$? For example, maybe if the eigenvectors of this matrix are all positive, or some other condition like that.

Simon

1 Answer


Besides a stationary policy (one whose decision at each state does not depend on time) in the MDP context, the criterion for the policy-induced Markov chain (MC) to be ergodic is irreducibility of the chain together with aperiodicity and positive recurrence of all of its states.

If all states in an irreducible Markov chain are ergodic, then the chain is said to be ergodic.

A Markov chain is irreducible if it has exactly one communicating class, namely the entire state space.

A state $i$ is said to be ergodic if it is aperiodic and positive recurrent. In other words, a state $i$ is ergodic if it is recurrent, has a period of $1$, and has finite mean recurrence time.

Once these three conditions are met, the induced MC must have a unique stationary distribution, owing to its ergodicity. A prime example is an ideal gas in statistical mechanics, which follows the famous Maxwell–Boltzmann distribution.
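To make these conditions checkable in practice, here is a minimal sketch in Python (the array shapes and the names `P` and `pi` are illustrative assumptions, and a small finite MDP is assumed) that builds the induced matrix $P^\pi_{ss'} = \sum_a P(s'|s,a)\pi(a|s)$ from the question and then tests irreducibility and aperiodicity. For a finite chain, irreducibility already implies positive recurrence, so these two checks suffice:

```python
import numpy as np
from math import gcd
from scipy.sparse.csgraph import connected_components

def induced_chain(P, pi):
    """P[s, a, s'] = P(s'|s,a), pi[s, a] = pi(a|s); returns P^pi of shape (S, S)."""
    return np.einsum('sap,sa->sp', P, pi)

def is_irreducible(Ppi):
    """Irreducible <=> the directed graph of positive entries is one strongly connected component."""
    n_components, _ = connected_components((Ppi > 0).astype(int),
                                           directed=True, connection='strong')
    return n_components == 1

def period_of_state_0(Ppi):
    """gcd of the lengths of closed walks through state 0; once the chain is
    irreducible, all states share this period."""
    n = Ppi.shape[0]
    reach = np.eye(n, dtype=bool)          # reach[i, j]: path of exactly the current length?
    g = 0
    for k in range(1, n * n + 1):          # n^2 powers are enough to detect the period
        reach = (reach.astype(float) @ Ppi) > 0
        if reach[0, 0]:
            g = gcd(g, k)
            if g == 1:                     # period 1 found: the chain is aperiodic
                break
    return g

def is_ergodic(P, pi):
    Ppi = induced_chain(P, pi)
    return is_irreducible(Ppi) and period_of_state_0(Ppi) == 1
```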


Edit per OP's addition and clarification:

If you only want a stationary distribution, then, subtly, you may not need an ergodic induced MC, as clearly illustrated by the following simple example from the same reference above.

For some stochastic matrices $P$, the limit $\lim_{k\to\infty} P^k$ does not exist while the stationary distribution does, as shown by this example: $$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad P^{2k} = I, \qquad P^{2k+1} = P$$ $$\begin{pmatrix} \frac{1}{2} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \end{pmatrix}$$ (This example illustrates a periodic Markov chain.)
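A quick numerical check of this periodic example (a sketch using NumPy):

```python
import numpy as np

P = np.array([[0., 1.],
              [1., 0.]])                   # period-2 chain: the two states alternate

print(np.linalg.matrix_power(P, 10))       # even powers give the identity
print(np.linalg.matrix_power(P, 11))       # odd powers give P, so lim P^k does not exist

mu = np.array([0.5, 0.5])
print(mu @ P)                              # [0.5 0.5]: mu is stationary nonetheless
```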

Since the policy-induced transition matrix $P^\pi$ is always a stochastic matrix, a stationary distribution (not necessarily unique) always exists in the finite-state case. If every entry $P^\pi_{ss'}$ is strictly positive, the chain is both irreducible and aperiodic (and positive recurrence is automatic for a finite irreducible chain), so the chain is ergodic and the stationary distribution is unique, by the Perron–Frobenius theorem. In spectral terms, which directly addresses the eigen-condition asked about: if the finite induced chain is ergodic, then $1$ is a simple eigenvalue of $P^\pi$ and the only eigenvalue of modulus $1$, and the associated left eigenvector, normalized to sum to $1$, is the unique stationary distribution.
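A sketch of this spectral check in NumPy (the tolerance `tol` and the example matrix are illustrative assumptions):

```python
import numpy as np

def stationary_and_spectral_check(Ppi, tol=1e-10):
    """Return a stationary distribution of Ppi and whether the spectral
    condition holds: eigenvalue 1 is simple and the only eigenvalue of modulus 1."""
    eigvals, eigvecs = np.linalg.eig(Ppi.T)            # left eigenvectors of Ppi
    near_one = np.abs(eigvals - 1.0) < tol             # eigenvalues equal to 1
    on_circle = np.abs(np.abs(eigvals) - 1.0) < tol    # eigenvalues of modulus 1
    spectral_ok = near_one.sum() == 1 and on_circle.sum() == 1
    v = np.real(eigvecs[:, near_one.argmax()])         # eigenvector for eigenvalue 1
    mu = v / v.sum()                                   # normalize to a probability vector
    return mu, spectral_ok

Ppi = np.array([[0.9, 0.1],
                [0.2, 0.8]])
mu, ok = stationary_and_spectral_check(Ppi)
print(mu, ok)   # -> [0.6667 0.3333] True
```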

Intuitively, a stochastic matrix represents a Markov chain; applying the stochastic matrix to a probability distribution redistributes the probability mass of the original distribution while preserving its total mass. If this process is applied repeatedly and the chain is ergodic, the distribution converges to the stationary distribution of the Markov chain.
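That repeated redistribution is just power iteration; a quick sketch on the same illustrative matrix as above:

```python
import numpy as np

Ppi = np.array([[0.9, 0.1],
                [0.2, 0.8]])

mu = np.array([1.0, 0.0])   # start from a point mass on state 0
for _ in range(200):        # repeatedly apply the stochastic matrix
    mu = mu @ Ppi           # mass is redistributed, total mass stays 1
print(mu)                   # -> [0.6667 0.3333], the stationary distribution
```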

As for policies, those that ensure sufficient exploration (e.g., ϵ-greedy or other stochastic policies) promote irreducibility by enabling transitions to all parts of the state space. If a policy never reaches certain states, as can happen with deterministic policies such as DPG without appropriate exploration noise (e.g., Ornstein–Uhlenbeck or Gaussian noise), it can create disconnected components in the chain, violating irreducibility and failing to ensure positive recurrence, as the sketch below illustrates.
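The sketch assumes a hypothetical two-state, two-action MDP (action 0 stays, action 1 switches states); the irreducibility check is the strongly-connected-components test from the first snippet:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 moves to the other state.
P = np.zeros((2, 2, 2))            # P[s, a, s'] = P(s'|s, a)
P[:, 0, :] = np.eye(2)             # action 0: stay
P[:, 1, :] = 1 - np.eye(2)         # action 1: switch states

def induced(P, pi):
    return np.einsum('sap,sa->sp', P, pi)

def irreducible(Ppi):
    n, _ = connected_components((Ppi > 0).astype(int), directed=True, connection='strong')
    return n == 1

greedy = np.array([[1.0, 0.0],     # always "stay" in both states, so neither
                   [1.0, 0.0]])    # state is reachable from the other
eps = 0.1
eps_greedy = (1 - eps) * greedy + eps / 2   # spread eps mass uniformly over both actions

print(irreducible(induced(P, greedy)))      # False: two disconnected components
print(irreducible(induced(P, eps_greedy)))  # True: every state reachable again
```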

cinch