Puterman defines an MDP as ergodic
if the transition matrix corresponding to every deterministic stationary policy consists of a single recurrent class.
If the transition matrix is recurrent, it means that there is one aperiodic communicating class of states (the whole state space).
However, this definition seems extremely restrictive because of "aperiodic".
Consider a simple chainworld where every state is connected to every other state, so the whole state space communicates. I could define a policy that always goes to one state (say X) and stays there forever. For example, the policy would induce the following trajectory: A, B, C, X, X, X, ... Clearly, A, B, and C are each visited only once, and only X is recurrent.
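To make the example concrete, here is a minimal sketch of how I would check which states are recurrent under that policy. It assumes the policy induces the deterministic chain A → B → C → X → X, and it uses the standard fact that, in a finite chain, a state is recurrent exactly when every state reachable from it can reach it back. The helper names `reachable` and `recurrent_states` are my own, not from Puterman.

```python
def reachable(P, start):
    """Return the set of states reachable from `start` (including itself)."""
    seen = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        for t, p in enumerate(P[s]):
            if p > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def recurrent_states(P):
    """In a finite chain, s is recurrent iff every state reachable from s can reach s."""
    n = len(P)
    reach = [reachable(P, s) for s in range(n)]
    return {s for s in range(n) if all(s in reach[t] for t in reach[s])}


states = ["A", "B", "C", "X"]

# Assumed transition matrix induced by the "go to X and stay" policy:
# A -> B -> C -> X -> X (rows are current states, columns are next states).
P_pi = [
    [0.0, 1.0, 0.0, 0.0],  # A
    [0.0, 0.0, 1.0, 0.0],  # B
    [0.0, 0.0, 0.0, 1.0],  # C
    [0.0, 0.0, 0.0, 1.0],  # X (absorbing)
]

recurrent = {states[s] for s in recurrent_states(P_pi)}
transient = set(states) - recurrent
print(sorted(recurrent), sorted(transient))  # ['X'] ['A', 'B', 'C']
```

So under this policy the chain has one recurrent class, {X}, and A, B, C are transient, which is exactly the situation I am asking about.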
Am I interpreting the definition incorrectly? Should I consider only the states visited in the limit, i.e., those with positive probability under the stationary state distribution?
Am I correct in assuming that the transition matrix is $P_\pi(s' | s) = \sum_a \mathcal{P}(s' | s, a) \pi(a|s)$, where $\mathcal{P}$ is the MDP dynamics and $\pi$ is the policy?
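In code, what I have in mind is the following sketch, which computes $P_\pi(s' | s) = \sum_a \mathcal{P}(s' | s, a) \pi(a|s)$ entry by entry. The layout `dynamics[s][a][s_next]` and `policy[s][a]`, the function name, and the two-state MDP are all assumptions made up for illustration.

```python
def induced_transition_matrix(dynamics, policy):
    """Return P_pi with P_pi[s][s_next] = sum_a policy[s][a] * dynamics[s][a][s_next]."""
    n_states = len(dynamics)
    P_pi = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        for a, pi_sa in enumerate(policy[s]):
            for s_next, p in enumerate(dynamics[s][a]):
                P_pi[s][s_next] += pi_sa * p
    return P_pi


# Hypothetical 2-state, 2-action MDP: action 0 leads to state 0, action 1 leads to state 1.
dynamics = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[1.0, 0.0], [0.0, 1.0]],  # from state 1
]

# Hypothetical policy: uniform over actions in state 0, always action 1 in state 1.
policy = [
    [0.5, 0.5],
    [0.0, 1.0],
]

P_pi = induced_transition_matrix(dynamics, policy)
print(P_pi)  # [[0.5, 0.5], [0.0, 1.0]]
```

For a deterministic stationary policy, each row of `policy` is one-hot, so row `s` of `P_pi` is just the dynamics row for the chosen action in `s`.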
I found this related question but I didn't understand the answer.
This question is also related but does not seem to address my concern about "aperiodic".