I am confused about the theoretical framework of reinforcement learning. For supervised learning there seems to be a clear theoretical framework, e.g. as described by Wikipedia here. I am unclear on what the analogous framework for RL is.
It seems that MDPs are baked into typical introductory RL courses and into theoretical descriptions of RL in the literature. However, certain problems that are also considered RL don't seem to fit this. The multi-armed bandit problem, for example (or, more practically, perhaps RLHF), is often treated as an RL problem but doesn't fit the MDP model well (although I suppose a bandit can be modeled as an MDP with just one state and a horizon of one timestep). Other RL problems, like 2-player games, don't seem to fit the standard single-agent MDP setup at all, since the transition dynamics depend on another agent's policy.
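To make the "bandit as a degenerate MDP" point concrete, here is a minimal sketch of what I mean; the class name, arm probabilities, and interface are all illustrative, not from any standard library:

```python
import random

class BanditAsMDP:
    """A k-armed Bernoulli bandit viewed as a degenerate MDP:
    one state, k actions, horizon 1 (the episode ends after a single step)."""

    def __init__(self, arm_probs):
        self.arm_probs = arm_probs  # P(reward = 1) for each arm; illustrative values
        self.state = 0              # the MDP's only state

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Transition: every action leads back to the single state,
        # and the episode terminates immediately (horizon 1).
        reward = 1.0 if random.random() < self.arm_probs[action] else 0.0
        done = True
        return self.state, reward, done

env = BanditAsMDP([0.2, 0.5, 0.8])
env.reset()
next_state, r, done = env.step(2)  # pull arm 2
```

So the bandit does fit formally, but the MDP machinery (transitions, value functions over states) is vacuous here, which is why the fit feels awkward to me.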
Is there a theoretical RL framework that encapsulates everything?