I am trying to understand the concepts of model-free and model-based approaches. As far as I understand, having a model of the environment does not mean that an RL agent has to be model-based; it is about the policy. However, if we can model the environment, why would we want to employ a model-free algorithm? Isn't it better to have a model and an expectation of the next reward and state? If you have a better understanding of all this, can you explain it to me?
1 Answer
> However, if we can model the environment, why should we want to employ a model-free algorithm?
It depends on what you mean by "model the environment". There are two kinds of model:
- **Distribution model**, which provides full access to a function like $p(r,s'|s,a)$, the probability of observing reward $r$ and next state $s'$ given starting state $s$ and taking action $a$.
- **Sample model**, which can run the environment forward a single step on demand. It provides access to a function like $\text{step}(s,a)$ that returns a single $(r, s')$ pair from running the environment forward from $(s,a)$ (a rough sketch of both interfaces follows below).
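As a rough illustration only (the class and method names here are mine, not a standard API), the two kinds of model might be sketched like this in Python:

    from typing import List, Tuple

    # Hypothetical type aliases, purely for illustration
    State, Action, Reward = str, str, float

    class DistributionModel:
        """Distribution model: full access to p(r, s' | s, a)."""
        def outcomes(self, s: State, a: Action) -> List[Tuple[float, Reward, State]]:
            """Return every (probability, reward, next_state) triple for (s, a)."""
            raise NotImplementedError

    class SampleModel:
        """Sample model: runs the environment forward one step on demand."""
        def step(self, s: State, a: Action) -> Tuple[Reward, State]:
            """Return a single (reward, next_state) pair drawn from the dynamics."""
            raise NotImplementedError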
Which kind of model is available makes a difference to which model-based reinforcement-learning approaches will work. For instance, dynamic programming requires a distribution model, whereas MCTS only needs a sample model.
A distribution model can easily be converted into a sample model, but the reverse is only possible approximately (by taking lots of samples).
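To make that concrete, here is a sketch of both conversions, assuming the hypothetical interfaces from the snippet above (the exact direction is a single weighted draw; the approximate direction estimates probabilities from sample counts):

    import random
    from collections import Counter

    def as_sample_model(dist_model, s, a):
        """Exact conversion: draw one (r, s') pair according to p(r, s' | s, a)."""
        outcomes = dist_model.outcomes(s, a)          # [(prob, r, s'), ...]
        probs = [p for p, _, _ in outcomes]
        _, r, s_next = random.choices(outcomes, weights=probs, k=1)[0]
        return r, s_next

    def as_distribution_model(sample_model, s, a, n=10_000):
        """Approximate conversion: estimate p(r, s' | s, a) from n sampled steps."""
        counts = Counter(sample_model.step(s, a) for _ in range(n))
        return [(count / n, r, s_next) for (r, s_next), count in counts.items()]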
> Isn't it better to have a model and expectation about the next reward and state?
All things being equal, then yes: having access to and using a model allows for better estimates, and opens up possibilities for various types of planning algorithms.
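For example, with a distribution model and a current state-value estimate, an agent can compute the exact one-step expected return for each action, as in dynamic programming. A minimal sketch, again assuming the hypothetical `DistributionModel` interface above:

    def greedy_one_step_lookahead(dist_model, V, s, actions, gamma=0.99):
        """Pick the action maximising the exact one-step expected return
        sum over (r, s') of p(r, s' | s, a) * (r + gamma * V[s'])."""
        def expected_return(a):
            return sum(p * (r + gamma * V[s_next])
                       for p, r, s_next in dist_model.outcomes(s, a))
        return max(actions, key=expected_return)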
However, even if you can build a model of an environment, there are costs:
- Writing a distribution model can be hard theoretically: it is often far easier to simulate an environment a single step forward than to calculate the probability distribution over all outcomes of that same step.
- The need for random access to arbitrary states can be expensive. A simulator that only needs to run forward from a single reset to a start state may run much more efficiently than one that must be set arbitrarily to many different points in a tree of possibilities.
These costs differ between environments. Setting a game like chess or Go to an arbitrary state is relatively cheap compared to a modern computer game or a detailed simulation of a real-world physical system. Partially observable states can be a major problem in both cases, since any random-access model would need to account for all the possible "true" states behind whatever the agent observes.
So basically, depending on the complexity of the environment, either the development cost of building a full model or the evaluation cost of setting arbitrary states can make a model-free approach more effective.
Agents that learn to build their own approximate models of the environment are of great interest, as these potentially address the issue by getting the best of both worlds. They could also be tunable to use model-free vs model-based approaches in the most efficient manner, depending on the relative costs and accuracy of each approach.
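A classic example of this idea is Dyna-Q (Sutton & Barto), where the agent learns a simple table-based model from real experience and reuses it for extra planning updates. A minimal tabular sketch, with helper names of my own choosing and a deterministic learned model assumed:

    import random
    from collections import defaultdict

    def dyna_q_update(Q, model, s, a, r, s_next, actions,
                      alpha=0.1, gamma=0.95, planning_steps=10):
        """One Dyna-Q step: direct Q-learning update from the real transition,
        store it in the learned tabular model, then perform extra planning
        updates by replaying transitions from that model."""
        # Direct RL update from real experience
        Q[s, a] += alpha * (r + gamma * max(Q[s_next, b] for b in actions) - Q[s, a])
        # Model learning: remember what (s, a) led to
        model[s, a] = (r, s_next)
        # Planning: simulated experience drawn from the learned model
        for _ in range(planning_steps):
            (ps, pa), (pr, pn) = random.choice(list(model.items()))
            Q[ps, pa] += alpha * (pr + gamma * max(Q[pn, b] for b in actions) - Q[ps, pa])

    # Example initialisation:
    # Q = defaultdict(float)   # action-value estimates
    # model = {}               # learned model of observed transitions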