
I want to build a model-based RL agent, and I am wondering about the process of building the model.

Suppose I already have data from real experience:

  • $S_1, a \rightarrow R,S_2$
  • $S_2, a \rightarrow R,S_3$

Can I use this data to build the model for model-based RL? Or is it necessary for the agent to interact with the environment directly (i.e., for the data above to be generated by the agent itself)?

Neil Slater
user46045

1 Answer


If you already have some transition tuples, then you can use them to train a model that predicts the environment dynamics. However, you should make sure that your pre-gathered data is diverse enough to 'cover' the state/action space, so that your model remains accurate. For instance, as your agent trains, it will likely visit parts of the state space that it did not see at the start of training. Imagine playing Atari: initially your agent dies quickly, but as it gets better, episodes get longer. You would therefore need data for the states that appear late in episodes; otherwise your model will overfit to the start of an episode and perform poorly on these other states, which slows down or even prevents learning an optimal policy.
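As a minimal sketch, assuming discrete, hashable states and actions (a tabular setting), pre-gathered $(S, a, R, S')$ tuples can be turned into a count-based model of the dynamics, and the same counts show which state-action pairs the data does not cover at all. The class and method names (`TabularModel`, `fit`, `predict`, `uncovered`) and the example states and rewards are hypothetical. For large or continuous spaces such as Atari you would instead fit a function approximator (e.g. a neural network) to the same tuples, and coverage is harder to measure.

```python
from collections import Counter, defaultdict


class TabularModel:
    """Count-based estimate of P(s' | s, a) and the mean reward E[R | s, a]."""

    def __init__(self):
        self.next_counts = defaultdict(Counter)  # (s, a) -> Counter of observed s'
        self.reward_sums = defaultdict(float)    # (s, a) -> sum of observed rewards
        self.visits = Counter()                   # (s, a) -> number of samples

    def fit(self, transitions):
        # Each transition is an (s, a, r, s_next) tuple gathered beforehand.
        for s, a, r, s_next in transitions:
            self.next_counts[(s, a)][s_next] += 1
            self.reward_sums[(s, a)] += r
            self.visits[(s, a)] += 1
        return self

    def predict(self, s, a):
        """Return ({s_next: probability}, mean reward), or None if (s, a) was never seen."""
        n = self.visits[(s, a)]  # Counter returns 0 for unseen keys without inserting them
        if n == 0:
            return None  # no data: the model cannot say anything about this pair
        probs = {s2: c / n for s2, c in self.next_counts[(s, a)].items()}
        return probs, self.reward_sums[(s, a)] / n

    def uncovered(self, states, actions, min_visits=1):
        """List the (s, a) pairs with fewer than min_visits samples in the data."""
        return [(s, a) for s in states for a in actions
                if self.visits[(s, a)] < min_visits]


# Hypothetical pre-collected data in the question's format: S, a -> R, S'
data = [
    ("S1", "a", 0.0, "S2"),
    ("S2", "a", 1.0, "S3"),
    ("S1", "a", 0.0, "S2"),
]
model = TabularModel().fit(data)
print(model.predict("S1", "a"))  # ({'S2': 1.0}, 0.0)
print(model.predict("S3", "a"))  # None
print(model.uncovered(["S1", "S2", "S3"], ["a", "b"]))
# [('S1', 'b'), ('S2', 'b'), ('S3', 'a'), ('S3', 'b')]
```

The pairs returned by `uncovered` are exactly where a planner using this model would have nothing to go on. That is the tabular version of the coverage problem described above, and such gaps can be filled by gathering more data, for example by letting the agent interact with the environment.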

David