
I was reading an article about the question "Why do we dream?", in which the author discusses dreams as a form of rehearsal for future threats and presents this as an evolutionary advantage. My question is whether this idea has been explored in the context of RL.

For example, in a competition between AIs in a shooter game, one could design an agent that, besides the behavior it has learned during "normal" training, seeks out moments when it is out of danger and then uses its in-game computation time to run simulations that further optimize its behavior. Since the agent still needs to remain somewhat aware of its environment, it could alternate between processing the environment and running this kind of simulation. Note that this "in-game" simulation has an advantage over the "pre-game" simulations used for training: the agent experiences the behavior of the other agents during the game, which could not have been predicted beforehand, and can then simulate on top of these experiences, e.g. by slightly modifying them.
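
To make this more concrete, here is a toy, Dyna-style sketch of the loop I have in mind (all names are made up for illustration; `env.step` and `env.is_safe` stand for whatever interface the game exposes):

```python
import random
from collections import defaultdict

# Toy sketch: a Q-learning agent that "dreams" during safe moments by
# replaying remembered transitions. env.step and env.is_safe are
# hypothetical; any discrete environment with that interface would do.

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
ACTIONS = (0, 1, 2, 3)
N_DREAM_UPDATES = 20           # imagined updates per safe real step

Q = defaultdict(float)         # Q[(state, action)] -> estimated value
memory = {}                    # memory[(state, action)] -> (reward, next_state)

def choose_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s2):
    target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def play_step(env, state):
    a = choose_action(state)
    r, s2 = env.step(a)                      # one real interaction
    q_update(state, a, r, s2)
    memory[(state, a)] = (r, s2)             # remember what the opponents did

    # Only "dream" while out of danger: replay remembered experience,
    # which includes opponent behavior seen during this very game.
    if env.is_safe(s2):
        for _ in range(N_DREAM_UPDATES):
            s, act = random.choice(list(memory))
            r_mem, s_next = memory[(s, act)]
            q_update(s, act, r_mem, s_next)  # imagined, costs no real step
    return s2
```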

For more experienced folks: does this idea make sense? Has something similar been explored?

I have absolutely no experience in the field, so I apologize if this question is poorly worded, dumb or obvious. I would appreciate suggestions on how to improve it if this is the case.


2 Answers


Yes, the concept of dreaming or imagining has already been explored in reinforcement learning.

For example, have a look at Metacontrol for Adaptive Imagination-Based Optimization (2017) by Jessica B. Hamrick et al., a paper I gave a talk on a year or two ago (though I no longer remember the details well).

There is also a DeepMind blog post on the topic, Agents that imagine and plan (2017), which discusses two more recent papers and also mentions Hamrick's paper.

In 2018, another related and interesting paper, World Models by Ha and Schmidhuber, was presented at NeurIPS.

If you search for "imagination/dreaming in reinforcement learning" on the web, you will find more papers and articles about this interesting topic.
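
To give a flavor of the recipe these papers share, here is a toy, tabular sketch of "fit a model of the world from real experience, then improve the policy inside it". This is only an illustration: the actual World Models paper learns a VAE + RNN model of pixels and trains its controller with evolution strategies.

```python
import random
from collections import defaultdict

GAMMA = 0.95
ACTIONS = (0, 1)

def fit_model(real_transitions):
    """Estimate outcomes per (state, action) from logged real experience."""
    model = defaultdict(list)
    for s, a, r, s2 in real_transitions:
        model[(s, a)].append((r, s2))
    return model

def dream_return(model, policy, s0, horizon=20):
    """Roll the policy forward using only the learned model ("dreaming")."""
    s, ret, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy.get(s, ACTIONS[0])
        if (s, a) not in model:
            break                              # never seen: cannot imagine it
        r, s = random.choice(model[(s, a)])    # sample a remembered outcome
        ret += discount * r
        discount *= GAMMA
    return ret

def improve(policy, model, states):
    """Greedy policy improvement evaluated entirely in imagination."""
    for s in states:
        policy[s] = max(ACTIONS,
                        key=lambda a: dream_return(model, {**policy, s: a}, s))
    return policy
```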


Model-based RL is the natural fit here, mainly because it lets the agent simulate the environment internally rather than relying only on direct interaction.

Most large-scale RL successes have also relied on some form of model or simulator, since learning purely from real-time interaction with the real world has rarely been practical.
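
As a concrete illustration of what "simulating the environment internally" can look like, here is a minimal model-predictive-control sketch using random shooting. The `model.step(state, action) -> (reward, next_state)` interface is an assumption for illustration, not any particular library's API.

```python
import random

def plan_action(model, state, actions, horizon=10, n_candidates=200, gamma=0.99):
    """Pick an action by imagining candidate futures in the learned model."""
    best_first, best_return = None, float("-inf")
    for _ in range(n_candidates):
        seq = [random.choice(actions) for _ in range(horizon)]
        s, ret, discount = state, 0.0, 1.0
        for a in seq:
            r, s = model.step(s, a)    # imagined transition, no real env use
            ret += discount * r
            discount *= gamma
        if ret > best_return:
            best_first, best_return = seq[0], ret
    return best_first                  # execute the first action, then replan
```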
