Basic deep reinforcement learning methods take as input an image of the current state, apply some convolutions to that image, then apply some reinforcement learning algorithm, and the problem is solved.

Let us take the game Breakout or Pong as an example. What I do not understand is: how does the agent know whether an object is moving towards it or away from it? I believe the action it chooses must differ between these two scenarios, yet a single image as input carries no notion of motion.

In the article Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013), which was a major breakthrough in deep reinforcement learning (especially in deep Q-learning), they don't feed only the last image to the network. They stack the last 4 frames:

For the experiments in this paper, the function φ from algorithm 1 applies this preprocessing to the last 4 frames of a history and stacks them to produce the input to the Q-function

So they provide motion information through the sequence of frames. From various articles and my own coding experience, this seems to be the most common approach. I don't know whether other techniques have been implemented.
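As a rough illustration of the stacking idea, here is a minimal NumPy sketch. The `preprocess` function is a placeholder assumption (grayscale by averaging the colour channels plus naive 2x downsampling), not the paper's exact preprocessing, and the `FrameStack` class name is hypothetical. The point is that the state handed to the Q-network is a (4, H, W) array, so the network's first convolution sees several consecutive frames at once and can pick up the direction of motion.

```python
from collections import deque

import numpy as np


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Placeholder preprocessing (assumption, not the paper's exact pipeline):
    RGB -> grayscale by channel averaging, then 2x downsampling by striding."""
    gray = frame.mean(axis=2)
    return gray[::2, ::2].astype(np.float32)


class FrameStack:
    """Keeps the last k preprocessed frames and returns them stacked as (k, H, W)."""

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)  # the oldest frame is dropped automatically

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # At episode start there is no history, so repeat the first frame k times.
        processed = preprocess(first_frame)
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(processed)
        return self.state()

    def step(self, new_frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(new_frame))
        return self.state()

    def state(self) -> np.ndarray:
        # Channels-first stack: index 0 is the oldest frame, index k-1 the newest.
        return np.stack(list(self.frames), axis=0)


if __name__ == "__main__":
    # Toy "ball" moving right by 2 pixels per frame on an 8x8 RGB screen.
    def ball_frame(col: int) -> np.ndarray:
        f = np.zeros((8, 8, 3), dtype=np.uint8)
        f[4, col] = 255
        return f

    stacker = FrameStack(k=4)
    s = stacker.reset(ball_frame(0))
    for col in (2, 4, 6):
        s = stacker.step(ball_frame(col))
    print(s.shape)  # prints (4, 4, 4)
    # Column of the ball in each channel, oldest to newest.
    print([int(np.argmax(s[i]) % s.shape[2]) for i in range(4)])  # prints [0, 1, 2, 3]
```

Reading the ball's column across the four channels shows it moving right, which is exactly the information a single frame cannot carry.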

One thing we could imagine would be to compute the cross-correlation between a previous frame and the latest one, and then feed the resulting correlation map to the network.
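The cross-correlation idea can be sketched with NumPy's FFT. This is a minimal sketch under simplifying assumptions: two equal-sized grayscale frames, circular (wrap-around) correlation, and essentially one moving object; the function names are hypothetical. The location of the correlation peak gives the displacement of the content between the two frames, and the correlation map itself (or just the estimated shift) could then be given to the network as an extra input.

```python
import numpy as np


def cross_correlation(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Circular 2-D cross-correlation of two equal-sized grayscale frames via FFT.

    Entry (dy, dx) measures how well `curr` matches `prev` shifted by (dy, dx)."""
    # Remove the mean so a constant background does not dominate the result.
    a = prev.astype(np.float64) - prev.mean()
    b = curr.astype(np.float64) - curr.mean()
    return np.real(np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))))


def estimate_shift(prev: np.ndarray, curr: np.ndarray) -> tuple:
    """Return the (dy, dx) displacement from `prev` to `curr` at the correlation peak."""
    corr = cross_correlation(prev, curr)
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks past the halfway point correspond to negative (wrapped-around) shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


if __name__ == "__main__":
    prev = np.zeros((16, 16))
    prev[3, 2] = 1.0   # object at row 3, column 2
    curr = np.zeros((16, 16))
    curr[3, 5] = 1.0   # same object, 3 pixels to the right
    print(estimate_shift(prev, curr))  # prints (0, 3)
```

The sign of `dx` distinguishes motion to the right from motion to the left, which is the kind of distinction the question asks about; plain cross-correlation is used here, and variants such as phase correlation would be a separate design choice.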

Another idea would be to pre-train a CNN to extract motion features from a sequence of frames, and feed these extracted features to your network. The article Performing Particle Image Velocimetry using Artificial Neural Networks: a proof-of-concept (Rabault et al., 2017) is an example of a CNN that extracts motion features.