If an Atari game's rewards range between $-100$ and $100$, when can we say that an agent has learned to play the game? Should it get a reward very close to $100$ in every instance of the game, or is it fine if it gets a low score (say, $-100$) in some instances? In other words, if we plot the agent's score versus the number of episodes, what should the plot look like? And from this plot, when can we say that the agent is not stable on this task?

nbro
user491626
