7

I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing.

However, in Section 5.6 of the book, 3rd line, first paragraph, it is written:

Whereas in Chapter 2 we averaged rewards, in Monte Carlo methods we average returns.

What does it mean? Are rewards and returns different things?

nbro
SJa

2 Answers

6

Return refers to the total discounted reward, starting from the current timestep.
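
To make this concrete, here is the definition written out as a sketch in the book's notation, assuming a discount rate $\gamma \in [0, 1]$ and an episode that terminates at time $T$ (with $T = \infty$ for a continuing task):

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}$$

So a reward $R_{t+1}$ is the single number received after one step, while the return $G_t$ accumulates every reward from that step onward.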

stoic-santiago
3

As the accepted answer states, the return at the current timestep is equal to the sum of discounted rewards from all future timesteps until the end of the episode. In Chapter 5 of Sutton and Barto, returns must be used to estimate the state-value and action-value functions because episode lengths are unrestricted and may be greater than one. In contrast, Chapter 2 deals with the very special case of multi-armed bandits in which episode lengths are always equal to one: The agent begins each episode in a fixed start state, takes an action, receives a reward, and then the episode terminates and the agent begins the next episode at the same start state. Therefore, a return is equivalent to a reward in Chapter 2 because all episodes have length one.
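
A minimal Python sketch of this point, not taken from the book: the helper name `discounted_returns` and the example reward values are made up for illustration, and `rewards[t]` is assumed to hold the reward received after step `t` of one episode.

```python
def discounted_returns(rewards, gamma=1.0):
    """Compute the return from every timestep of one episode.

    rewards[t] is assumed to be the reward received after step t
    (R_{t+1} in the book's notation); gamma is the discount rate.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Bandit-style episode of length one: the return is just the single reward.
print(discounted_returns([1.5], gamma=0.9))            # [1.5]

# Longer episode: the return from step 0 differs from the first reward.
print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

With a one-step episode the list of returns is identical to the list of rewards, which is why averaging rewards in Chapter 2 and averaging returns in Chapter 5 are the same operation in the bandit case.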

DeepQZero