Questions tagged [reward-hypothesis]

For questions about the reward hypothesis (RH), i.e. the hypothesis that goals in reinforcement learning can be defined as the maximization of the expected value of the cumulative sum of a received scalar signal (the reward). According to Sutton and Barto's book, the RH was suggested by Michael Littman.
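
As a rough formalization, using the return notation from Sutton and Barto's book (the symbols below are an assumption here, since the tag description itself does not fix any notation), the hypothesis says that a goal can be expressed as choosing a policy $\pi$ that maximizes the expected return:

$$
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad \pi^{*} \in \arg\max_{\pi} \mathbb{E}_{\pi}\left[ G_t \right],
$$

where $R_{t+k+1}$ is the scalar reward received after step $t+k$ and $\gamma \in [0, 1]$ is a discount factor; $\gamma = 1$ recovers the plain cumulative sum when episodes terminate.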

2 questions
12 votes, 4 answers

Counterexamples to the reward hypothesis

In Sutton and Barto's RL book, the reward hypothesis states that all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called…
4 votes, 2 answers

Can rewards be decomposed into components?

I'm training a robot to walk to a specific $(x, y)$ point using TD3 and, for simplicity, I have something like reward = distance_x + distance_y + standing_up_straight; this reward is then added to the replay buffer. However, I think that it…
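
For readers skimming this excerpt, the setup it describes might look roughly like the sketch below. This is only an illustration under assumptions: the component names come from the excerpt, but the numeric values, the `composite_reward` helper, and the buffer layout are hypothetical and are not taken from the question or from any TD3 library.

```python
from collections import deque
from typing import Deque, Dict, Tuple


def composite_reward(components: Dict[str, float]) -> float:
    """Collapse named reward components into one scalar by summing them."""
    return sum(components.values())


# Hypothetical component values for a single environment step.
components = {
    "distance_x": -0.5,
    "distance_y": -0.25,
    "standing_up_straight": 1.0,
}

# The scalar that a single-reward agent such as TD3 would be trained on.
reward = composite_reward(components)

# Hypothetical replay buffer entry: the scalar reward is stored for learning,
# and the per-component breakdown is kept alongside it for inspection only.
replay_buffer: Deque[Tuple[float, Dict[str, float]]] = deque(maxlen=10_000)
replay_buffer.append((reward, dict(components)))
```

Keeping the breakdown next to the scalar is a design choice for debugging: the learner still sees one number per step, but each term's contribution can be checked afterwards.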