The following is from page 17 of Michael Hu, "The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations with Python", Apress, 2023:
https://link.springer.com/book/10.1007/978-1-4842-9606-6
An example of good reward engineering is in the game of Atari Breakout, where the goal of the agent is to clear all the bricks at the top of the screen by bouncing a ball off a paddle. One way to design a reward function for this game is to give the agent a positive reward for each brick it clears and a negative reward for each time the ball passes the paddle and goes out of bounds. However, this reward function alone may not lead to optimal behavior, as the agent may learn to exploit a loophole by simply bouncing the ball back and forth on the same side of the screen without actually clearing any bricks.
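The reward function described in the excerpt can be sketched as below (the function name and the per-step event counts are hypothetical, for illustration only — this is not the book's implementation). The point is that a policy that merely keeps the ball in play collects a return of 0, which is higher than the return of an unskilled policy that loses the ball while trying to clear bricks:

```python
def breakout_reward(bricks_cleared: int, balls_lost: int) -> float:
    """Per-step reward as described: +1 per brick cleared, -1 per ball lost.

    (Hypothetical helper for illustration; not from the book.)
    """
    return 1.0 * bricks_cleared - 1.0 * balls_lost

# Hypothetical episode traces, as (bricks_cleared, balls_lost) per step:
# an unskilled "risky" policy clears one brick but loses the ball twice,
# while a "stalling" policy just keeps the ball in play for 100 steps.
risky_return = sum(breakout_reward(b, l) for b, l in [(1, 1), (0, 1)])
stalling_return = sum(breakout_reward(b, l) for b, l in [(0, 0)] * 100)

print(risky_return)     # -1.0
print(stalling_return)  # 0.0
```

Under this reward alone, stalling dominates the unskilled risky policy (0 > −1), so early in training an agent can settle into behavior that avoids the negative reward rather than pursuing the positive one.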
This part is not clear:
However, this reward function alone may not lead to optimal behavior, as the agent may learn to exploit a loophole by simply bouncing the ball back and forth on the same side of the screen without actually clearing any bricks.
Why would the agent bounce the ball back and forth on the same side of the screen? Is there any reward in that case?