
Suppose, instead of playing against a random opponent, the reinforcement learning algorithm described above played against itself, with both sides learning. What do you think would happen in this case? Would it learn a different policy for selecting moves?

The above is an extract from Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto. I wasn't sure what the answer would be, so I thought I'd post it here. The algorithm being referred to is the one for playing tic-tac-toe.

In my opinion, if the same algorithm plays both sides, it may end up helping itself win every time and not really learn anything. What do you think?
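One way to probe this empirically is a small self-play experiment. The sketch below is not the book's code. It assumes a tabular value function per side over the positions reached after that side's move, a TD(0)-style update applied after every move (the book's version skips updates after exploratory moves), and a hypothetical `draw_value` of 0.5, whereas the book scores a filled board as 0. All function and parameter names (`train_self_play`, `td_update`, `place`, `draw_value`) are made up for illustration.

```python
import random

# The eight winning lines on a 3x3 board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def place(board, i, player):
    """Return a new board string with `player` placed on square i."""
    return board[:i] + player + board[i + 1:]


def td_update(table, state, target, alpha):
    """Move V(state) a fraction alpha of the way toward target."""
    old = table.get(state, 0.5)  # unseen states start at 0.5
    table[state] = old + alpha * (target - old)


def train_self_play(episodes=20000, alpha=0.1, epsilon=0.1,
                    draw_value=0.5, seed=0):
    """Both sides learn at once; returns (values, results)."""
    rng = random.Random(seed)
    values = {'X': {}, 'O': {}}  # one table per side (an assumption)
    results = {'X': 0, 'O': 0, 'draw': 0}
    for _ in range(episodes):
        board = ' ' * 9
        last = {'X': None, 'O': None}  # each side's previous position
        player, other = 'X', 'O'
        while True:
            table = values[player]
            moves = [i for i in range(9) if board[i] == ' ']
            rng.shuffle(moves)  # random tie-breaking among equal values
            if rng.random() < epsilon:
                move = moves[0]  # exploratory move (list is shuffled)
            else:
                move = max(moves, key=lambda m: table.get(
                    place(board, m, player), 0.5))
            board = place(board, move, player)
            won = winner(board) == player
            drawn = not won and ' ' not in board
            if won:
                table[board] = target = 1.0
            elif drawn:
                table[board] = target = draw_value  # assumed draw score
            else:
                target = table.get(board, 0.5)
            # Back up the mover's previous position toward the new one.
            if last[player] is not None:
                td_update(table, last[player], target, alpha)
            last[player] = board
            if won or drawn:
                # The side that did not move last learns from the outcome too.
                if last[other] is not None:
                    td_update(values[other], last[other],
                              0.0 if won else draw_value, alpha)
                results[player if won else 'draw'] += 1
                break
            player, other = other, player
    return values, results
```

Calling `values, results = train_self_play()` and printing `results` gives the win and draw counts over the run. Comparing those counts with a run against a uniformly random opponent would be one way to see whether the learned policy differs. Because each side keeps its own table and is rewarded only for its own wins, nothing in this setup rewards a side for letting the other win.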

nbro
stoic-santiago
