What does self-play in reinforcement learning lead to?

Asked Aug 02 '20 at 13:16

Active Aug 02 '20 at 20:59

Viewed 115 times

Suppose, instead of playing against a random opponent, the reinforcement learning algorithm described above played against itself, with both sides learning. What do you think would happen in this case? Would it learn a different policy for selecting moves?

Above is an extract from Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto. I wasn't sure what the answer would be, so I thought I'd post it here. The algorithm being referred to is the one for playing tic-tac-toe.

In my opinion, if the same algorithm plays both sides, it may end up helping one side win every time and not really learn anything. What do you think?
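To make the setup concrete, here is a minimal sketch of what I mean by "both sides learning". It is not the book's exact algorithm: I am assuming a tabular value function per player over the positions reached right after that player's move, an epsilon-greedy move choice, and a temporal-difference update that also runs after exploratory moves. All names (`winner`, `initial_value`, `choose_afterstate`, `td_update`, `play_self_play_game`, `train`) and the default parameters are hypothetical, and treating a draw like a loss is an assumption as well.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' for three in a row, 'draw' if the board is full, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None

def initial_value(board, player):
    """Default estimate: 1 if `player` has won, 0 for a loss or draw, 0.5 otherwise."""
    result = winner(board)
    if result == player:
        return 1.0
    if result is not None:
        return 0.0
    return 0.5

def choose_afterstate(board, player, table, epsilon, rng):
    """Epsilon-greedy choice; returns the board after `player` has moved."""
    candidates = [board[:i] + player + board[i + 1:]
                  for i, cell in enumerate(board) if cell == ' ']
    if rng.random() < epsilon:
        return rng.choice(candidates)  # exploratory move
    return max(candidates, key=lambda s: table.get(s, initial_value(s, player)))

def td_update(table, state, next_state, player, alpha):
    """Move V(state) a fraction alpha toward V(next_state)."""
    v = table.get(state, initial_value(state, player))
    target = table.get(next_state, initial_value(next_state, player))
    table[state] = v + alpha * (target - v)

def play_self_play_game(values, alpha, epsilon, rng):
    """Play one game in which X and O both learn; returns 'X', 'O' or 'draw'."""
    board = ' ' * 9
    last = {'X': None, 'O': None}  # each player's most recent afterstate
    player = 'X'
    while True:
        nxt = choose_afterstate(board, player, values[player], epsilon, rng)
        if last[player] is not None:
            td_update(values[player], last[player], nxt, player, alpha)
        last[player] = nxt
        board = nxt
        result = winner(board)
        other = 'O' if player == 'X' else 'X'
        if result is not None:
            # The side that did not make the final move also learns from the outcome.
            if last[other] is not None:
                td_update(values[other], last[other], board, other, alpha)
            return result
        player = other

def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Run self-play and return the two value tables plus outcome counts."""
    rng = random.Random(seed)
    values = {'X': {}, 'O': {}}
    counts = {'X': 0, 'O': 0, 'draw': 0}
    for _ in range(episodes):
        counts[play_self_play_game(values, alpha, epsilon, rng)] += 1
    return values, counts
```

Calling `values, counts = train()` and inspecting `counts` would show how often each side wins or draws under these assumptions. Note that each side keeps its own table and is rewarded only for its own wins, so in this sketch neither side has an incentive to help the other.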

edited Aug 02 '20 at 20:59

nbro


asked Aug 02 '20 at 13:16

stoic-santiago


0 Answers