
I am experimenting with a self-programmed version of AlphaZero for chess, following the methodology described in the official paper. I am finding that at the beginning of self-play (when the network weights are still randomly initialized), a very high portion of all games end in a draw (ca. 97%). This is not surprising: if two purely random players play against each other, it is very unlikely that either White or Black achieves a win. As a result, ca. 97% of all board states visited during self-play are trained towards a target value of 0 (instead of towards +1/-1 for a White win/loss). This means there is very little training data (3% of all training data generated via self-play) that helps the network learn how to win and get stronger.
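Concretely, the per-position value targets I generate look like this (a minimal sketch; the function name and the white-moves-first convention are my own illustration, with the value expressed from the side-to-move perspective as in the paper):

```python
def value_targets(outcome, num_positions):
    """Value target for each recorded position, from the side-to-move
    perspective: +1 if the player to move eventually won, -1 if they
    lost, 0 for a draw.

    outcome: game result from White's perspective (+1 / 0 / -1).
    Positions are assumed to alternate White-to-move, Black-to-move,
    starting from the initial position.
    """
    if outcome == 0:
        # Drawn game: every position trains towards 0.
        return [0.0] * num_positions
    return [outcome if i % 2 == 0 else -outcome
            for i in range(num_positions)]
```

With 97% draws, almost every call hits the `[0.0] * num_positions` branch, which is exactly the imbalance described above.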

Does anyone know how DeepMind overcame this problem? I have found nothing about it on the web.

1 Answer


First intuition

I wrote a quick script to test the outcome distribution of random chess games: I get about 7% wins each for White and Black, so about 14% decisive games. MCTS should still be able to find simple mate-in-one moves even with a randomly initialized network, so I would expect it to draw even less often.
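A script along these lines reproduces the test (a sketch assuming the python-chess package; the function names and ply cap are my own choices, not from the original script):

```python
import random
import chess

def play_random_game(max_plies=1000):
    """Play one game with uniformly random legal moves.

    Returns the PGN result string: "1-0", "0-1", "1/2-1/2",
    or "*" if the ply cap was hit first. Draws by the 75-move
    and fivefold-repetition rules end the game automatically.
    """
    board = chess.Board()
    for _ in range(max_plies):
        if board.is_game_over(claim_draw=True):
            break
        board.push(random.choice(list(board.legal_moves)))
    return board.result(claim_draw=True)

def outcome_distribution(n_games):
    """Count results over n_games random games."""
    counts = {"1-0": 0, "0-1": 0, "1/2-1/2": 0, "*": 0}
    for _ in range(n_games):
        counts[play_random_game()] += 1
    return counts
```

Over a few thousand games this gives roughly 7% wins per side; with only a handful of games the estimate is very noisy, so use a large `n_games`.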

Your 97% draw rate is significantly higher than that, so I suspect something else is going wrong. Some ideas to try:

  • Do your own random-moves test to confirm that you also get about 7% wins for each player; otherwise something is wrong with your board implementation.
  • Run MCTS with a random network on a mate-in-one position, and ensure that it finds the mate and picks it as the best (highest-visit-count) move after a couple thousand visits.
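The mate-in-one check can be set up like this (a sketch using python-chess; the FEN and helper function are illustrative, and the MCTS call at the end is a hypothetical placeholder for your own implementation):

```python
import chess

def mating_moves(board):
    """Return all legal moves that deliver immediate checkmate."""
    mates = []
    for move in list(board.legal_moves):
        board.push(move)
        if board.is_checkmate():
            mates.append(move)
        board.pop()
    return mates

# Back-rank mate-in-one: White to move, Rd8# ends the game.
board = chess.Board("6k1/5ppp/8/8/8/8/5PPP/3R2K1 w - - 0 1")
expected = mating_moves(board)

# With your own MCTS (hypothetical interface), the highest-visit
# child of the root should be one of the mating moves, e.g.:
#   root = run_mcts(board, network, visits=2000)
#   best = max(root.children, key=lambda c: c.visits).move
#   assert best in expected
```

If the search cannot pass this test with a random network, no amount of self-play data will fix it, so it is worth checking before debugging the training side.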

I think that in theory 3% decisive games is already enough to kickstart the learning process, but for chess specifically it probably means something in the implementation is wrong.

Actual data

Here's the win/draw/loss plot from a chess run of my own AlphaZero implementation, kZero. The horizontal axis corresponds to about 300k self-play games. This is with 400 visits/move, and settings otherwise mostly similar to the AlphaZero paper.

It looks like the behavior is as follows:

  • The initial games generated with a random network have only 17% draws, so MCTS is finding a lot of mates.
  • Very quickly the draw rate goes up to 35%, as the network learns to avoid mate in one.
  • After that the draw rate drops back down to 8%, as the network learns to finish games from winning positions.
  • For the rest of the run the draw rate slowly climbs again, as is typical for stronger chess players and engines.

WDL plot

KarelPeeters