
I am experimenting with a self-programmed version of AlphaZero for chess, following the methodology described in the official paper. I am finding that at the beginning of self-play (when the network weights are still randomly initialized), a very high portion of all games end in a draw (ca. 97%). This is not surprising: if two purely random players play against each other, it is very unlikely that either White or Black achieves a win. As a result, ca. 97% of all board states visited during self-play are trained towards a target value of 0 (instead of towards +1/-1 for a White win/loss). This means there is very little training data (3% of all training data generated via self-play) that helps the network learn how to win and get stronger.
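Concretely, the per-position value targets I generate look like this (a minimal sketch; the function name and the white-moves-first convention are my own illustration, with the value expressed from the side-to-move perspective as in the paper):

```python
def value_targets(outcome, num_positions):
    """Value target for each recorded position, from the side-to-move
    perspective: +1 if the player to move eventually won, -1 if they
    lost, 0 for a draw.

    outcome: game result from White's perspective (+1 / 0 / -1).
    Positions are assumed to alternate White-to-move, Black-to-move,
    starting from the initial position.
    """
    if outcome == 0:
        # Drawn game: every position trains towards 0.
        return [0.0] * num_positions
    return [outcome if i % 2 == 0 else -outcome
            for i in range(num_positions)]
```

With 97% draws, almost every call hits the `[0.0] * num_positions` branch, which is exactly the imbalance described above.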

Does anyone know how DeepMind overcame this problem? I have found nothing about it on the web.

1 Answer


First intuition

I wrote a quick script to test the outcome distribution of random chess games: I get about 7% wins each for White and Black, so about 14% decisive games. MCTS should still be able to find simple mate-in-one moves even with a randomly initialized network, so I would expect it to draw even less often.
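A script along these lines reproduces the test (a sketch assuming the python-chess package; the function names and ply cap are my own choices, not from the original script):

```python
import random
import chess

def play_random_game(max_plies=1000):
    """Play one game with uniformly random legal moves.

    Returns the PGN result string: "1-0", "0-1", "1/2-1/2",
    or "*" if the ply cap was hit first. Draws by the 75-move
    and fivefold-repetition rules end the game automatically.
    """
    board = chess.Board()
    for _ in range(max_plies):
        if board.is_game_over(claim_draw=True):
            break
        board.push(random.choice(list(board.legal_moves)))
    return board.result(claim_draw=True)

def outcome_distribution(n_games):
    """Count results over n_games random games."""
    counts = {"1-0": 0, "0-1": 0, "1/2-1/2": 0, "*": 0}
    for _ in range(n_games):
        counts[play_random_game()] += 1
    return counts
```

Over a few thousand games this gives roughly 7% wins per side; with only a handful of games the estimate is very noisy, so use a large `n_games`.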

Your 97% draw rate is significantly higher than that, so I suspect something else is going wrong. Some ideas to try:

  • Do your own random-moves test to confirm that you also get about 7% wins for each player; otherwise something is wrong with your board implementation.
  • Run MCTS with a random network on a mate-in-one position, and ensure that it finds the mate and picks it as the best (highest-visit-count) move after a couple thousand visits.
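The mate-in-one check can be set up like this (a sketch using python-chess; the FEN and helper function are illustrative, and the MCTS call at the end is a hypothetical placeholder for your own implementation):

```python
import chess

def mating_moves(board):
    """Return all legal moves that deliver immediate checkmate."""
    mates = []
    for move in list(board.legal_moves):
        board.push(move)
        if board.is_checkmate():
            mates.append(move)
        board.pop()
    return mates

# Back-rank mate-in-one: White to move, Rd8# ends the game.
board = chess.Board("6k1/5ppp/8/8/8/8/5PPP/3R2K1 w - - 0 1")
expected = mating_moves(board)

# With your own MCTS (hypothetical interface), the highest-visit
# child of the root should be one of the mating moves, e.g.:
#   root = run_mcts(board, network, visits=2000)
#   best = max(root.children, key=lambda c: c.visits).move
#   assert best in expected
```

If the search cannot pass this test with a random network, no amount of self-play data will fix it, so it is worth checking before debugging the training side.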

I think that in theory 3% decisive games is already enough to kickstart the learning process, but for chess specifically it probably means something in the implementation is wrong.

Actual data

Here's the win/draw/loss plot from a chess run of my own AlphaZero implementation, kZero. The horizontal axis corresponds to about 300k self-play games. This is with 400 visits/move, and settings otherwise mostly similar to the AlphaZero paper.

It looks like the behavior is as follows:

  • The initial games generated with a random network have only 17% draws, so MCTS is finding a lot of mates.
  • Very quickly the draw rate goes up to 35%, as the network learns to avoid mate in one.
  • After that the draw rate drops back down to 8%, as the network learns to finish games from winning positions.
  • For the rest of the run the draw rate slowly climbs again, as is typical for stronger chess players and engines.

WDL plot

KarelPeeters