
I am interested in creating a neural-network-based chess engine. It uses an $8 \times 8 \times 73$ output space covering every possible move, as proposed in the AlphaZero paper: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.
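For context, the 73 planes in the AlphaZero paper break down as 56 "queen-style" moves (8 directions × up to 7 squares), 8 knight moves and 9 underpromotions (3 directions × knight, bishop or rook). The sketch below maps a move to a flat index in that space. The plane ordering, the (rank, file, plane) flattening and the `move_to_index` helper are hypothetical choices for illustration, not the paper's exact layout. Coordinates are assumed to be given from the side to move's point of view.

```python
# Hypothetical plane layout for an 8x8x73 AlphaZero-style move encoding.
# Plane counts follow the paper (56 queen + 8 knight + 9 underpromotion);
# the ordering inside each group is an assumption for illustration.
QUEEN_DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
                    (-1, 0), (-1, -1), (0, -1), (1, -1)]  # (d_rank, d_file)
KNIGHT_OFFSETS = [(2, 1), (1, 2), (-1, 2), (-2, 1),
                  (-2, -1), (-1, -2), (1, -2), (2, -1)]
UNDERPROMOTION_PIECES = ["n", "b", "r"]     # queen promotions use queen planes
UNDERPROMOTION_FILE_DELTAS = [-1, 0, 1]     # capture left, push, capture right

NUM_PLANES = 8 * 7 + 8 + 9  # 73


def move_to_index(from_rank, from_file, to_rank, to_file, underpromotion=None):
    """Map a move to a flat index in [0, 8*8*73), ordered (rank, file, plane)."""
    d_rank, d_file = to_rank - from_rank, to_file - from_file
    if underpromotion is not None:
        plane = (64 + UNDERPROMOTION_PIECES.index(underpromotion) * 3
                 + UNDERPROMOTION_FILE_DELTAS.index(d_file))
    elif (d_rank, d_file) in KNIGHT_OFFSETS:
        plane = 56 + KNIGHT_OFFSETS.index((d_rank, d_file))
    else:
        # Queen-style moves must be straight or diagonal and non-empty.
        if (d_rank, d_file) == (0, 0) or (
                d_rank != 0 and d_file != 0 and abs(d_rank) != abs(d_file)):
            raise ValueError("not a queen, knight or underpromotion move")
        distance = max(abs(d_rank), abs(d_file))
        direction = (d_rank // distance, d_file // distance)
        plane = QUEEN_DIRECTIONS.index(direction) * 7 + (distance - 1)
    return (from_rank * 8 + from_file) * NUM_PLANES + plane
```

For example, a one-square push from the corner square (0, 0) maps to index 0, and the last valid index is 8 × 8 × 73 − 1 = 4671.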

However, when running the network, the first move it selects (the one with the highest output) can be illegal. How should I deal with this? I see two options:

  1. Pick the move with the next-highest output, repeating until a legal move is found. In this case, the network might learn over time not to rank illegal moves at the top.
  2. Treat the game as a loss for the player who picked the illegal move. This has the possible disadvantage that the network might get 'stuck' playing only a few legal moves.
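Option 1 can be sketched as follows, assuming the network output has been flattened into a list of scores and that a move generator (hypothetical here) supplies the set of legal flat indices. `pick_highest_legal` is an illustrative name, not an existing API.

```python
def pick_highest_legal(scores, legal_indices):
    """Option 1: scan moves from highest to lowest score, return the first legal one.

    scores: flat list of network outputs (length 8*8*73 = 4672 for chess).
    legal_indices: set of flat indices that are legal in the current position.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for index in ranked:
        if index in legal_indices:
            return index
    # No legal move at all means the position is checkmate or stalemate.
    raise ValueError("no legal moves in this position")
```

Taking `max(legal_indices, key=lambda i: scores[i])` gives the same result (up to ties) without sorting all 4672 entries.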

What is the preferred solution to this particular problem?

nbro
whits

1 Answer


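You should have a method that generates the legal moves from the board state, encoded in the same $8 \times 8 \times 73$ format as the policy output. Use this as a mask before normalization (the softmax) in the policy head, so that illegal moves are assigned zero probability.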
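A minimal sketch of the masking approach, in plain Python for clarity. It assumes the legal moves have already been encoded into a boolean mask aligned with the flattened $8 \times 8 \times 73$ policy (for example, built from a move generator such as python-chess's `Board.legal_moves`). `masked_policy` is a hypothetical name.

```python
import math


def masked_policy(logits, legal_mask):
    """Softmax over policy logits with illegal moves masked out.

    logits: flat list of raw policy-head outputs (e.g. length 4672).
    legal_mask: list of bools of the same length, True where the move is legal.
    Returns a probability distribution that is exactly 0 on illegal moves.
    """
    legal_logits = [x for x, ok in zip(logits, legal_mask) if ok]
    if not legal_logits:
        raise ValueError("no legal moves in this position")
    # Subtract the largest legal logit for numerical stability.
    m = max(legal_logits)
    exps = [math.exp(x - m) if ok else 0.0 for x, ok in zip(logits, legal_mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

In a deep-learning framework the same effect is usually obtained by setting the logits of illegal moves to a large negative value (or -inf) before the softmax. Since illegal moves then get exactly zero probability, neither option from the question is needed at move-selection time.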

nbro
mshlis