I am a beginner in AI, and I'm trying to train a multi-agent RL algorithm to play chess. One issue I ran into was representing the action space (the legal moves, or honestly just moves in general) numerically. I looked up how AlphaZero represents it: they use an 8x8x73 array to encode all possible moves. I was wondering how it actually works, since I got a bit confused by their explanation:
> A move in chess may be described in two parts: selecting the piece to move, and then selecting among the legal moves for that piece. We represent the policy $\pi(a \mid s)$ by a $8 \times 8 \times 73$ stack of planes encoding a probability distribution over 4,672 possible moves. Each of the $8 \times 8$ positions identifies the square from which to "pick up" a piece. The first 56 planes encode possible "queen moves" for any piece: a number of squares $[1..7]$ in which the piece will be moved, along one of eight relative compass directions {N, NE, E, SE, S, SW, W, NW}. The next 8 planes encode possible knight moves for that piece. The final 9 planes encode possible under-promotions for pawn moves or captures in two possible diagonals, to knight, bishop or rook respectively. Other pawn moves or captures from the seventh rank are promoted to a queen.
How would one numerically represent the move 1. e4 or 1. Nf3 (and how would the integer for 1. Nf3 differ from the one for 1. f3), for example? How do you tell which integer corresponds to which move? This is essentially what I'm asking.
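
To make the question concrete, here is a minimal sketch of one possible reading of the quoted scheme. The quote fixes the number of planes in each group (56 + 8 + 9 = 73) but not the order inside each group or how the $8 \times 8 \times 73$ array is flattened. The plane ordering, the row-major flattening `(rank * 8 + file) * 73 + plane`, and every name below (`DIRECTIONS`, `KNIGHT_DELTAS`, `move_to_plane`, `move_to_index`) are therefore assumptions made for illustration, not AlphaZero's actual layout. Moves are given as UCI-style strings such as `e2e4`, and the sketch only handles the board from White's point of view.

```python
# Hypothetical plane layout. The quoted text gives the group sizes
# (56 queen-move planes, 8 knight planes, 9 underpromotion planes);
# the ordering within each group is an assumption of this sketch.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1),      # N, NE, E, SE
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]  # S, SW, W, NW as (d_file, d_rank)
KNIGHT_DELTAS = [(1, 2), (2, 1), (2, -1), (1, -2),
                 (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
UNDERPROMO_DFILES = [-1, 0, 1]       # capture left, push straight, capture right
UNDERPROMO_PIECES = ["n", "b", "r"]  # knight, bishop, rook
NUM_PLANES = 73


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def parse_square(name):
    """Convert a square name such as 'e2' to zero-based (file, rank)."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def move_to_plane(d_file, d_rank, promotion=None):
    """Map a displacement (and optional promotion piece) to a plane in 0..72."""
    if promotion in UNDERPROMO_PIECES:
        # Planes 64..72: three pawn directions times three pieces.
        return (64 + UNDERPROMO_DFILES.index(d_file) * 3
                + UNDERPROMO_PIECES.index(promotion))
    if (d_file, d_rank) in KNIGHT_DELTAS:
        # Planes 56..63: one plane per knight jump.
        return 56 + KNIGHT_DELTAS.index((d_file, d_rank))
    distance = max(abs(d_file), abs(d_rank))
    straight = d_file == 0 or d_rank == 0
    diagonal = abs(d_file) == abs(d_rank)
    if distance == 0 or not (straight or diagonal):
        raise ValueError("not a queen-like or knight-like displacement")
    # Planes 0..55: eight directions times seven distances.
    direction = DIRECTIONS.index((sign(d_file), sign(d_rank)))
    return direction * 7 + (distance - 1)


def move_to_index(uci):
    """Flatten (from-square, plane) into a single integer in 0..4671."""
    from_file, from_rank = parse_square(uci[0:2])
    to_file, to_rank = parse_square(uci[2:4])
    promotion = uci[4] if len(uci) == 5 else None  # a queen promotion uses a queen-move plane
    plane = move_to_plane(to_file - from_file, to_rank - from_rank, promotion)
    return (from_rank * 8 + from_file) * NUM_PLANES + plane


print(move_to_index("e2e4"))  # 1. e4
print(move_to_index("g1f3"))  # 1. Nf3
print(move_to_index("f2f3"))  # 1. f3
```

Under these assumed orderings, 1. e4 (`e2e4`) maps to 877, 1. Nf3 (`g1f3`) to 501, and 1. f3 (`f2f3`) to 949. Nf3 and f3 differ in both parts of the encoding: the source square (g1 versus f2) and the plane (a knight plane versus the "one square north" queen-move plane). The destination square is never stored directly; it is implied by the source square plus the plane. A different but equally valid ordering would give different integers, so what matters is that the mapping is fixed and used consistently when encoding and decoding.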