I just read about deep Q-learning, which uses a neural network to approximate the value function instead of a table.

I saw the example here: Using Keras and Deep Q-Network to Play FlappyBird, where the author uses a CNN to compute the Q-values.

My confusion is about the last layer of his neural net: the neurons in the output layer each represent an action (flap or don't flap). I have also seen other projects where the output layer likewise represents all available actions (move left, stop, etc.).
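
To illustrate what I mean, a Q-network of this kind might look like the following minimal Keras sketch (the layer sizes here are my own illustrative choices, not the FlappyBird project's actual architecture):

    import tensorflow as tf

    n_actions = 2  # flap, don't flap

    # Minimal illustrative DQN: stacked frames in, one Q-value per action out
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu',
                               input_shape=(80, 80, 4)),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(n_actions)  # linear output: Q(s, a) per action
    ])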

How would you represent all the available actions of a chess game? Each piece has its own set of available moves, and we also need to choose how far a piece will move (a rook can move more than one square). I've read the Giraffe chess engine's paper and can't find how it represents the output layer (I'll read it once again).

I hope somebody here can give a good explanation of how to design a neural network architecture for Q-learning; I'm new to reinforcement learning.

— malioboro

3 Answers

To model chess as a Markov decision process (MDP), you can refer to the AlphaZero paper (Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm). The exact details can be found starting from the bottom of page 13.

Briefly, an action is described by first picking a piece and then picking a move for it. The board is 8x8, so there are 8x8 = 64 possibilities for picking a piece. From a given square, we can either make a queen-like move, picking one of 8 directions and the number of steps in that direction (at most 7), or a knight move (at most 8 possibilities). So far that is 8x7 + 8 move types per square. We also need to consider underpromotions (promoting a pawn to a non-queen piece): there are 3 pawn moves (forward, left-diagonal capture, right-diagonal capture) and 3 promotion pieces (rook, knight, bishop), which adds 3x3 = 9. The total size of the action space is therefore 8x8x(8x7 + 8 + 9) = 4672, and this is the number of output neurons you would use.
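
To make the count concrete, here is the same arithmetic in a few lines of Python:

    queen_moves = 8 * 7      # 8 directions, at most 7 steps each
    knight_moves = 8         # at most 8 knight moves
    underpromotions = 3 * 3  # 3 pawn moves x 3 promotion pieces
    per_square = queen_moves + knight_moves + underpromotions  # 73
    print(8 * 8 * per_square)  # 4672 output neurons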

Note that this action space representation covers every possible scenario, including actions that are illegal in a given state. For example, at the start of the game, picking the square e4 and promoting a pawn there makes no sense (there is no piece on e4 at the beginning of the game), and if we pick a square holding a rook, we cannot make a knight move with it. Therefore, you will also need to implement a function that returns the set of legal actions in a given state and ignores all neural network outputs that are not contained in this set.
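
As a minimal sketch of such masking (assuming a hypothetical move_to_index function that maps a chess.Move to an output index under the encoding above), one could write:

    import numpy as np
    import chess

    def best_legal_action(q_values, board, move_to_index):
        # q_values: network output, shape (4672,)
        # move_to_index: hypothetical mapping from chess.Move to an output index
        masked = np.full_like(q_values, -np.inf)
        for move in board.legal_moves:
            idx = move_to_index(move)
            masked[idx] = q_values[idx]
        return int(np.argmax(masked))  # argmax over legal actions only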

Obviously, this action representation is not set in stone, so if you can come up with something better or more compact, you can use that instead. You can also restrict your game, for example by not allowing underpromotions.

— Hai Nguyen

The accepted answer severely overcounts the actual action space because it assumes that any piece can move up to 7 squares in any direction from any square on the board. The count 8x8x(8x7 + 8 + 9) = 4672 is even larger than the most naive estimate of 64x63 = 4032 (every "from" square paired with every other "to" square).

The actual action space needed to specify any possible legal move has size 1924. The difference is not negligible: this is less than half of 4672, so a dense final layer of this size would need about 60% fewer output units.

Here is the Python script, using the python-chess library, that I wrote to confirm it:

    import chess

    # Each chess piece moves like either a queen or a knight from one
    # square to another square. All possible moves except pawn promotions
    # can therefore be specified by the "from" square and the "to" square
    # (even castling, e.g. e1->g1 or e1->c1).
    action_space = 0
    b = chess.BaseBoard.empty()
    for square in range(64):
        # Place a queen and see which squares it attacks
        b.set_piece_at(square, chess.Piece.from_symbol('Q'))
        q_moves = b.attacks(square)

        # Place a knight and see which squares it attacks
        b.set_piece_at(square, chess.Piece.from_symbol('N'))
        n_moves = b.attacks(square)

        # Logical or to combine the bitmaps (e.g. 1100 | 0101 = 1101)
        all_moves = q_moves | n_moves

        # Convert the bitmap to a list of bools, so the sum is exactly
        # the number of possible moves from this square
        action_space += sum(all_moves.tolist())

        b.remove_piece_at(square)

    # Count underpromotions manually: 8 forward promotions, 7 right-capture
    # promotions and 7 left-capture promotions, each of which can promote
    # to 4 pieces (but 1 of those, the queen promotion, is already counted
    # above), for 2 colors.
    action_space += (8 + 7 + 7) * (4 - 1) * 2

    print('actual action space:', action_space)
    print('naive action space:', 64 * 63)
    print("accepted answer's action space:", 8 * 8 * (8 * 7 + 8 + 9))

    actual action space: 1924
    naive action space: 4032
    accepted answer's action space: 4672

Note: this assumes that the same output neuron is used to represent, for example, both Ra7a8 and a8=Q, as well as both Re1g1 and O-O (by White). If that is not desirable, the action space that counts queen promotions and castling moves as distinct actions has size 1972. You can verify this by changing (4 - 1) to 4 in the final step that accounts for underpromotions, and then adding 4 for the two castling directions for both colors.
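
In code, that variant of the final step would be:

    # Count queen promotions and castling as distinct actions (1972 total)
    action_space += (8 + 7 + 7) * 4 * 2  # all 4 promotion pieces counted
    action_space += 4                    # O-O and O-O-O for both colors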

— Jackson H

Having written the code below, I find 1968 unique moves. This thread is old, but maybe this helps someone.

    import chess

    # Create an empty chess board
    board = chess.Board(None)

    # Initialize an empty list to store all possible moves
    action_space = []

    # Iterate over all squares, piece types and colors
    for square in chess.SQUARES:
        for piece_type in chess.PIECE_TYPES:
            for color in chess.COLORS:
                # Set the player to move
                board.turn = color
                # Place the piece on the given square
                board.set_piece_at(square, chess.Piece(piece_type, color))

                # If the piece is a pawn, place enemy pawns diagonally
                # adjacent so that capture moves are generated too
                if piece_type == chess.PAWN:
                    other_color = chess.WHITE if color == chess.BLACK else chess.BLACK
                    for square_offset in [-7, -9, 7, 9]:
                        if square + square_offset in chess.SQUARES:
                            board.set_piece_at(square + square_offset, chess.Piece(piece_type, other_color))

                for move in board.legal_moves:
                    # If the move is not already in the action space, add it
                    if move not in action_space:
                        action_space.append(move)

                # Clear the board for the next placement
                board.clear()

    action_space = sorted(action_space, key=lambda x: x.uci())

    # Sanity check: adjacent entries in the sorted list must differ
    for i in range(len(action_space) - 1):
        if action_space[i] == action_space[i + 1]:
            print("Duplicate found:", action_space[i].uci())
            exit()
    print("No duplicates found")
    print("Number of moves:", len(action_space))
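
For reference, 1968 agrees with the previous answer's note: it is the 1972 count minus the 4 separate castling actions, because this script never grants castling rights (so no castling moves are generated), while moves like e1g1 already appear among the queen-like moves.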