3

I guess this problem is encountered by everyone trying to solve Tic Tac Toe with various flavors of reinforcement learning.

The answer is not "always win" because the random opponent may sometimes be able to draw the game. So it is slightly less than the always-win score.

I wrote a little Python program to calculate that. Please help verify its correctness and inform me if it has bugs or errors.

Yan King Yin
  • 245
  • 1
  • 10

3 Answers3

3

This blog post suggests that when playing against a random opponent, if the agent goes first, the win rate is 97.8%, and if they go second, the win rate is 79.6% (and the rest are draws).

Taw
  • 1,321
  • 5
  • 12
1

Here I asssume:

  • Both players avoid illegal moves perfectly
  • Player X always chooses the move with maximum expectation value
  • Player O chooses all available moves with equal probability

Result depends on the scoring scheme:

  • (This scheme is used in one version of Gym Tic Tac Toe)

    For win=20, draw=10, lose=-20:

    Optimal expectation value =

    • X plays first: 19.94791666666666...

    • O plays first: 19.164021164021...

  • For win=20, draw=0, lose=-20:

    Optimal expectation value =

    • X plays first: 19.89583333333333...

    • O plays first: 18.497354497354....

It also helps to verify the program with some pre-played board positions, included in the code.

Here is the program:

import math     # for math.inf = infinity

print("Calculate optimal expectation value of TicTacToe") print("from the perspective of 'X' = first player.") print("Assume both players perfectly avoid illegal moves.") print("Player 'X' always chooses the move with maximum expectation value.") print("Player 'O' always plays all available moves with equal probability.") print("You may modify the initial board position in the code.")

Empty board

test_board = 9 * [0]

Pre-moves, if any are desired:

X|O|

O|O|X

X| |

#test_board[0] = -1 #test_board[3] = 1 #test_board[6] = -1 #test_board[4] = 1 #test_board[5] = -1 #test_board[1] = 1

def show_board(board): for i in [0, 3, 6]: for j in range(3): x = board[i + j] if x == -1: c = '❌' elif x == 1: c = '⭕' else: c = ' ' print(c, end='') print(end='\n')

if test_board != 9 * [0]: print("\nInitial board position:") show_board(test_board)

**** Calculate expectation value of input board position

def expectation(board, player):

if player == -1:
    # **** Find all possible next moves for player 'X'
    moves = possible_moves(board)

    # Calculate expectation of these moves;
    # Player 'X' will only choose the one of maximum value.
    max_v = - math.inf
    for m in moves:
        new_board = board.copy()
        new_board[m] = -1       # Player 'X'

        # If this an ending move?
        r = game_over(new_board, -1)
        if r is not None:
            if r > max_v:
                max_v = r
        else:
            v = expectation(new_board, 1)
            if v > max_v:
                max_v = v
    # show_board(board)
    print("X's turn.  Expectation w.r.t. Player X =", max_v, end='\r')
    return max_v

elif player == 1:
    # **** Find all possible next moves for player 'O'
    moves = possible_moves(board)
    # These moves have equal probability
    # print(board, moves)
    p = 1.0 / len(moves)

    # Calculate expectation of these moves;
    # Player 'O' chooses one of them randomly.
    Rx = 0.0        # reward from the perspective of 'X'
    for m in moves:
        new_board = board.copy()
        new_board[m] = 1        # Player 'O'

        # If this an ending move?
        r = game_over(new_board, 1)
        if r is not None:
            if r == 10:             # draw is +10 for either player
                Rx += r * p
            else:
                Rx += - r * p       # sign of reward is reversed
        else:
            v = expectation(new_board, -1)
            Rx += v * p
    # show_board(board)
    print("O's turn.  Expectation w.r.t. Player X =", Rx, end='\r')
    return Rx

def possible_moves(board): moves = [] for i in range(9): if board[i] == 0: moves.append(i) return moves

Check only for the given player.

Return reward w.r.t. the specific player.

def game_over(board, player): # check horizontal for i in [0, 3, 6]: # for each row if board[i + 0] == player and
board[i + 1] == player and
board[i + 2] == player: return 20

# check vertical
for j in [0, 1, 2]:     # for each column
    if board[3 * 0 + j] == player and \
       board[3 * 1 + j] == player and \
       board[3 * 2 + j] == player:
        return 20

# check diagonal
if board[0 + 0] == player and \
   board[3 * 1 + 1] == player and \
   board[3 * 2 + 2] == player:
    return 20

# check backward diagonal
if board[3 * 0 + 2] == player and \
   board[3 * 1 + 1] == player and \
   board[3 * 2 + 0] == player:
    return 20

# return None if game still open
for i in [0, 3, 6]:
    for j in [0, 1, 2]:
        if board[i + j] == 0:
            return None

# For one version of gym TicTacToe, draw = 10 regardless of player;
# Another way is to assign draw = 0.
return 10

print("\u001b[2K\nOptimal value =", expectation(test_board, -1) )

Example output (for X's turn to play):

enter image description here

Yan King Yin
  • 245
  • 1
  • 10
0

Actually, you can directly calculate the optimal odds to win in TicTacToe against a random opponent and it turns out that you would win 191 out of 192 games and draw the last one when beginning.

For that -unintuitively- you have to start in a corner, lets say top left:

O . .
. . .
. . .

Then one can see that the only way to draw for the second player is to put the X in the middle (so chance for a random opponent: 1/8). If he doesn't, O can force a win, e.g. after

O X . 
. . .
. . .

O can put the marker in the middle, forcing X on the bottom right and then putting an O on the middle left, creating 2 possibilities for a tic-tac-toe.

One can easily also check the other options for X to go wrong and will confirm that only an X in the middle will hold a draw, but to keep the answer brief I will not proof it here.

Then, we can put our next "O" in the top middle, leading to

O O .
. X .
. . .

where the second player needs to put an "X" in the top right (chance 1/6 for a random opponent)

Then we place our "O" in the bottom left to simultaneously deny "X"'s thread and make a thread on our own:

O O X
. X .
O . .

So at this point a random opponent has a 1/4 chance to save the game. After that, we have to play middle right to draw, as we have no more chance to get any Tic-Tac-Toe even with cooperative play from X.

The chance to win is therefore 1 - 1/8*1/6*1/4 = 191/192.

Remark: One can easily see that this is the best strategy for a win. The total number of moves for "X" are 384 (8 on turn one, then 6, then 4, then 2), out of which with this strategy only 2 ways draw. When "O" starts in the middle, with perfect play all 4 corners can draw for "X". Analogously one can justify every subsequent decision for "O".

Ritteraxt
  • 1
  • 1