Questions tagged [tic-tac-toe]

For questions about the game of tic-tac-toe (also known as noughts and crosses) in the context of artificial intelligence.

20 questions
10
votes
3 answers

How should I represent the input to a neural network for the games of tic-tac-toe, checkers or chess?

I've been reading a lot about TD-Gammon recently as I'm exploring options for AI in a video game I'm making. The video game is a turn-based positional sort of game, i.e. a unit's (game piece's) position will greatly impact its usefulness in that…
6
votes
2 answers

Why does self-playing tic-tac-toe not become perfect?

I trained a DQN that learns tic-tac-toe by playing against itself with a reward of -1/0/+1 for a loss/draw/win. Every 500 episodes, I test the progress by letting it play some episodes (also 500) against a random player. As shown in the picture…
6
votes
1 answer

What are good learning strategies for a Deep Q-Network with opponents?

I am trying to find out what some good learning strategies are for a Deep Q-Network with opponents. Let's consider the well-known game tic-tac-toe as an example: How should an opponent be implemented to get good and fast improvements? Is it better to…
4
votes
1 answer

How do we find the length (depth) of the tic-tac-toe game tree in adversarial search?

When we analyze the game of tic-tac-toe using adversarial search, I know how to build the tree. Is there a way to find the depth of the tree, and which level is the last level?
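For reference, the depth asked about above is easy to verify by brute force: the tree is at most 9 plies deep (the board fills up), the shortest game ends after 5 plies, and there are 255,168 distinct move sequences. A minimal sketch (board encoding and helper names are illustrative):

```python
# Brute-force enumeration of the full tic-tac-toe game tree.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    """Return 'X' or 'O' if that mark has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def explore(b=' ' * 9, player='X', depth=0):
    """DFS over all games; returns (game_count, max_depth, min_depth)."""
    if winner(b) or ' ' not in b:
        return 1, depth, depth          # terminal leaf
    games, dmax, dmin = 0, 0, 10
    nxt = 'O' if player == 'X' else 'X'
    for i in range(9):
        if b[i] == ' ':
            g, hi, lo = explore(b[:i] + player + b[i+1:], nxt, depth + 1)
            games += g
            dmax = max(dmax, hi)
            dmin = min(dmin, lo)
    return games, dmax, dmin

print(explore())  # (255168, 9, 5)
```

So the last level of the tree is ply 9, but leaves appear as early as ply 5 (the fastest possible win).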
4
votes
1 answer

Why is tic-tac-toe considered a non-deterministic environment?

I have been reading about deterministic and stochastic environments, when I came across an article that states that tic-tac-toe is a non-deterministic environment. But why is that? An action will lead to a known state of the game and an agent has…
3
votes
2 answers

How can both agents know the terminal reward in self-play reinforcement learning?

There seems to be a major difference in how the terminal reward is received/handled in self-play RL vs "normal" RL, which confuses me. I implemented TicTacToe the normal way, where a single agent plays against an environment that manages the state…
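One common pattern for the situation described above (not necessarily the asker's code, and the names are my own) is to remember each player's last state-action pair and update both at episode end: the player who made the winning move is credited +1, the other player's last move is debited -1, and a draw gives both 0.

```python
# Hedged sketch: terminal-reward handling in self-play tabular Q-learning.
def end_of_episode(Q, last_sa, result, alpha=0.1):
    """Update each player's last (state, action) pair at game end.

    Q:       dict mapping (state, action) -> value
    last_sa: dict mapping player mark -> its last (state, action)
    result:  winning player's mark, or None for a draw
    """
    for player, (s, a) in last_sa.items():
        if result is None:
            r = 0.0        # draw
        elif player == result:
            r = 1.0        # this player's move ended the game with a win
        else:
            r = -1.0       # the opponent's move ended the game
        q = Q.setdefault((s, a), 0.0)
        # Terminal update: there is no successor state, so the target is r.
        Q[(s, a)] = q + alpha * (r - q)

Q = {}
end_of_episode(Q, {'X': ('sX', 4), 'O': ('sO', 2)}, result='X')
print(Q)  # X's last move moves toward +1, O's toward -1
```

The key point is that both agents see the terminal reward, just attached to different state-action pairs: the loser is penalized on the move that allowed the win.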
3
votes
3 answers

What is the optimal score for Tic Tac Toe for a reinforcement learning agent against a random opponent?

I guess this problem is encountered by everyone trying to solve Tic Tac Toe with various flavors of reinforcement learning. The answer is not "always win" because the random opponent may sometimes be able to draw the game. So it is slightly less…
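The optimal score asked about above can be computed exactly with expectimax: the agent's nodes take the maximum over moves, while the random opponent's nodes average over all legal replies. A hedged sketch (board encoding and helper names are illustrative), scoring +1/0/-1 for win/draw/loss with X moving first and playing optimally:

```python
# Expectimax: exact expected score for optimal X vs. uniformly random O.
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def moves(b):
    return [i for i in range(9) if b[i] == ' ']

def terminal_value(b):
    """+1 / -1 / 0 from X's perspective, or None if the game continues."""
    w = winner(b)
    if w:
        return 1.0 if w == 'X' else -1.0
    return 0.0 if not moves(b) else None

@lru_cache(maxsize=None)
def x_value(b):
    """Expected score for X when X (playing optimally) is to move."""
    v = terminal_value(b)
    if v is not None:
        return v
    return max(o_value(b[:i] + 'X' + b[i+1:]) for i in moves(b))

@lru_cache(maxsize=None)
def o_value(b):
    """Expected score for X when O (uniformly random) is to move."""
    v = terminal_value(b)
    if v is not None:
        return v
    ms = moves(b)
    return sum(x_value(b[:i] + 'O' + b[i+1:]) for i in ms) / len(ms)

v = x_value(' ' * 9)
print(v)  # strictly below 1: as the question notes, random play can still draw
```

Since optimal X never loses, the printed value equals the win probability against a random opponent, which is high but below 1.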
3
votes
1 answer

Why can I still easily beat my Q-learning agent that was trained against another Q-learning agent to play tic tac toe?

I implemented the Q-learning algorithm to play tic-tac-toe. The AI plays against the same algorithm, but they don't share the same Q matrix. After 200,000 games, I still beat the AI very easily and it's rather dumb. My selection is made by epsilon…
3
votes
1 answer

Given these two reward functions, what can we say about the optimal Q-values, in self-play tic-tac-toe?

This corresponds to Exercise 1.1 of Sutton & Barto's book (2nd edition), and a discussion followed from this answer. Consider the following two reward functions Win = +1, Draw = 0, Loss = -1 Win = +1, Draw or Loss = 0 Can we say something about…
3
votes
1 answer

Why isn't my Q-Learning agent able to play tic-tac-toe?

I tried to build a Q-learning agent which you can play tic tac toe against after training. Unfortunately, the agent performs pretty poorly. It tries to win but does not try to stop me from winning, which ends with me beating the agent no matter…
2
votes
1 answer

Where does the TD formula for tic-tac-toe in Sutton & Barto come from?

In section $1.5$ of the book "Reinforcement Learning: An Introduction" by Sutton and Barto they use tic-tac-toe as an example of an RL use case. They provide the following temporal difference update rule in that section: $$ V(S_{t}) \leftarrow…
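For context, the update rule truncated in the excerpt above is $V(S_t) \leftarrow V(S_t) + \alpha\,[V(S_{t+1}) - V(S_t)]$: after each greedy move, the value of the earlier state is moved a fraction $\alpha$ toward the value of the later state. A minimal sketch (state encoding, initialization, and step size are illustrative, not the book's exact setup):

```python
# TD-style value update from Sutton & Barto's tic-tac-toe example:
#   V(S_t) <- V(S_t) + alpha * (V(S_{t+1}) - V(S_t))
from collections import defaultdict

ALPHA = 0.1                      # step size (illustrative choice)
V = defaultdict(float)           # board string -> estimated value

def td_update(s_t, s_next):
    """Move V(s_t) a fraction ALPHA toward V(s_next)."""
    V[s_t] += ALPHA * (V[s_next] - V[s_t])

# Usage: suppose a greedy move led from "X.O......" to a known winning
# state "XXO......" (hypothetical boards; '.' marks an empty square).
V["XXO......"] = 1.0
td_update("X.O......", "XXO......")
print(round(V["X.O......"], 2))  # -> 0.1
```

Over many games these backups propagate terminal outcomes toward earlier states, which is the point of the book's example.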
1
vote
0 answers

Why is my Tic Tac Toe agent not closer to 100% draw rate?

I tried to learn neural network programming with ChatGPT's help and viewed related YouTube videos to understand the concepts better. I wanted to train a game playing agent, but decided to start out simple by training a Tic Tac Toe agent using…
1
vote
0 answers

Why does alpha-beta pruning behave like this when applied to tic-tac-toe?

The question in my textbook is as follows: Circle the nodes at depth 2 that would not be evaluated if alpha-beta pruning were applied, assuming the nodes are generated in the optimal order for alpha-beta pruning. My answer to the problem was very…
HMPtwo
  • 35
  • 6
1
vote
2 answers

Tic-tac-toe tabular Q-learning: what is the formula to calculate the number of entries in the Q-table?

I implemented the tabular Q-learning algorithm for 3x3 tic-tac-toe multiple times, and every time the number of entries in the Q-table is 16,167. I want to know how to arrive at the number 16,167. What is the formula to calculate it? The…
Hans123
  • 25
  • 5
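The exact entry count in a question like the one above depends on implementation details: whether terminal states get entries, whether the table is keyed per state or per state-action pair, and whether symmetric boards are merged, so 16,167 is not universal. A useful baseline is the number of distinct positions reachable from the empty board, which a short breadth-first enumeration (helper names are my own) puts at 5,478, including terminal positions:

```python
# Count all tic-tac-toe positions reachable in legal play.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def reachable_states():
    start = ' ' * 9
    seen = {start}
    frontier = [start]
    while frontier:
        b = frontier.pop()
        if winner(b) or ' ' not in b:
            continue                      # terminal: no further moves
        # whoever has placed fewer marks moves next (X always starts)
        player = 'X' if b.count('X') == b.count('O') else 'O'
        for i in range(9):
            if b[i] == ' ':
                nb = b[:i] + player + b[i+1:]
                if nb not in seen:
                    seen.add(nb)
                    frontier.append(nb)
    return seen

states = reachable_states()
print(len(states))  # 5478
```

From that baseline, any per-implementation formula comes down to which of these states get table entries and how many actions are stored per state.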
1
vote
0 answers

How do I improve my RL tic-tac-toe agent?

I have coded a neural-network-based RL tic-tac-toe agent. It trains well enough to win against random agents almost all the time; the larger the board (the code allows training on NxN boards with a winning line longer than 3), the closer to…
Emil
  • 111
  • 4