Questions tagged [tic-tac-toe]

For questions about the game of tic-tac-toe (also known as noughts and crosses) in the context of artificial intelligence.

20 questions
10
votes
3 answers

How should I represent the input to a neural network for the games of tic-tac-toe, checkers or chess?

I've been reading a lot about TD-Gammon recently as I'm exploring options for AI in a video game I'm making. The video game is a turn-based positional sort of game, i.e. a unit's (game piece's) position will greatly impact its usefulness in that…
6
votes
2 answers

Why does self-playing tic-tac-toe not become perfect?

I trained a DQN that learns tic-tac-toe by playing against itself with a reward of -1/0/+1 for a loss/draw/win. Every 500 episodes, I test the progress by letting it play some episodes (also 500) against a random player. As shown in the picture…
6
votes
1 answer

What are good learning strategies for a Deep Q-Network with opponents?

I am trying to find out what some good learning strategies are for a Deep Q-Network with opponents. Let's consider the well-known game tic-tac-toe as an example: How should an opponent be implemented to get good and fast improvements? Is it better to…
4
votes
1 answer

How do we find the length (depth) of the tic-tac-toe game tree in adversarial search?

When we analyze the game of tic-tac-toe using adversarial search, I know how to build the tree. Is there a way to find the depth of the tree, and which level is the last level?
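For reference, the depth asked about above is easy to verify by brute force: the tree is at most 9 plies deep (the board fills up), the shortest game ends after 5 plies, and there are 255,168 distinct move sequences. A minimal sketch (board encoding and helper names are illustrative):

```python
# Brute-force enumeration of the full tic-tac-toe game tree.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    """Return 'X' or 'O' if that mark has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def explore(b=' ' * 9, player='X', depth=0):
    """DFS over all games; returns (game_count, max_depth, min_depth)."""
    if winner(b) or ' ' not in b:
        return 1, depth, depth          # terminal leaf
    games, dmax, dmin = 0, 0, 10
    nxt = 'O' if player == 'X' else 'X'
    for i in range(9):
        if b[i] == ' ':
            g, hi, lo = explore(b[:i] + player + b[i+1:], nxt, depth + 1)
            games += g
            dmax = max(dmax, hi)
            dmin = min(dmin, lo)
    return games, dmax, dmin

print(explore())  # (255168, 9, 5)
```

So the last level of the tree is ply 9, but leaves appear as early as ply 5 (the fastest possible win).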
4
votes
1 answer

Why is tic-tac-toe considered a non-deterministic environment?

I have been reading about deterministic and stochastic environments, when I came across an article that states that tic-tac-toe is a non-deterministic environment. But why is that? An action will lead to a known state of the game and an agent has…
3
votes
2 answers

How can both agents know the terminal reward in self-play reinforcement learning?

There seems to be a major difference in how the terminal reward is received/handled in self-play RL vs "normal" RL, which confuses me. I implemented TicTacToe the normal way, where a single agent plays against an environment that manages the state…
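One common pattern for the situation described above (not necessarily the asker's code, and the names are my own) is to remember each player's last state-action pair and update both at episode end: the player who made the winning move is credited +1, the other player's last move is debited -1, and a draw gives both 0.

```python
# Hedged sketch: terminal-reward handling in self-play tabular Q-learning.
def end_of_episode(Q, last_sa, result, alpha=0.1):
    """Update each player's last (state, action) pair at game end.

    Q:       dict mapping (state, action) -> value
    last_sa: dict mapping player mark -> its last (state, action)
    result:  winning player's mark, or None for a draw
    """
    for player, (s, a) in last_sa.items():
        if result is None:
            r = 0.0        # draw
        elif player == result:
            r = 1.0        # this player's move ended the game with a win
        else:
            r = -1.0       # the opponent's move ended the game
        q = Q.setdefault((s, a), 0.0)
        # Terminal update: there is no successor state, so the target is r.
        Q[(s, a)] = q + alpha * (r - q)

Q = {}
end_of_episode(Q, {'X': ('sX', 4), 'O': ('sO', 2)}, result='X')
print(Q)  # X's last move moves toward +1, O's toward -1
```

The key point is that both agents see the terminal reward, just attached to different state-action pairs: the loser is penalized on the move that allowed the win.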
3
votes
3 answers

What is the optimal score for Tic Tac Toe for a reinforcement learning agent against a random opponent?

I guess this problem is encountered by everyone trying to solve Tic Tac Toe with various flavors of reinforcement learning. The answer is not "always win" because the random opponent may sometimes be able to draw the game. So it is slightly less…
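The optimal score asked about above can be computed exactly with expectimax: the agent's nodes take the maximum over moves, while the random opponent's nodes average over all legal replies. A hedged sketch (board encoding and helper names are illustrative), scoring +1/0/-1 for win/draw/loss with X moving first and playing optimally:

```python
# Expectimax: exact expected score for optimal X vs. uniformly random O.
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def moves(b):
    return [i for i in range(9) if b[i] == ' ']

def terminal_value(b):
    """+1 / -1 / 0 from X's perspective, or None if the game continues."""
    w = winner(b)
    if w:
        return 1.0 if w == 'X' else -1.0
    return 0.0 if not moves(b) else None

@lru_cache(maxsize=None)
def x_value(b):
    """Expected score for X when X (playing optimally) is to move."""
    v = terminal_value(b)
    if v is not None:
        return v
    return max(o_value(b[:i] + 'X' + b[i+1:]) for i in moves(b))

@lru_cache(maxsize=None)
def o_value(b):
    """Expected score for X when O (uniformly random) is to move."""
    v = terminal_value(b)
    if v is not None:
        return v
    ms = moves(b)
    return sum(x_value(b[:i] + 'O' + b[i+1:]) for i in ms) / len(ms)

v = x_value(' ' * 9)
print(v)  # strictly below 1: as the question notes, random play can still draw
```

Since optimal X never loses, the printed value equals the win probability against a random opponent, which is high but below 1.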
3
votes
1 answer

Why can I still easily beat my Q-learning agent that was trained against another Q-learning agent to play tic tac toe?

I implemented the Q-learning algorithm to play tic-tac-toe. The AI plays against the same algorithm, but they don't share the same Q matrix. After 200,000 games, I still beat the AI very easily and it's rather dumb. My selection is made by epsilon…
3
votes
1 answer

Given these two reward functions, what can we say about the optimal Q-values, in self-play tic-tac-toe?

This corresponds to Exercise 1.1 of Sutton & Barto's book (2nd edition), and a discussion followed from this answer. Consider the following two reward functions Win = +1, Draw = 0, Loss = -1 Win = +1, Draw or Loss = 0 Can we say something about…
3
votes
1 answer

Why isn't my Q-Learning agent able to play tic-tac-toe?

I tried to build a Q-learning agent which you can play tic tac toe against after training. Unfortunately, the agent performs pretty poorly. It tries to win but does not try to stop me from winning, which ends with me beating the agent no matter…
2
votes
1 answer

Where does the TD formula for tic-tac-toe in Sutton & Barto come from?

In section $1.5$ of the book "Reinforcement Learning: An Introduction" by Sutton and Barto they use tic-tac-toe as an example of an RL use case. They provide the following temporal difference update rule in that section: $$ V(S_{t}) \leftarrow…
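For context, the update rule truncated in the excerpt above is $V(S_t) \leftarrow V(S_t) + \alpha\,[V(S_{t+1}) - V(S_t)]$: after each greedy move, the value of the earlier state is moved a fraction $\alpha$ toward the value of the later state. A minimal sketch (state encoding, initialization, and step size are illustrative, not the book's exact setup):

```python
# TD-style value update from Sutton & Barto's tic-tac-toe example:
#   V(S_t) <- V(S_t) + alpha * (V(S_{t+1}) - V(S_t))
from collections import defaultdict

ALPHA = 0.1                      # step size (illustrative choice)
V = defaultdict(float)           # board string -> estimated value

def td_update(s_t, s_next):
    """Move V(s_t) a fraction ALPHA toward V(s_next)."""
    V[s_t] += ALPHA * (V[s_next] - V[s_t])

# Usage: suppose a greedy move led from "X.O......" to a known winning
# state "XXO......" (hypothetical boards; '.' marks an empty square).
V["XXO......"] = 1.0
td_update("X.O......", "XXO......")
print(round(V["X.O......"], 2))  # -> 0.1
```

Over many games these backups propagate terminal outcomes toward earlier states, which is the point of the book's example.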
1
vote
0 answers

Why is my Tic Tac Toe agent not closer to 100% draw rate?

I tried to learn neural network programming with ChatGPT's help and viewed related YouTube videos to understand the concepts better. I wanted to train a game playing agent, but decided to start out simple by training a Tic Tac Toe agent using…
1
vote
0 answers

Why does alpha-beta pruning behave like this when applied to tic-tac-toe?

The question in my textbook is as follows: Circle the nodes at depth 2 that would not be evaluated if alpha-beta pruning were applied, assuming the nodes are generated in the optimal order for alpha-beta pruning. My answer to the problem was very…
HMPtwo
  • 35
  • 6
1
vote
2 answers

Tic-tac-toe tabular Q-learning: what is the formula to calculate the number of entries in the Q-table?

I implemented the tabular Q-learning algorithm for 3x3 tic-tac-toe multiple times, and every time the number of entries in the Q-table is 16,167. I want to know how to arrive at the number 16,167. What is the formula to calculate it? The…
Hans123
  • 25
  • 5
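The exact entry count in a question like the one above depends on implementation details: whether terminal states get entries, whether the table is keyed per state or per state-action pair, and whether symmetric boards are merged, so 16,167 is not universal. A useful baseline is the number of distinct positions reachable from the empty board, which a short breadth-first enumeration (helper names are my own) puts at 5,478, including terminal positions:

```python
# Count all tic-tac-toe positions reachable in legal play.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def reachable_states():
    start = ' ' * 9
    seen = {start}
    frontier = [start]
    while frontier:
        b = frontier.pop()
        if winner(b) or ' ' not in b:
            continue                      # terminal: no further moves
        # whoever has placed fewer marks moves next (X always starts)
        player = 'X' if b.count('X') == b.count('O') else 'O'
        for i in range(9):
            if b[i] == ' ':
                nb = b[:i] + player + b[i+1:]
                if nb not in seen:
                    seen.add(nb)
                    frontier.append(nb)
    return seen

states = reachable_states()
print(len(states))  # 5478
```

From that baseline, any per-implementation formula comes down to which of these states get table entries and how many actions are stored per state.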
1
vote
0 answers

How do I improve my RL tic-tac-toe agent?

I have coded a neural-network-based RL tic-tac-toe agent. It trains well enough to win against random agents almost all the time; the larger the board (the code allows training on NxN boards with a winning line longer than 3), the closer to…
Emil
  • 111
  • 4