I tried to learn neural network programming with ChatGPT's help and watched related YouTube videos to understand the concepts better.
I wanted to train a game-playing agent, but decided to start simple by training a Tic Tac Toe agent using a policy + value net. I have uploaded my code here.
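For reference, the architecture I mean is roughly the following: a shared trunk with a 9-way policy head and a scalar value head. This is a minimal sketch with illustrative names and layer sizes, not my actual uploaded code:

```python
import numpy as np

rng = np.random.default_rng(0)

class PolicyValueNet:
    """Toy policy + value net for 3x3 Tic Tac Toe (illustrative only)."""

    def __init__(self, hidden=64):
        self.W1 = rng.normal(0, 0.1, (9, hidden))   # board -> shared hidden layer
        self.b1 = np.zeros(hidden)
        self.Wp = rng.normal(0, 0.1, (hidden, 9))   # hidden -> move logits
        self.bp = np.zeros(9)
        self.Wv = rng.normal(0, 0.1, (hidden, 1))   # hidden -> scalar value
        self.bv = np.zeros(1)

    def forward(self, board):
        """board: length-9 vector, +1 = own pieces, -1 = opponent, 0 = empty.
        Returns (move probabilities over the 9 cells, value in (-1, 1))."""
        h = np.tanh(board @ self.W1 + self.b1)
        logits = h @ self.Wp + self.bp
        probs = np.exp(logits - logits.max())   # stable softmax
        probs /= probs.sum()
        value = np.tanh(h @ self.Wv + self.bv)[0]
        return probs, value
```

The value head squashed through tanh matches the 1 / 0 / -1 reward range I describe below.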
I modified the training code to start from all starting positions and to play as either player 1 or player 2, and the reward scheme seems reasonable: 1 for a win, 0 for a draw, -1 for a loss. Yet, as the agent trains more, it loses more, instead of getting closer to a 100% draw rate. What can I do? Would setting the loss reward to -1.5, or the draw reward to 0.25, help? It still loses in actual game play.
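For context, the way I credit that terminal reward back to the moves of a game is roughly like this. This is a hypothetical sketch; the function name, discount factor, and move-alternation assumption are illustrative, not necessarily what my repo does:

```python
def returns_from_result(num_moves, result, gamma=0.95):
    """Credit a terminal reward back through one self-play game.

    result: +1 if player 1 won, -1 if player 2 won, 0 for a draw.
    Moves alternate: even indices are player 1's moves, odd indices
    are player 2's. Each move's return is the discounted terminal
    reward, sign-flipped so it is from the mover's own perspective.
    """
    returns = []
    for i in range(num_moves):
        steps_to_end = num_moves - 1 - i
        perspective = 1 if i % 2 == 0 else -1   # flip sign for player 2
        returns.append(perspective * result * gamma ** steps_to_end)
    return returns
```

With this scheme, scaling the loss reward to -1.5 or the draw reward to 0.25 just rescales these targets; it doesn't change which moves get blamed, which is why I'm unsure it would fix the trend.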
I realise that for Tic Tac Toe you can use minimax, or even a Q-network given the small number of states, but I decided to go with a policy + value net because my eventual games will be more complex, and I wanted to learn the complex solution by implementing it for a simple problem.