I tried to learn neural network programming with ChatGPT's help and watched related YouTube videos to understand the concepts better.
I wanted to train a game-playing agent, but decided to start simple by training a Tic Tac Toe agent using a policy + value net. I have uploaded my code here.
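For reference, the architecture I mean is roughly the following: a shared trunk with a 9-way policy head and a scalar value head. This is a minimal sketch with illustrative names and layer sizes, not my actual uploaded code:

```python
import numpy as np

rng = np.random.default_rng(0)

class PolicyValueNet:
    """Toy policy + value net for 3x3 Tic Tac Toe (illustrative only)."""

    def __init__(self, hidden=64):
        self.W1 = rng.normal(0, 0.1, (9, hidden))   # board -> shared hidden layer
        self.b1 = np.zeros(hidden)
        self.Wp = rng.normal(0, 0.1, (hidden, 9))   # hidden -> move logits
        self.bp = np.zeros(9)
        self.Wv = rng.normal(0, 0.1, (hidden, 1))   # hidden -> scalar value
        self.bv = np.zeros(1)

    def forward(self, board):
        """board: length-9 vector, +1 = own pieces, -1 = opponent, 0 = empty.
        Returns (move probabilities over the 9 cells, value in (-1, 1))."""
        h = np.tanh(board @ self.W1 + self.b1)
        logits = h @ self.Wp + self.bp
        probs = np.exp(logits - logits.max())   # stable softmax
        probs /= probs.sum()
        value = np.tanh(h @ self.Wv + self.bv)[0]
        return probs, value
```

The value head squashed through tanh matches the 1 / 0 / -1 reward range I describe below.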
I modified the training code to start from all starting positions and to play as either player 1 or player 2, and the reward scheme seems reasonable: 1 for a win, 0 for a draw, -1 for a loss. Yet, as the agent trains more, it loses more, instead of getting closer to a 100% draw rate. What can I do? Would setting the loss reward to -1.5, or the draw reward to 0.25, help? It still loses in actual game play.
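For context, the way I credit that terminal reward back to the moves of a game is roughly like this. This is a hypothetical sketch; the function name, discount factor, and move-alternation assumption are illustrative, not necessarily what my repo does:

```python
def returns_from_result(num_moves, result, gamma=0.95):
    """Credit a terminal reward back through one self-play game.

    result: +1 if player 1 won, -1 if player 2 won, 0 for a draw.
    Moves alternate: even indices are player 1's moves, odd indices
    are player 2's. Each move's return is the discounted terminal
    reward, sign-flipped so it is from the mover's own perspective.
    """
    returns = []
    for i in range(num_moves):
        steps_to_end = num_moves - 1 - i
        perspective = 1 if i % 2 == 0 else -1   # flip sign for player 2
        returns.append(perspective * result * gamma ** steps_to_end)
    return returns
```

With this scheme, scaling the loss reward to -1.5 or the draw reward to 0.25 just rescales these targets; it doesn't change which moves get blamed, which is why I'm unsure it would fix the trend.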
I realise that for Tic Tac Toe you can use minimax, or even a Q-network given the small number of states, but I decided to go with a policy + value net because my eventual games will be more complex, and I wanted to learn the complex solution by implementing it for a simple problem.