I trained a DQN that learns tic-tac-toe by playing against itself, with a reward of -1/0/+1 for a loss/draw/win. Every 500 training episodes, I test its progress by letting it play 500 evaluation episodes against a random player.
As shown in the picture…
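For reference, a minimal sketch of the kind of periodic evaluation described above: the learned policy plays a batch of games against a uniformly random opponent. The board helpers and the `policy` callable are illustrative stand-ins, not the asker's code.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return +1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def play_vs_random(policy, agent_mark=+1):
    """Play one game; the agent moves via `policy`, the opponent at random."""
    board, to_move = [0] * 9, +1
    while True:
        w = winner(board)
        legal = [i for i in range(9) if board[i] == 0]
        if w != 0 or not legal:
            return 1 if w == agent_mark else (-1 if w != 0 else 0)
        move = policy(board, legal) if to_move == agent_mark else random.choice(legal)
        board[move] = to_move
        to_move = -to_move

def evaluate(policy, n_games=500):
    """Win/draw/loss fractions over n_games, mirroring the 500-game test phase."""
    results = [play_vs_random(policy) for _ in range(n_games)]
    return {"win": results.count(1) / n_games,
            "draw": results.count(0) / n_games,
            "loss": results.count(-1) / n_games}

# Example call with a random stand-in for the trained DQN's greedy policy:
print(evaluate(lambda board, legal: random.choice(legal)))
```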
I have an environment where an agent faces an equal opponent, and while I've achieved OK performance implementing DQN and treating the opponent as a part of the environment, I think performance would improve if the agent trains against itself…
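For illustration, the structural change usually amounts to something like the skeleton below, where the same agent picks the move for whichever player is to act, instead of the opponent living inside the environment's step function. `env`, `agent.act` and `env.current_player` are assumed interfaces, not a particular library.

```python
def self_play_episode(env, agent):
    """Generate one episode in which the same agent controls both seats."""
    state = env.reset()
    transitions = {+1: [], -1: []}             # experience kept per player
    done = False
    while not done:
        player = env.current_player            # +1 or -1, whoever is to move
        action = agent.act(state, player)      # one network decides for both seats
        next_state, reward, done = env.step(action)
        transitions[player].append((state, action, reward, next_state, done))
        state = next_state
    return transitions                          # both halves feed one replay buffer
```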
I am working towards using RL to create an AI for a two-player, hidden-information, turn-based board game. I have just finished David Silver's RL course and Denny Britz's coding exercises, and so am relatively familiar with MC control, SARSA,…
As part of my thesis, I'm working on a zero-sum game, using RL to train an agent.
The game is a real-time derivative of Pong; one could imagine playing Pong with both sides being foosball rods.
As I see it, this is an MDP with perfect…
I've been doing some research on the principles behind AlphaZero.
This cheat sheet (1) and this implementation (2) (for Connect 4) were especially useful.
Yet, I still have two important questions:
How is the policy network updated? In (2),…
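For reference, the update described in the AlphaZero paper: the network $(\mathbf{p}, v) = f_\theta(s)$ is trained so that the policy head matches the MCTS visit-count distribution $\boldsymbol{\pi}$ at the root (the search-improved policy) and the value head matches the eventual game outcome $z \in \{-1, 0, +1\}$, by minimising

$$\ell = (z - v)^2 \;-\; \boldsymbol{\pi}^\top \log \mathbf{p} \;+\; c\,\lVert \theta \rVert^2 .$$

How implementation (2) realises this may differ in detail; the loss above is only the published formulation.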
Suppose we're training two agents to play an asymmetric game from scratch using self-play (like Zerg vs. Protoss in Starcraft). During training, one of the agents can become stronger (for example, by discovering a good broad strategy) and start winning most…
There seems to be a major difference in how the terminal reward is received/handled in self-play RL vs "normal" RL, which confuses me.
I implemented TicTacToe the normal way, where a single agent plays against an environment that manages the state…
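One common convention (a sketch of a typical bookkeeping scheme, not necessarily what the asker implemented) is that only the terminal step carries a reward, and each player's last transition receives it from that player's point of view.

```python
def assign_terminal_rewards(moves, winner):
    """
    moves  : list of (player, state, action, next_state) in the order played
    winner : +1, -1, or 0 for a draw
    returns: list of (player, state, action, reward, next_state, done) tuples
    """
    players = {m[0] for m in moves}
    last_index = {p: max(i for i, m in enumerate(moves) if m[0] == p) for p in players}
    transitions = []
    for i, (player, state, action, next_state) in enumerate(moves):
        done = (i == last_index[player])
        reward = 0.0
        if done and winner != 0:
            reward = 1.0 if player == winner else -1.0
        # Caveat for bootstrapping methods: the loser's recorded next_state should
        # be the true terminal board (after the winner's final move), otherwise the
        # -1 appears to come from a non-terminal position.
        transitions.append((player, state, action, reward, next_state, done))
    return transitions
```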
I'm working on a neural network that plays some board games like reversi or tic-tac-toe (zero-sum games, two players). I'm trying to have one network topology for all the games - I specifically don't want to set any limit on the number of available…
I'm coding my own version of MuZero. However, I don't understand how it is supposed to learn to play well for both players in a two-player game.
Take Go for example. If I use a single MCTS to generate an entire game (to be used in the training stage),…
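A common way a single tree and network can serve both players in a zero-sum game is a negamax-style backup: values are always stored from the perspective of the player to move at each node, and negated once per ply on the way back up. The sketch below illustrates only that convention (the `Node` class is a placeholder, not MuZero's actual data structure).

```python
class Node:
    """Placeholder search node; only the statistics needed for the backup."""
    def __init__(self):
        self.visit_count = 0
        self.value_sum = 0.0

def backup(search_path, leaf_value):
    """
    search_path : nodes from the root to the evaluated leaf (alternating players)
    leaf_value  : value of the leaf from the perspective of the player to move there
    """
    value = leaf_value
    for node in reversed(search_path):
        node.visit_count += 1
        node.value_sum += value   # stored from this node's player-to-move perspective
        value = -value            # flip sign once per ply for the opponent above
```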
This corresponds to Exercise 1.1 of Sutton & Barto's book (2nd edition), and a discussion followed from this answer.
Consider the following two reward functions:
1. Win = +1, Draw = 0, Loss = -1
2. Win = +1, Draw or Loss = 0
Can we say something about…
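To make the comparison concrete (the notation below is mine, not the exercise's): if a fixed policy wins, draws and loses with probabilities $p_w$, $p_d$ and $p_l$, the expected terminal reward under the two functions is

$$\mathbb{E}[R_1] = p_w - p_l, \qquad \mathbb{E}[R_2] = p_w ,$$

so the second reward function leaves the agent indifferent between drawing and losing, while the first strictly prefers a draw to a loss.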
I'm using Q-learning (off-policy TD control, as specified on p. 131 of Sutton's book) to train an agent to play connect four. My goal is to create a strong player (superhuman performance?) purely by self-play, without training models against other…
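For reference, the update the excerpt refers to (standard tabular Q-learning from Sutton & Barto); the dictionary-based table below is illustrative only, not the asker's setup.

```python
from collections import defaultdict

Q = defaultdict(float)        # Q[(state, action)] -> estimated value
alpha, gamma = 0.1, 1.0       # step size; gamma = 1 is common for episodic games

def q_update(state, action, reward, next_state, next_legal_actions, done):
    """One Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    if done or not next_legal_actions:
        target = reward
    else:
        # In naive self-play, next_state is the opponent's turn; many connect-four
        # implementations therefore use reward - gamma * max(...) (a negamax view).
        target = reward + gamma * max(Q[(next_state, a)] for a in next_legal_actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```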
I am trying to use tensorflow / keras to play a text-based game. The game pits two players against each other; they play by answering questions, choosing an answer from among the proposed ones.
Game resembles this:
Questions asked from player 1, choose value {0, 1,…
I was reading the NFSP paper from D. Silver, and I'm somewhat confused by the algorithm:
In particular, given that we sample an action according to the best response ($\sigma = \epsilon\text{-greedy}(Q)$), we also insert this transition in…
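As a point of reference, my reading of the NFSP bookkeeping, written as a hedged sketch: with probability $\eta$ the agent acts by its best response $\epsilon\text{-greedy}(Q)$, and only then is the $(s, a)$ pair added to the supervised memory $\mathcal{M}_{SL}$, while every transition goes into the RL memory $\mathcal{M}_{RL}$. The `q_policy`, `avg_policy`, `env` and memory objects below are assumed interfaces, not the paper's code.

```python
import random

def nfsp_step(state, q_policy, avg_policy, env, m_rl, m_sl, eta=0.1):
    """One acting/storage step in the NFSP scheme as I read it (assumed interfaces)."""
    if random.random() < eta:                 # anticipatory parameter eta
        action = q_policy(state)              # best response: sigma = epsilon-greedy(Q)
        m_sl.append((state, action))          # (s, a) stored in the supervised memory
    else:
        action = avg_policy(state)            # otherwise act from the average policy Pi
    next_state, reward, done = env.step(action)
    m_rl.append((state, action, reward, next_state, done))   # always stored in M_RL
    return next_state, done
```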
I am trying to reproduce AlphaZero's algorithm on the board game Carcassonne. Since I want to use the final game score differences (i.e. victory point of player 1 - victory point of player 2) as the final and only reward, AlphaZero's UCB score can…
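For context, AlphaZero's selection rule is

$$a = \arg\max_a \left[ Q(s,a) + c_{\text{puct}}\, P(s,a)\, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \right],$$

which implicitly assumes $Q$ lies in a bounded range such as $[-1, 1]$. With raw victory-point differences as the reward, $Q$ is unbounded; the MuZero paper handles this by min-max normalising $Q$ over the values observed in the current search tree,

$$\bar{Q}(s,a) = \frac{Q(s,a) - \min_{(s',a') \in \text{tree}} Q(s',a')}{\max_{(s',a') \in \text{tree}} Q(s',a') - \min_{(s',a') \in \text{tree}} Q(s',a')},$$

which is one published way to keep the exploration term meaningful when returns are not in $[-1, 1]$.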
Suppose, instead of playing against a random opponent, the reinforcement learning algorithm described above played against itself, with both sides learning. What do you think would happen in this case? Would it learn a different policy for…