Questions tagged [alphazero]

For questions related to DeepMind's AlphaZero, which is a computer program that can play Go, Chess, and Shogi. AlphaZero achieved, within 24 hours of training, a superhuman level of play in these three games by defeating world-champion programs Stockfish, Elmo, and the 3-day version of AlphaGo Zero. AlphaZero was introduced in "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (2017) by David Silver et al.

Have a look at the research paper that introduced AlphaZero, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (2017) by David Silver et al., and at https://en.wikipedia.org/wiki/AlphaZero.

77 questions
15
votes
1 answer

Why does the policy network in AlphaZero work?

In AlphaZero, the policy network (or head of the network) maps game states to a distribution of the likelihood of taking each action. This distribution covers all possible actions from that state. How is such a network possible? The possible actions…
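One common way to get a distribution over only the currently available actions from a fixed-size output is to mask the illegal entries and renormalize. The sketch below assumes the policy head emits one logit per encodable move and that a legality mask is supplied by the game rules; `masked_policy` and its arguments are hypothetical names for illustration, not taken from any AlphaZero codebase.

```python
import math


def masked_policy(logits, legal):
    """Softmax over the legal entries only; illegal entries get probability 0.

    logits: one raw score per encodable move (fixed length for the game).
    legal:  booleans of the same length, True where the move is legal now.
    """
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal moves in this state")
    # Subtract the largest legal logit for numerical stability.
    shift = max(legal_logits)
    exps = [math.exp(x - shift) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


# Four encodable moves, of which only the first two are legal here.
probs = masked_policy([2.0, 1.0, 0.0, 5.0], [True, True, False, False])
print(probs)
```

Note that the highest raw logit belongs to an illegal move in this example, and it is simply ignored: the output length never changes, only which entries can be non-zero.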
15
votes
3 answers

Does Monte Carlo tree search qualify as machine learning?

To the best of my understanding, the Monte Carlo tree search (MCTS) algorithm is an alternative to minimax for searching a tree of nodes. It works by choosing a move (generally, the one with the highest chance of being the best), and then performing…
11
votes
3 answers

Why were Chess experts surprised by AlphaZero's victory against Stockfish?

It was recently brought to my attention that Chess experts took the outcome of this now famous match as something of an upset. See: Chess’s New Best Player Is A Fearless, Swashbuckling Algorithm As a non-expert on Chess and Chess AI, my…
DukeZhou
10
votes
1 answer

Is AlphaZero an example of an AGI?

From DeepMind's research paper on arxiv.org: In this paper, we apply a similar but fully generic algorithm, which we call AlphaZero, to the games of chess and shogi as well as Go, without any additional domain knowledge except the rules of the…
Siddhartha
9
votes
1 answer

Does AlphaZero use Q-Learning?

I was reading the AlphaZero paper Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, and it seems they don't mention Q-Learning anywhere. So does AZ use Q-Learning on the results of self-play or just a Supervised…
8
votes
2 answers

How does AlphaZero's MCTS work when starting from the root node?

From the AlphaGo Zero paper, during MCTS, statistics for each new node are initialized as such: ${N(s_L, a) = 0, W (s_L, a) = 0, Q(s_L, a) = 0, P (s_L, a) = p_a}$. The PUCT algorithm for selecting the best child node is $a_t = argmax(Q(s,a) +…
sb3
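To make the initialization and selection step concrete, here is a minimal sketch. It assumes the exploration term has the form usually quoted for AlphaGo Zero, $U(s,a) = c_{puct} \, P(s,a) \, \sqrt{\sum_b N(s,b)} / (1 + N(s,a))$; the excerpt above is truncated before that term, so treat the form, the `EdgeStats` class, and the `c_puct` default as assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Statistics for one (state, action) edge, initialized as in the excerpt."""
    prior: float               # P(s, a) = p_a from the policy head
    visits: int = 0            # N(s, a)
    total_value: float = 0.0   # W(s, a)

    @property
    def mean_value(self) -> float:
        # Q(s, a) = W / N, taken to be 0 while the edge is unvisited.
        return self.total_value / self.visits if self.visits else 0.0


def puct_scores(edges, c_puct=1.0):
    """Return Q + U for every action under the assumed form of U."""
    sqrt_total = math.sqrt(sum(e.visits for e in edges.values()))
    return {
        action: e.mean_value + c_puct * e.prior * sqrt_total / (1 + e.visits)
        for action, e in edges.items()
    }


def select_action(edges, c_puct=1.0):
    """Pick the action with the highest Q + U (first one wins on ties)."""
    scores = puct_scores(edges, c_puct)
    return max(scores, key=scores.get)


def backup(edge, value):
    """Update one edge with the value returned from below it."""
    edge.visits += 1
    edge.total_value += value


edges = {"a": EdgeStats(prior=0.7), "b": EdgeStats(prior=0.3)}
print(puct_scores(edges))   # fresh node: every score is 0.0
backup(edges["b"], -1.0)
print(select_action(edges))
```

Under this form, a freshly expanded node has a total visit count of 0, so every score is 0 and the argmax is a tie. How that tie is broken (by prior, at random, or by adding a small constant under the square root) is an implementation choice that this sketch does not settle.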
8
votes
2 answers

How can AlphaZero learn if the tree search stops and restarts before finishing a game?

I am trying to understand how AlphaZero works, but there is one point that I have problems understanding, even after reading several different explanations. As I understand it (see for example…
7
votes
3 answers

Would AlphaGo Zero become perfect with enough training time?

Would AlphaGo Zero become theoretically perfect with enough training time? If not, what would be the limiting factor? (By perfect, I mean it always wins the game if possible, even against another perfect opponent.)
7
votes
0 answers

How is the rollout from the MCTS implemented in both the AlphaGo Zero and AlphaZero algorithms?

In the vanilla Monte Carlo tree search (MCTS) implementation, the rollout is usually implemented following a uniform random policy, that is, it takes random actions until the game is finished and only then the information gathered is backed up. I…
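The vanilla rollout described above can be sketched on a toy game. `NimState` below is a hypothetical stand-in (take 1 to 3 stones per turn, whoever takes the last stone wins), chosen only so the example is self-contained; it says nothing about how AlphaGo Zero or AlphaZero handle this step.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NimState:
    """Toy game: players alternate taking 1-3 stones; taking the last one wins."""
    stones: int
    to_move: int = 0

    def is_terminal(self):
        return self.stones == 0

    def legal_actions(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def apply(self, take):
        return NimState(self.stones - take, 1 - self.to_move)

    def winner(self):
        # The player who just moved took the last stone.
        return 1 - self.to_move


def rollout(state, rng):
    """Play uniformly random legal moves until the game ends; return the winner."""
    while not state.is_terminal():
        state = state.apply(rng.choice(state.legal_actions()))
    return state.winner()


def estimate_value(state, player, n_rollouts=1000, seed=0):
    """Fraction of random rollouts from `state` that `player` wins."""
    rng = random.Random(seed)
    wins = sum(rollout(state, rng) == player for _ in range(n_rollouts))
    return wins / n_rollouts


print(estimate_value(NimState(stones=10), player=0))
```

In a full MCTS, the single result of `rollout` is what gets backed up along the path from the leaf to the root; `estimate_value` just averages many such results to show what the backed-up statistics converge toward.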
6
votes
1 answer

How does Alpha Zero's move encoding work?

I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves/or honestly just moves in general) numerically. I looked up how Alpha Zero represented it,…
6
votes
1 answer

Clarifying representation of Neural Network input for Chess Alpha Zero

In the Alpha Zero paper (https://arxiv.org/pdf/1712.01815.pdf) page 13, the input for the NN is described. In the beginning of the page, the authors state that: "The input to the Neural Network is an N x N x (MT + L) image stack [...]" From this, I…
Andrew
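As a worked example of the N x N x (MT + L) arithmetic, the sketch below plugs in the values I recall from the paper's input-feature table: for chess, N = 8, M = 14 planes per time step, T = 8 time steps, and L = 7 constant planes; for Go, N = 19, M = 2, T = 8, L = 1. Treat these numbers as assumptions to check against the paper; the helper names are made up for illustration.

```python
def input_planes(m, t, l):
    """Number of feature planes: M planes for each of T time steps, plus L extras."""
    return m * t + l


def input_shape(n, m, t, l):
    """Shape of the N x N x (MT + L) image stack."""
    return (n, n, input_planes(m, t, l))


# Assumed per-game settings (recalled from the paper, not verified here).
chess = input_shape(n=8, m=14, t=8, l=7)
go = input_shape(n=19, m=2, t=8, l=1)

print("chess:", chess)  # 14 * 8 + 7 planes on an 8 x 8 board
print("go:", go)        # 2 * 8 + 1 planes on a 19 x 19 board
```

Each plane is one N x N grid of feature values, so the last dimension simply counts how many such grids are stacked.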
5
votes
2 answers

What part of the game is the value network trained to predict a winner on?

The Alpha Zero (as well as AlphaGo Zero) papers say they trained the value head of the network by "minimizing the error between the predicted winner and the game winner" throughout its many self-play games. As far as I could tell, further…
5
votes
2 answers

Is it practical to train AlphaZero or MuZero (for indie games) on a personal computer?

Is it practical/affordable to train an AlphaZero/MuZero engine using a residential gaming PC, or would it take thousands of years of training for the AI to learn enough to challenge humans? I'm having trouble wrapping my head around how much…
Luke W
5
votes
1 answer

Do AlphaZero/MuZero learn faster in terms of number of games played than humans?

I don't know much about AI and am just curious. From what I read, AlphaZero/MuZero outperform any human chess player after a few hours of training. I have no idea how many chess games a very talented human chess player on average has played before…
220284
5
votes
2 answers

What is the difference between DQN and AlphaGo Zero?

I have already implemented a relatively simple DQN on Pacman. Now I would like to clearly understand the difference between a DQN and the techniques used by AlphaGo Zero/AlphaZero and I couldn't find a place where the features of both approaches are…