
In the vanilla Monte Carlo tree search (MCTS) implementation, the rollout usually follows a uniform random policy: it takes random actions until the game is finished, and only then is the gathered information backed up.
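To make clear what I mean by a vanilla rollout, here is a minimal sketch. The `NimState` toy game, the `Node` class, and the function names are all hypothetical and chosen only for illustration; they are not taken from either paper or from any library.

```python
import random


class NimState:
    """Hypothetical toy game: players alternately take 1-3 stones;
    whoever takes the last stone wins."""

    def __init__(self, stones, player=1):
        self.stones = stones
        self.player = player  # player to move: +1 or -1

    def legal_actions(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def step(self, action):
        return NimState(self.stones - action, -self.player)

    def is_terminal(self):
        return self.stones == 0

    def winner(self):
        # The player who just moved took the last stone.
        return -self.player


class Node:
    """Hypothetical tree node holding only the rollout statistics."""

    def __init__(self, player_just_moved):
        self.player_just_moved = player_just_moved
        self.visits = 0
        self.wins = 0


def random_rollout(state, rng=random):
    """Play uniformly random legal moves until the game ends."""
    while not state.is_terminal():
        state = state.step(rng.choice(state.legal_actions()))
    return state.winner()


def backup(path, winner):
    """Propagate the rollout result along the root-to-leaf path."""
    for node in path:
        node.visits += 1
        if node.player_just_moved == winner:
            node.wins += 1
```

A single simulation would then call `random_rollout` from the state of the newly expanded leaf and pass the result to `backup` along the path that selection followed.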

I have read both the AlphaZero and AlphaGo Zero papers, and I couldn't find any information on how the rollout is implemented (maybe I missed it).

How is the rollout from the MCTS implemented in both the AlphaGo Zero and the AlphaZero algorithms?

nbro
ihavenoidea

0 Answers