In the vanilla Monte Carlo tree search (MCTS) implementation, the rollout is usually implemented following a uniform random policy, that is, it takes random actions until the game is finished and only then the information gathered is backed up.
I have read the AlphaZero paper (and the AlphaGo Zero too) and I didn't find any information on how the rollout is implemented (maybe I missed it).
How is the rollout from the MCTS implemented in both the AlphaGo Zero and the AlphaZero algorithms?
 
    