Monte Carlo Tree Search was the method used in AlphaGo. My understanding is that it randomly searches the state space of possible moves, with the probability of choosing a move proportional to the perceived value of the resulting state (this rollout data was then used to train the value function further).
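
To make concrete what I mean, here is a toy sketch of that kind of value-proportional move sampling (the names `state.apply` and `value_fn` are just placeholders I made up; actual AlphaGo also uses a learned policy prior and visit counts in its selection rule, not pure proportional sampling):

```python
import random

def sample_move(state, moves, value_fn):
    # Score each candidate move by the perceived value of the state
    # it leads to, clipped to stay positive so the weights are valid.
    weights = [max(value_fn(state.apply(m)), 1e-9) for m in moves]
    # Choose a move with probability proportional to its weight.
    return random.choices(moves, weights=weights, k=1)[0]
```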

This seems very similar to MCMC to me, except that it includes a mechanism for observing ground truth (wins/losses in Go) and updating the likelihood function (a.k.a. the value function) in the process.

Is this true? And to what degree have Bayesian methods been used in RL in general? For example, have people tried using one of the various specialized MCMC algorithms to improve Monte Carlo Tree Search?

Sorry if the question is open-ended/vague.

profPlum

1 Answer

MCTS has a selection step in which you choose to explore "promising" directions while still giving a chance to branches that have not been explored enough yet. You can use Bayesian methods for that step, i.e., keep and maintain a distribution over the expected rewards of each branch.
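
For instance, one concrete Bayesian variant of the selection step is Thompson sampling: maintain a Beta posterior over each child's win rate and descend into the child whose sampled win rate is highest. A minimal sketch, assuming binary win/loss outcomes (the `Node` fields are illustrative, not from any particular library):

```python
import random

class Node:
    def __init__(self):
        self.children = {}  # move -> Node
        self.wins = 0       # rollouts through this node that ended in a win
        self.losses = 0     # rollouts through this node that ended in a loss

def select_child(node):
    # Thompson sampling: draw a plausible win rate from each child's
    # Beta posterior and descend into the child with the highest draw.
    # Unvisited children have a Beta(1, 1) (uniform) prior, so they
    # still get a chance to be selected.
    def draw(child):
        return random.betavariate(child.wins + 1, child.losses + 1)
    move, child = max(node.children.items(), key=lambda kv: draw(kv[1]))
    return move, child

def backup(path, won):
    # Propagate the rollout outcome up the visited path, updating the
    # posterior counts at every node along the way.
    for node in path:
        if won:
            node.wins += 1
        else:
            node.losses += 1
```

Because rarely visited children have wide posteriors, they occasionally produce the highest draw, which gives exactly the exploration behaviour described above.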

Oren