I have been trying to understand why MCTS is so important to the performance of RL agents, and the best explanation I found comes from the paper Bootstrapping from Game Tree Search, which states:

Deterministic, two-player games such as chess provide an ideal test-bed for search bootstrapping. The intricate tactics require a significant level of search to provide an accurate position evaluation; learning without search has produced little success in these domains.

However, I don't understand why this is the case, or why value-based methods on their own are unable to achieve similar performance.

So my question is:

  • What are the main advantages of incorporating search-based algorithms with value-based methods?
Hossam

1 Answer

Assuming a continuous/uncountable state space, we can only estimate our value function using function approximation, so our estimates will never be exact for all states simultaneously (loosely speaking, we have far more states than weights). If we can look at the (approximate) values of the states reachable in, say, 5 actions' time, it is better to base our decision on those estimates combined with the true rewards observed along those 5 actions, since only the evaluation at the end of the lookahead then carries approximation error (see the sketch below).
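
To make this concrete, here is a minimal sketch of such a lookahead in Python. The `step`, `legal_actions` and `value` hooks, the discount factor, and the depth of 5 are all my own assumptions, not anything from the question (and a two-player game would alternate max and min levels rather than the single-agent max shown here):

```python
# Minimal sketch of a depth-limited lookahead bootstrapped from a learned
# value function. Hypothetical hooks:
#   step(state, action)  -> (next_state, reward, done)
#   legal_actions(state) -> iterable of actions
#   value(state)         -> approximate value from a function approximator
GAMMA = 0.99  # discount factor (assumption)

def lookahead(state, depth, step, legal_actions, value):
    """Best estimated return from `state`, searching `depth` moves ahead.

    Rewards observed inside the search are exact; only the evaluation at
    the horizon relies on the (imperfect) approximator, and that error is
    discounted by GAMMA ** depth.
    """
    if depth == 0:
        return value(state)  # fall back to the approximate value function
    actions = list(legal_actions(state))
    if not actions:
        return value(state)
    best = float("-inf")
    for action in actions:
        next_state, reward, done = step(state, action)
        if done:
            ret = reward  # terminal: no approximation error at all
        else:
            ret = reward + GAMMA * lookahead(next_state, depth - 1,
                                             step, legal_actions, value)
        best = max(best, ret)
    return best

def act(state, step, legal_actions, value, depth=5):
    """Pick the action whose depth-step lookahead return is highest."""
    def score(action):
        next_state, reward, done = step(state, action)
        return reward if done else reward + GAMMA * lookahead(
            next_state, depth - 1, step, legal_actions, value)
    return max(legal_actions(state), key=score)
```

The point is that the first 5 rewards along each searched path are exact, so the only approximation error left is in the leaf evaluation, and even that is discounted by GAMMA ** 5.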

Further, MCTS also allows for more implicit exploration: when choosing which actions to expand in the tree, we potentially select many non-greedy actions, and some of these turn out to lead to better future returns than the current greedy choice.
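
To see where that exploration comes from, here is a minimal sketch of the UCT selection rule commonly used in MCTS; the node representation and the exploration constant below are my own assumptions:

```python
import math

C_UCT = 1.41  # exploration constant (assumption; sqrt(2) is a common default)

def uct_select(children):
    """Pick the child to descend into / expand next.

    `children` is a list of dicts with 'visits' and 'total_value' keys,
    one per available action (a hypothetical node representation).
    """
    parent_visits = sum(c["visits"] for c in children)

    def uct_score(c):
        if c["visits"] == 0:
            return float("inf")  # unvisited actions are always tried first
        exploit = c["total_value"] / c["visits"]  # mean return so far
        # The bonus grows with parent visits and shrinks as this action is
        # revisited, so rarely-tried (non-greedy) actions keep being expanded.
        explore = C_UCT * math.sqrt(math.log(parent_visits) / c["visits"])
        return exploit + explore

    return max(children, key=uct_score)

# Example: the less-visited action wins selection despite a lower mean value.
children = [
    {"visits": 10, "total_value": 6.0},  # greedy so far (mean 0.6)
    {"visits": 2,  "total_value": 1.0},  # mean 0.5, but a large bonus
]
print(uct_select(children))  # -> {'visits': 2, 'total_value': 1.0}
```

Because the bonus term shrinks only as an action is actually visited, the tree keeps allocating simulations to moves the current value estimates rank as inferior, which is exactly the implicit exploration described above.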

David