I have read many papers (such as this or this) explaining external sampling, but I still don't understand how the algorithm works.

I understand that you divide $Q$, the set of all terminal histories, into subsets $Q_1, \dots, Q_n$.

What is the probability of reaching some subset $Q_i$? Is it just the product of the chance probabilities, the opponent's action probabilities, and my own action probabilities?

As I understand it, the sampling only occurs at the opponent's information sets. How does that work? If there are two players, player 1's strategy depends on the strategy I use.

What happens after you have chosen a subset $Q_i$ to sample? How many times do you iterate over $Q_i$?

I have searched around and cannot find any Python code that uses external sampling; there are plenty of papers that give the formulas but do not explain the algorithm in detail. A Python example of external-sampling MCCFR would probably make the algorithm much easier for me to understand.

1 Answer

External sampling and outcome sampling are two different ways of defining the sets $Q_1, \dots, Q_n$. I think your mistake is that you think of the $Q_i$ as fixed and given as input to these sampling schemes. That is not the case.

In external sampling, there are as many sets $Q_{\tau}$ as there are pure strategies of the opponent and the chance player (a pure strategy is a deterministic policy). Think of $Q_{\tau}$ as "the set of terminal nodes that I can reach if my opponent and chance play in this fixed way".
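
To make this concrete, here is a tiny sketch (the tree encoding and the name `q_tau` are made up for illustration, and chance is omitted for brevity): fixing one pure strategy $\tau$ for the opponent pins down one block $Q_{\tau}$ of terminal nodes, while my own actions stay free.

```python
# A tiny hand-rolled game tree: ("me"/"opp", label, {action: child}); leaves are payoffs.
tree = ("me", "root", {
    "L": ("opp", "after_L", {"l": 1, "r": 2}),
    "R": ("opp", "after_R", {"l": 3, "r": 4}),
})

def q_tau(node, tau):
    """Terminal nodes reachable when the opponent plays the fixed pure
    strategy `tau` (a dict: opponent decision point -> chosen action)."""
    if not isinstance(node, tuple):                  # leaf: a payoff
        return {node}
    who, label, children = node
    if who == "me":                                  # my actions are not fixed
        return set().union(*(q_tau(c, tau) for c in children.values()))
    return q_tau(children[tau[label]], tau)          # opponent: one fixed action

# Each opponent pure strategy tau yields one block Q_tau:
print(q_tau(tree, {"after_L": "l", "after_R": "r"}))   # {1, 4}
print(q_tau(tree, {"after_L": "r", "after_R": "l"}))   # {2, 3}
```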

Sampling a set $Q_{\tau}$ thus means sampling a pure strategy for the opponent and the chance player. An alternative is to sample the opponent's policy $\sigma^{-i}$ (and chance) on the fly, whenever an action is needed. This yields identical probabilities $q(z)$ for every terminal history $z$, even though it never writes down a full deterministic policy $\tau$ (which would require specifying an action at every information set).
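
Since the question asks for Python, here is a minimal, self-contained sketch of external-sampling MCCFR for Kuhn poker (my own choice of toy game; names such as `external_sampling` are mine, not from any library). At the traversing player's nodes it walks every action and accumulates regrets; at the opponent's and chance's nodes it samples a single action from the current policy, which is exactly the "on the fly" sampling described above.

```python
import random
from collections import defaultdict

ACTIONS = ["p", "b"]  # p = pass/check/fold, b = bet/call

regret_sum = defaultdict(lambda: [0.0, 0.0])    # per-infoset cumulative regrets
strategy_sum = defaultdict(lambda: [0.0, 0.0])  # per-infoset average-strategy numerator

def is_terminal(history):
    return history in ("pp", "bp", "bb", "pbp", "pbb")

def payoff(history, cards, player):
    """Payoff of `player` (0 or 1) at a terminal history."""
    if history == "bp":                  # player 1 folded to a bet
        return 1 if player == 0 else -1
    if history == "pbp":                 # player 0 folded to a bet
        return 1 if player == 1 else -1
    stake = 1 if history == "pp" else 2  # showdown: ante only, or ante + bet
    winner = 0 if cards[0] > cards[1] else 1
    return stake if player == winner else -stake

def current_strategy(key):
    """Regret matching: normalised positive regrets (uniform if none)."""
    positives = [max(r, 0.0) for r in regret_sum[key]]
    total = sum(positives)
    return [p / total for p in positives] if total > 0 else [0.5, 0.5]

def external_sampling(cards, history, traverser):
    """One external-sampling traversal; returns the traverser's expected payoff."""
    if is_terminal(history):
        return payoff(history, cards, traverser)
    player = len(history) % 2
    key = str(cards[player]) + history           # infoset: own card + public actions
    sigma = current_strategy(key)
    if player == traverser:
        # At the traverser's nodes, explore EVERY action and update regrets.
        utils = [external_sampling(cards, history + a, traverser) for a in ACTIONS]
        node_util = sum(p * u for p, u in zip(sigma, utils))
        for i, u in enumerate(utils):
            regret_sum[key][i] += u - node_util
        return node_util
    # At the opponent's nodes, sample ONE action from the current policy
    # ("on the fly" sampling) and accumulate the average strategy.
    for i, p in enumerate(sigma):
        strategy_sum[key][i] += p
    a = random.choices(ACTIONS, weights=sigma)[0]
    return external_sampling(cards, history + a, traverser)

def train(iterations=100_000):
    deck = [0, 1, 2]                     # Kuhn poker: three ranked cards
    for _ in range(iterations):
        for traverser in (0, 1):
            random.shuffle(deck)         # the chance node is sampled, too
            external_sampling(deck[:2], "", traverser)

def average_strategy(key):
    s = strategy_sum[key]
    total = sum(s)
    return [x / total for x in s] if total > 0 else [0.5, 0.5]

if __name__ == "__main__":
    random.seed(0)
    train()
    for key in sorted(strategy_sum):
        print(key, "->", [round(p, 3) for p in average_strategy(key)])
```

Run it with, say, `train(200_000)`; the average strategy at each infoset (own card + betting history) should approach a Kuhn-poker equilibrium, e.g. player 0 almost always calling a bet when holding the highest card.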

There are several implementations on GitHub; a C++ one can be found here: https://github.com/bakanaouji/cpp-cfr/blob/master/RegretMinimization/Trainer/Trainer.cpp
