
I am studying the existing methods of action selection in reinforcement learning.

I found several methods, including epsilon-greedy, softmax, upper confidence bound (UCB), and Thompson sampling.

I understand the principle behind each of these methods except Thompson sampling: I can't work out the idea behind it or the steps it follows to select an action.

I would be grateful if someone could explain the principle behind Thompson sampling and how it works, ideally with a simple example.

Neil Slater
