Questions tagged [monte-carlo-methods]

For questions related to Monte Carlo methods in reinforcement learning and other AI sub-fields. ("Monte Carlo" refers to random sampling of the search space.)

86 questions
25
votes
2 answers
What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation?
I came across these two algorithms, but I cannot understand the difference between them, both in terms of implementation and intuition.
So, what difference does the second point in both slides refer to?
user9947
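For readers comparing the two variants in the question above, here is a minimal sketch of tabular Monte Carlo prediction in Python. The episode format (a time-ordered list of (state, reward) pairs) and the function shape are assumptions for illustration, not code from the slides referenced in the question.

```python
from collections import defaultdict

def mc_evaluate(episodes, gamma=1.0, first_visit=True):
    """Tabular Monte Carlo prediction. Each episode is assumed to be a
    time-ordered list of (state, reward) pairs, where the reward is the
    one received on leaving the state."""
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    for episode in episodes:
        states = [s for s, _ in episode]
        G = 0.0
        # Walk the episode backwards so G accumulates the return from step t.
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            # First-visit: credit G to s only at its EARLIEST occurrence in
            # the episode; every-visit: credit G at every occurrence.
            if first_visit and s in states[:t]:
                continue
            returns_sum[s] += G
            returns_cnt[s] += 1
    return {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}
```

The entire difference between the two algorithms is the `continue` guard: every-visit averages one return per occurrence of a state within an episode, while first-visit averages only the return from the first occurrence.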
17
votes
1 answer
How does "Monte-Carlo search" work?
I have heard about this concept in a Reddit post about AlphaGo. I have tried to go through the paper and the article, but could not really make sense of the algorithm.
So, can someone give an easy-to-understand explanation of how the Monte-Carlo…
Dawny33
- 1,381
- 13
- 29
11
votes
3 answers
What is the intuition behind TD($\lambda$)?
I'd like to better understand temporal-difference learning. In particular, I'm wondering if it is prudent to think about TD($\lambda$) as a type of "truncated" Monte Carlo learning?
Nick Kunz
- 165
- 1
- 7
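For context on the "truncated Monte Carlo" intuition, the standard forward-view definitions (as in Sutton & Barto) are:

$$
G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V(S_{t+n}), \qquad G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}
$$

TD($\lambda$) updates toward $G_t^{\lambda}$, a geometrically weighted mixture of $n$-step returns: at $\lambda = 1$ it recovers the full Monte Carlo return, and at $\lambda = 0$ the one-step TD target, so "interpolated" is arguably closer than "truncated".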
9
votes
1 answer
MCTS: How to choose the final action from the root
When the time allotted to Monte Carlo tree search runs out, what action should be chosen from the root?
The original UCT paper (2006) just says bestAction in its algorithm.
Monte-Carlo Tree Search: A New Framework for Game AI (2008) says
The game…
user76284
- 375
- 2
- 15
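As an illustration of the two most common conventions for the question above (highest visit count, a.k.a. "robust child", versus highest mean value, a.k.a. "max child"), here is a minimal Python sketch; the node fields `children`, `visits`, `total_value`, and `action` are assumptions, not a fixed API:

```python
def final_action(root, rule="robust"):
    """Pick the move to actually play once MCTS time runs out.
    root.children are assumed to carry visits, total_value, and action."""
    if rule == "robust":
        # Robust child: the most-visited child. Stable in practice because
        # UCT concentrates visits on good moves.
        best = max(root.children, key=lambda c: c.visits)
    else:
        # Max child: highest empirical mean value. Can be noisy for rarely
        # visited children, hence the max(..., 1) guard against division by 0.
        best = max(root.children, key=lambda c: c.total_value / max(c.visits, 1))
    return best.action
```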
8
votes
1 answer
How to fill in missing transitions when sampling an MDP transition table?
I have a simulator modelling a relatively complex scenario. I extract ~12 discrete features from the simulator state, which form the basis of my MDP state space.
Suppose I am estimating the transition table for an MDP by running a large number of…
Brendan Hill
- 263
- 1
- 6
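One common (though by no means the only) remedy for unobserved transitions is additive smoothing of the empirical counts; a minimal sketch, assuming the counts are held in a NumPy array indexed as `counts[s, a, s']`:

```python
import numpy as np

def smoothed_transitions(counts, alpha=1.0):
    """Laplace-smoothed maximum-likelihood transition table.
    alpha > 0 assigns a small probability to transitions never sampled,
    so the estimated MDP never treats an unseen outcome as impossible."""
    smoothed = counts.astype(float) + alpha
    return smoothed / smoothed.sum(axis=-1, keepdims=True)
```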
6
votes
1 answer
In MCTS, what to do if I do not want to simulate till the end of the game?
I'm trying to implement MCTS with UCT for a board game, and I'm somewhat stuck. The state space is quite large (~3e15 states), and I'd like to compute a good move in under 2 seconds. I already have MCTS implemented in Java from here, and I noticed that it…
Sami
- 163
- 4
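A standard workaround, used in game-playing MCTS well before AlphaGo, is to cut the playout at a fixed depth and back up a heuristic evaluation instead of a terminal result. A sketch, where the state interface (`is_terminal`, `result`, `legal_moves`, `apply`) and the `evaluate` heuristic are hypothetical:

```python
import random

def depth_limited_rollout(state, max_depth, evaluate):
    """Random playout truncated at max_depth moves; backs up a heuristic
    value (scaled like a true game result, e.g. in [0, 1]) when the game
    has not ended within the depth budget."""
    depth = 0
    while not state.is_terminal():
        if depth == max_depth:
            return evaluate(state)  # heuristic stands in for the game result
        state = state.apply(random.choice(state.legal_moves()))
        depth += 1
    return state.result()
```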
6
votes
1 answer
Why do we need importance sampling?
I was studying the off-policy policy improvement method. Then I encountered importance sampling. I completely understand the mathematics behind the calculation, but I am wondering what a practical example of importance sampling would be.
For instance,…
Alireza Hosseini
- 61
- 3
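A practical RL instance, sketched here under assumed tabular structures, is off-policy evaluation: episodes are generated by a behaviour policy `b`, but we want values under a target policy `pi`. Both policies' action probabilities are known, so the ratio is computable even though we never sample from `pi`:

```python
def off_policy_mc_value(episodes, pi, b, gamma=1.0):
    """Ordinary importance-sampling estimate of v_pi at the start state,
    from episodes collected under behaviour policy b. Each episode is
    assumed to be a list of (state, action, reward) triples; pi[s][a] and
    b[s][a] are known action probabilities."""
    total, n = 0.0, 0
    for episode in episodes:
        G = 0.0
        for (_, _, r) in reversed(episode):
            G = r + gamma * G           # sampled return under b
        rho = 1.0
        for (s, a, _) in episode:
            rho *= pi[s][a] / b[s][a]   # trajectory importance ratio
        total += rho * G                # reweights b's returns toward pi's
        n += 1
    return total / n
```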
6
votes
1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…
stoic-santiago
- 1,201
- 9
- 22
6
votes
2 answers
How can we compute the ratio between the distributions if we don't know one of the distributions?
Here is my understanding of importance sampling. If we have two distributions $p(x)$ and $q(x)$, where we can sample from $p(x)$ but not from $q(x)$, and we want to compute an expectation with respect to $q(x)$, then we use importance sampling.…
pecey
- 353
- 2
- 10
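The key point for the question above is that you never need to *sample* from $q(x)$, only to *evaluate* both densities at points sampled from $p(x)$; in off-policy RL both policies' action probabilities are known, so the ratio is always available. A self-contained numerical sketch, with Gaussians chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Proposal p = N(0, 1): we can sample from it. Target q = N(1, 1): we only
# evaluate its density. Goal: estimate E_q[f(X)] using samples from p.
def p_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    return np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)

f = lambda x: x**2
xs = rng.normal(0.0, 1.0, size=100_000)   # draws from p, never from q
weights = q_pdf(xs) / p_pdf(xs)           # importance ratios q(x)/p(x)
print(np.mean(weights * f(xs)))           # ~2.0, the true E_q[X^2]
```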
5
votes
1 answer
How do temporal-difference and Monte Carlo methods work if they do not have access to a model?
In value iteration, we have a model of the environment's dynamics, i.e. $p(s', r \mid s, a)$, which we use to update an estimate of the value function.
In the case of temporal-difference and Monte Carlo methods, we do not use $p(s', r \mid s, a)$,…
strongguy122
- 51
- 1
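To make the contrast concrete: a single sampled transition replaces the expectation over $p(s', r \mid s, a)$. A minimal tabular TD(0) step (the dictionary-based value table is an assumption of this sketch):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One model-free TD(0) update. The sampled transition (s, r, s_next)
    itself stands in for the unknown dynamics p(s', r | s, a); averaging
    over many sampled transitions recovers the expectation implicitly."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
```

Monte Carlo methods do the same thing with the full sampled return $G_t$ in place of $r + \gamma V(s')$; neither ever queries the transition probabilities directly.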
4
votes
1 answer
Why is GLIE Monte-Carlo control an on-policy control?
In slide 16 of lecture 5 of his course "Reinforcement Learning", David Silver introduced GLIE Monte-Carlo Control.
But why is it an on-policy control? The sampling follows a policy $\pi$ while improvement follows an $\epsilon$-greedy policy, so…
fish_tree
- 247
- 2
- 6
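The resolution to the question above is that in GLIE Monte-Carlo control the behaviour policy $\pi$ *is* the $\epsilon$-greedy policy being improved, so sampling and improvement use the same policy; GLIE is then obtained by decaying $\epsilon$, e.g. $\epsilon_k = 1/k$. A sketch of that action-selection rule (the function shape is an assumption for illustration):

```python
import random

def glie_epsilon_greedy(Q_s, k, n_actions):
    """Epsilon-greedy action selection with the GLIE schedule eps_k = 1/k
    (k >= 1 is the episode index): exploration decays to zero while every
    action is still tried infinitely often. Q_s[a] is the current
    action-value estimate in this state."""
    eps = 1.0 / k
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q_s[a])
```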
4
votes
2 answers
What does "first-visit" actually mean in Monte Carlo First Visit implementation
Note: I will use FV abbreviation for first-visit and EV for every-visit.
I am reading the famous Sutton and Barto Reinforcement Learning book (second edition), and after reading the following exercise, I stumbled upon a thought I can't seem to find a…
Silidrone
- 143
- 5
4
votes
1 answer
What is the typical AI approach for solving blackjack?
I'm currently developing a blackjack program. Now, I want to create an AI that essentially uses the mathematics of blackjack to make decisions.
So, what is the typical AI approach for solving blackjack?
It doesn't have to be language-specific, but…
James Corbett
- 53
- 5
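The canonical answer (Sutton & Barto, ch. 5) is Monte Carlo control, since blackjack episodes are short and the dealer's dynamics are easy to sample but tedious to enumerate. A hedged sketch using Gymnasium's Blackjack-v1 environment, assuming `gymnasium` is installed; hyperparameters are illustrative only:

```python
import random
from collections import defaultdict

import gymnasium as gym

env = gym.make("Blackjack-v1")
Q = defaultdict(lambda: [0.0, 0.0])   # Q[obs][a], actions: 0 = stick, 1 = hit
N = defaultdict(lambda: [0, 0])       # visit counts for incremental averaging
EPS = 0.1

for _ in range(200_000):
    obs, _ = env.reset()
    episode, done = [], False
    while not done:
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[obs][act])
        next_obs, r, term, trunc, _ = env.step(a)
        episode.append((obs, a, r))
        obs, done = next_obs, term or trunc
    G = 0.0
    for obs_t, a_t, r_t in reversed(episode):   # every-visit MC control
        G += r_t                                # gamma = 1: episodes are short
        N[obs_t][a_t] += 1
        Q[obs_t][a_t] += (G - Q[obs_t][a_t]) / N[obs_t][a_t]
```

With enough episodes, the greedy policy with respect to Q should approximate the hit/stick portion of the well-known basic-strategy tables (Blackjack-v1 has no doubling or splitting).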
4
votes
1 answer
How does Monte-Carlo Tree Search compare to MCMC?
Monte-Carlo Tree Search was the method used for AlphaGo. My understanding is that it randomly searches the state space of possible moves, where the probability of choosing a move is proportional to the perceived value of the resulting state (this…
profPlum
- 496
- 2
- 10
4
votes
2 answers
Why is the target called "target" in Monte Carlo and TD learning if it is not the true target?
I was going through Sutton's book. Using sample-based learning to estimate expectations, we have this formula:
$$
\text{new estimate} = \text{old estimate} + \alpha(\text{target} - \text{old estimate})
$$
What I don't quite understand is…
Chukwudi Ogbonna
- 125
- 5
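The short answer to the question above is that each "target" is a sampled (MC) or bootstrapped (TD) stand-in whose expectation is what relates it to the true value. Both methods share the update skeleton in the formula; they differ only in the target, as this sketch illustrates (the value table `V` and the episode format are assumptions):

```python
def mc_target(rewards_from_t, gamma=1.0):
    """Monte Carlo target: the full sampled return G_t.
    Unbiased for the true value, but high variance."""
    G = 0.0
    for r in reversed(rewards_from_t):
        G = r + gamma * G
    return G

def td_target(r, V, s_next, gamma=1.0):
    """TD(0) target: one real reward plus the CURRENT estimate at the next
    state. Lower variance, but biased while V is still inaccurate, which is
    exactly why it is only a 'target' and not the true value."""
    return r + gamma * V[s_next]

def update(V, s, target, alpha=0.1):
    # new estimate = old estimate + alpha * (target - old estimate)
    V[s] += alpha * (target - V[s])
```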