Questions tagged [monte-carlo-methods]

For questions related to Monte Carlo methods in reinforcement learning and other AI sub-fields. ("Monte Carlo" refers to random sampling of the search space.)

86 questions
25 votes · 2 answers

What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation?

I came across these two algorithms, but I cannot understand the difference between them, either in terms of implementation or in terms of intuition. So, what difference does the second point in both of the slides refer to?
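The first-visit/every-visit distinction this question asks about can be sketched in a few lines (a minimal illustration with γ = 1; the episode format is an assumption, not taken from the slides):

```python
# Sketch of first-visit vs. every-visit Monte Carlo policy evaluation.
# An episode is assumed to be a list of (state, reward) pairs; gamma = 1.
from collections import defaultdict

def mc_evaluate(episodes, first_visit=True):
    returns = defaultdict(list)
    for episode in episodes:
        # Compute the return G_t following each time step (gamma = 1).
        g, rets = 0.0, []
        for _, reward in reversed(episode):
            g += reward
            rets.append(g)
        rets.reverse()
        seen = set()
        for (state, _), g_t in zip(episode, rets):
            if first_visit and state in seen:
                continue  # first-visit: only the first occurrence of a state counts
            seen.add(state)
            returns[state].append(g_t)  # every-visit: every occurrence counts
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Example: state "A" appears twice in one episode.
episode = [("A", 1.0), ("B", 0.0), ("A", 1.0)]
print(mc_evaluate([episode], first_visit=True))   # V(A) = 2.0 (only G_0)
print(mc_evaluate([episode], first_visit=False))  # V(A) = 1.5 (average of G_0 = 2 and G_2 = 1)
```

Both estimators average sampled returns; they differ only in which visits of a state contribute a sample, which is why they disagree whenever a state recurs within an episode.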
17 votes · 1 answer

How does "Monte-Carlo search" work?

I have heard about this concept in a Reddit post about AlphaGo. I have tried to go through the paper and the article, but could not really make sense of the algorithm. So, can someone give an easy-to-understand explanation of how the Monte-Carlo…
asked by Dawny33
11 votes · 3 answers

What is the intuition behind TD($\lambda$)?

I'd like to better understand temporal-difference learning. In particular, I'm wondering if it is prudent to think about TD($\lambda$) as a type of "truncated" Monte Carlo learning?
9 votes · 1 answer

MCTS: How to choose the final action from the root

When the time allotted to Monte Carlo tree search runs out, what action should be chosen from the root? The original UCT paper (2006) says bestAction in its algorithm. Monte-Carlo Tree Search: A New Framework for Game AI (2008) says The game…
8 votes · 1 answer

How to fill in missing transitions when sampling an MDP transition table?

I have a simulator modelling a relatively complex scenario. I extract ~12 discrete features from the simulator state which forms the basis for my MDP state space. Suppose I am estimating the transition table for an MDP by running a large number of…
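One common way to fill in transitions that were never sampled is additive (Laplace) smoothing over the observed counts. The sketch below is illustrative, not an answer from the thread; the function name and smoothing constant are assumptions:

```python
# Hypothetical sketch: estimate P(s' | s, a) from sampled transitions,
# with additive smoothing so unobserved next-states get a small
# non-zero probability instead of zero.
from collections import Counter, defaultdict

def estimate_transitions(samples, states, alpha=0.1):
    """samples: iterable of (s, a, s_next) triples; states: all discrete states."""
    counts = defaultdict(Counter)
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1
    table = {}
    for (s, a), c in counts.items():
        total = sum(c.values()) + alpha * len(states)
        table[(s, a)] = {s2: (c[s2] + alpha) / total for s2 in states}
    return table

samples = [(0, "go", 1), (0, "go", 1), (0, "go", 2)]
table = estimate_transitions(samples, states=[0, 1, 2])
# State 0 was never observed as a successor of (0, "go"), but it still
# receives probability alpha / (3 + alpha * 3) > 0.
```

The choice of `alpha` encodes a prior belief about unseen transitions; larger values pull the estimate toward a uniform distribution over successor states.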
6 votes · 1 answer

In MCTS, what to do if I do not want to simulate till the end of the game?

I'm trying to implement MCTS with UCT for a board game, and I'm kind of stuck. The state space is quite large (~3e15 states), and I'd like to compute a good move in less than 2 seconds. I already have MCTS implemented in Java from here, and I noticed that it…
6 votes · 1 answer

Why do we need importance sampling?

I was studying the off-policy policy improvement method. Then I encountered importance sampling. I completely understand the mathematics behind the calculation, but I am wondering what a practical example of importance sampling would be. For instance,…
6 votes · 1 answer

Why does TD Learning require Markovian domains?

One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods - and we agreed on the fact that Dynamic Programming requires the Markov assumption…
6 votes · 2 answers

How can we compute the ratio between the distributions if we don't know one of the distributions?

Here is my understanding of importance sampling. If we have two distributions $p(x)$ and $q(x)$, where we have a way of sampling from $p(x)$ but not from $q(x)$, but we want to compute the expectation wrt $q(x)$, then we use importance sampling.…
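The identity behind this question, E_q[f(x)] = E_p[f(x) q(x)/p(x)], can be illustrated with a small sketch (the normal distributions, sample size, and seed are arbitrary choices for illustration):

```python
# Minimal importance-sampling sketch: estimate E_q[f(x)] using samples
# drawn from p, reweighted by the density ratio q(x)/p(x).
import math
import random

random.seed(0)

def p_pdf(x):  # proposal we can sample from: standard normal N(0, 1)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def q_pdf(x):  # target we cannot sample from: normal N(1, 1)
    return math.exp(-(x - 1) ** 2 / 2) / math.sqrt(2 * math.pi)

f = lambda x: x  # estimate the mean under q, which is 1

xs = [random.gauss(0, 1) for _ in range(100_000)]
estimate = sum(f(x) * q_pdf(x) / p_pdf(x) for x in xs) / len(xs)
print(estimate)  # close to 1.0, even though all samples came from p
```

Note this only works when both densities can be evaluated pointwise; when q is unknown entirely (the situation the question asks about), the ratio itself must be estimated or obtained some other way.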
5 votes · 1 answer

How do temporal-difference and Monte Carlo methods work, if they do not have access to model?

In value iteration, we have a model of the environment's dynamics, i.e $p(s', r \mid s, a)$, which we use to update an estimate of the value function. In the case of temporal-difference and Monte Carlo methods, we do not use $p(s', r \mid s, a)$,…
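The model-free character of these methods can be seen in a TD(0) update, which consumes only a sampled transition (s, r, s′) rather than the model p(s′, r | s, a). A minimal sketch, with illustrative state names and constants:

```python
# Sketch: a TD(0) update needs only one sampled transition (s, r, s'),
# not the transition model p(s', r | s, a).
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    target = r + gamma * V[s_next]   # bootstrapped sample target
    V[s] += alpha * (target - V[s])  # move the estimate toward the target
    return V

V = {"s0": 0.0, "s1": 1.0}
td0_update(V, "s0", 0.5, "s1")
# V["s0"] is now 0.1 * (0.5 + 0.9 * 1.0) = 0.14
```

Monte Carlo methods replace the bootstrapped target with the full sampled return from the episode; in both cases the environment itself supplies the samples, so no model is needed.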
4 votes · 1 answer

Why is GLIE Monte-Carlo control an on-policy control?

In slide 16 of his lecture 5 of the course "Reinforcement Learning", David Silver introduced GLIE Monte-Carlo Control. But why is it an on-policy control? The sampling follows a policy $\pi$ while improvement follows an $\epsilon$-greedy policy, so…
4 votes · 2 answers

What does "first-visit" actually mean in Monte Carlo First Visit implementation

Note: I will use the abbreviation FV for first-visit and EV for every-visit. I am reading the famous Sutton and Barto Reinforcement Learning book (second edition), and after reading the following exercise, I stumbled upon a thought I can't seem to find a…
4 votes · 1 answer

What is the typical AI approach for solving blackjack?

I'm currently developing a blackjack program. Now, I want to create an AI that essentially uses the mathematics of blackjack to make decisions. So, what is the typical AI approach for solving blackjack? It doesn't have to be language-specific, but…
4 votes · 1 answer

How does Monte-Carlo Tree Search Compare to MCMC?

Monte-Carlo Tree Search was the method used for AlphaGo. My understanding is that it randomly searches the state space of possible moves, where the probability of choosing a move is proportional to the perceived value of the resulting state (this…
4 votes · 2 answers

Why is the target called "target" in Monte Carlo and TD learning if it is not the true target?

I was going through Sutton's book and, using sample-based learning for estimating the expectations, we have this formula $$ \text{new estimate} = \text{old estimate} + \alpha(\text{target} - \text{old estimate}) $$ What I don't quite understand is…
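With a step size of α = 1/n, the update rule in this excerpt reduces to an incremental sample mean, so each sampled "target" is just one noisy observation of the quantity being estimated. A minimal sketch with illustrative inputs:

```python
# The update "new = old + alpha * (target - old)" with alpha = 1/n
# computes a running average of the sampled targets.
def running_mean(targets):
    estimate, n = 0.0, 0
    for g in targets:
        n += 1
        alpha = 1.0 / n
        estimate += alpha * (g - estimate)
    return estimate

print(running_mean([2.0, 4.0, 6.0]))  # 4.0, the sample average
```

With a constant α instead, the same rule yields an exponentially weighted average that tracks a moving estimate, which is the form used in TD learning.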