Questions tagged [monte-carlo-methods]

For questions related to Monte Carlo methods in reinforcement learning and other AI sub-fields. ("Monte Carlo" refers to random sampling of the search space.)

86 questions
25
votes
2 answers
What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation?
I came across these two algorithms, but I cannot understand the difference between them, both in terms of implementation and intuition.
So, what difference does the second point in both slides refer to?
user9947
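For readers comparing the two variants in the question above, here is a minimal sketch of tabular Monte Carlo prediction in Python. The episode format (a time-ordered list of (state, reward) pairs) and the function shape are assumptions for illustration, not code from the slides referenced in the question.

```python
from collections import defaultdict

def mc_evaluate(episodes, gamma=1.0, first_visit=True):
    """Tabular Monte Carlo prediction. Each episode is assumed to be a
    time-ordered list of (state, reward) pairs, where the reward is the
    one received on leaving the state."""
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    for episode in episodes:
        states = [s for s, _ in episode]
        G = 0.0
        # Walk the episode backwards so G accumulates the return from step t.
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            # First-visit: credit G to s only at its EARLIEST occurrence in
            # the episode; every-visit: credit G at every occurrence.
            if first_visit and s in states[:t]:
                continue
            returns_sum[s] += G
            returns_cnt[s] += 1
    return {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}
```

The entire difference between the two algorithms is the `continue` guard: every-visit averages one return per occurrence of a state within an episode, while first-visit averages only the return from the first occurrence.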
17
votes
1 answer
How does "Monte-Carlo search" work?
I have heard about this concept in a Reddit post about AlphaGo. I have tried to go through the paper and the article, but could not really make sense of the algorithm.
So, can someone give an easy-to-understand explanation of how the Monte-Carlo…
Dawny33
- 1,381
- 13
- 29
11
votes
3 answers
What is the intuition behind TD($\lambda$)?
I'd like to better understand temporal-difference learning. In particular, I'm wondering if it is prudent to think about TD($\lambda$) as a type of "truncated" Monte Carlo learning?
Nick Kunz
- 165
- 1
- 7
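For context on the "truncated Monte Carlo" intuition, the standard forward-view definitions (as in Sutton & Barto) are:

$$
G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V(S_{t+n}), \qquad G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}
$$

TD($\lambda$) updates toward $G_t^{\lambda}$, a geometrically weighted mixture of $n$-step returns: at $\lambda = 1$ it recovers the full Monte Carlo return, and at $\lambda = 0$ the one-step TD target, so "interpolated" is arguably closer than "truncated".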
9
votes
1 answer
MCTS: How to choose the final action from the root
When the time allotted to Monte Carlo tree search runs out, what action should be chosen from the root?
The original UCT paper (2006) just says bestAction in its algorithm.
Monte-Carlo Tree Search: A New Framework for Game AI (2008) says
The game…
user76284
- 375
- 2
- 15
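As an illustration of the two most common conventions for the question above (highest visit count, a.k.a. "robust child", versus highest mean value, a.k.a. "max child"), here is a minimal Python sketch; the node fields `children`, `visits`, `total_value`, and `action` are assumptions, not a fixed API:

```python
def final_action(root, rule="robust"):
    """Pick the move to actually play once MCTS time runs out.
    root.children are assumed to carry visits, total_value, and action."""
    if rule == "robust":
        # Robust child: the most-visited child. Stable in practice because
        # UCT concentrates visits on good moves.
        best = max(root.children, key=lambda c: c.visits)
    else:
        # Max child: highest empirical mean value. Can be noisy for rarely
        # visited children, hence the max(..., 1) guard against division by 0.
        best = max(root.children, key=lambda c: c.total_value / max(c.visits, 1))
    return best.action
```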
8
votes
1 answer
How to fill in missing transitions when sampling an MDP transition table?
I have a simulator modelling a relatively complex scenario. I extract ~12 discrete features from the simulator state, which form the basis of my MDP state space.
Suppose I am estimating the transition table for an MDP by running a large number of…
Brendan Hill
- 263
- 1
- 6
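One common (though by no means the only) remedy for unobserved transitions is additive smoothing of the empirical counts; a minimal sketch, assuming the counts are held in a NumPy array indexed as `counts[s, a, s']`:

```python
import numpy as np

def smoothed_transitions(counts, alpha=1.0):
    """Laplace-smoothed maximum-likelihood transition table.
    alpha > 0 assigns a small probability to transitions never sampled,
    so the estimated MDP never treats an unseen outcome as impossible."""
    smoothed = counts.astype(float) + alpha
    return smoothed / smoothed.sum(axis=-1, keepdims=True)
```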
6
votes
1 answer
In MCTS, what to do if I do not want to simulate till the end of the game?
I'm trying to implement MCTS with UCT for a board game, and I'm somewhat stuck. The state space is quite large (~3e15 states), and I'd like to compute a good move in under 2 seconds. I already have MCTS implemented in Java from here, and I noticed that it…
Sami
- 163
- 4
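A standard workaround, used in game-playing MCTS well before AlphaGo, is to cut the playout at a fixed depth and back up a heuristic evaluation instead of a terminal result. A sketch, where the state interface (`is_terminal`, `result`, `legal_moves`, `apply`) and the `evaluate` heuristic are hypothetical:

```python
import random

def depth_limited_rollout(state, max_depth, evaluate):
    """Random playout truncated at max_depth moves; backs up a heuristic
    value (scaled like a true game result, e.g. in [0, 1]) when the game
    has not ended within the depth budget."""
    depth = 0
    while not state.is_terminal():
        if depth == max_depth:
            return evaluate(state)  # heuristic stands in for the game result
        state = state.apply(random.choice(state.legal_moves()))
        depth += 1
    return state.result()
```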
6
votes
1 answer
Why do we need importance sampling?
I was studying the off-policy policy improvement method. Then I encountered importance sampling. I completely understand the mathematics behind the calculation, but I am wondering what a practical example of importance sampling would be.
For instance,…
Alireza Hosseini
- 61
- 3
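A practical RL instance, sketched here under assumed tabular structures, is off-policy evaluation: episodes are generated by a behaviour policy `b`, but we want values under a target policy `pi`. Both policies' action probabilities are known, so the ratio is computable even though we never sample from `pi`:

```python
def off_policy_mc_value(episodes, pi, b, gamma=1.0):
    """Ordinary importance-sampling estimate of v_pi at the start state,
    from episodes collected under behaviour policy b. Each episode is
    assumed to be a list of (state, action, reward) triples; pi[s][a] and
    b[s][a] are known action probabilities."""
    total, n = 0.0, 0
    for episode in episodes:
        G = 0.0
        for (_, _, r) in reversed(episode):
            G = r + gamma * G           # sampled return under b
        rho = 1.0
        for (s, a, _) in episode:
            rho *= pi[s][a] / b[s][a]   # trajectory importance ratio
        total += rho * G                # reweights b's returns toward pi's
        n += 1
    return total / n
```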
6
votes
1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…
stoic-santiago
- 1,201
- 9
- 22
6
votes
2 answers
How can we compute the ratio between the distributions if we don't know one of the distributions?
Here is my understanding of importance sampling. If we have two distributions $p(x)$ and $q(x)$, where we can sample from $p(x)$ but not from $q(x)$, and we want to compute an expectation with respect to $q(x)$, then we use importance sampling.…
pecey
- 353
- 2
- 10
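The key point for the question above is that you never need to *sample* from $q(x)$, only to *evaluate* both densities at points sampled from $p(x)$; in off-policy RL both policies' action probabilities are known, so the ratio is always available. A self-contained numerical sketch, with Gaussians chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Proposal p = N(0, 1): we can sample from it. Target q = N(1, 1): we only
# evaluate its density. Goal: estimate E_q[f(X)] using samples from p.
def p_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    return np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)

f = lambda x: x**2
xs = rng.normal(0.0, 1.0, size=100_000)   # draws from p, never from q
weights = q_pdf(xs) / p_pdf(xs)           # importance ratios q(x)/p(x)
print(np.mean(weights * f(xs)))           # ~2.0, the true E_q[X^2]
```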
5
votes
1 answer
How do temporal-difference and Monte Carlo methods work if they do not have access to a model?
In value iteration, we have a model of the environment's dynamics, i.e. $p(s', r \mid s, a)$, which we use to update an estimate of the value function.
In the case of temporal-difference and Monte Carlo methods, we do not use $p(s', r \mid s, a)$,…
strongguy122
- 51
- 1
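To make the contrast concrete: a single sampled transition replaces the expectation over $p(s', r \mid s, a)$. A minimal tabular TD(0) step (the dictionary-based value table is an assumption of this sketch):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One model-free TD(0) update. The sampled transition (s, r, s_next)
    itself stands in for the unknown dynamics p(s', r | s, a); averaging
    over many sampled transitions recovers the expectation implicitly."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
```

Monte Carlo methods do the same thing with the full sampled return $G_t$ in place of $r + \gamma V(s')$; neither ever queries the transition probabilities directly.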
4
votes
1 answer
Why is GLIE Monte-Carlo control an on-policy control?
In slide 16 of lecture 5 of his course "Reinforcement Learning", David Silver introduced GLIE Monte-Carlo Control.
But why is it an on-policy control? The sampling follows a policy $\pi$ while improvement follows an $\epsilon$-greedy policy, so…
fish_tree
- 247
- 2
- 6
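The resolution to the question above is that in GLIE Monte-Carlo control the behaviour policy $\pi$ *is* the $\epsilon$-greedy policy being improved, so sampling and improvement use the same policy; GLIE is then obtained by decaying $\epsilon$, e.g. $\epsilon_k = 1/k$. A sketch of that action-selection rule (the function shape is an assumption for illustration):

```python
import random

def glie_epsilon_greedy(Q_s, k, n_actions):
    """Epsilon-greedy action selection with the GLIE schedule eps_k = 1/k
    (k >= 1 is the episode index): exploration decays to zero while every
    action is still tried infinitely often. Q_s[a] is the current
    action-value estimate in this state."""
    eps = 1.0 / k
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q_s[a])
```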
4
votes
2 answers
What does "first-visit" actually mean in Monte Carlo First Visit implementation
Note: I will use FV abbreviation for first-visit and EV for every-visit.
I am reading the famous Sutton and Barto Reinforcement Learning book (second edition), and after reading the following exercise, I stumbled upon a thought I can't seem to find a…
Silidrone
- 143
- 5
4
votes
1 answer
What is the typical AI approach for solving blackjack?
I'm currently developing a blackjack program. Now, I want to create an AI that essentially uses the mathematics of blackjack to make decisions.
So, what is the typical AI approach for solving blackjack?
It doesn't have to be language-specific, but…
James Corbett
- 53
- 5
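The canonical answer (Sutton & Barto, ch. 5) is Monte Carlo control, since blackjack episodes are short and the dealer's dynamics are easy to sample but tedious to enumerate. A hedged sketch using Gymnasium's Blackjack-v1 environment, assuming `gymnasium` is installed; hyperparameters are illustrative only:

```python
import random
from collections import defaultdict

import gymnasium as gym

env = gym.make("Blackjack-v1")
Q = defaultdict(lambda: [0.0, 0.0])   # Q[obs][a], actions: 0 = stick, 1 = hit
N = defaultdict(lambda: [0, 0])       # visit counts for incremental averaging
EPS = 0.1

for _ in range(200_000):
    obs, _ = env.reset()
    episode, done = [], False
    while not done:
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[obs][act])
        next_obs, r, term, trunc, _ = env.step(a)
        episode.append((obs, a, r))
        obs, done = next_obs, term or trunc
    G = 0.0
    for obs_t, a_t, r_t in reversed(episode):   # every-visit MC control
        G += r_t                                # gamma = 1: episodes are short
        N[obs_t][a_t] += 1
        Q[obs_t][a_t] += (G - Q[obs_t][a_t]) / N[obs_t][a_t]
```

With enough episodes, the greedy policy with respect to Q should approximate the hit/stick portion of the well-known basic-strategy tables (Blackjack-v1 has no doubling or splitting).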
4
votes
1 answer
How does Monte-Carlo Tree Search compare to MCMC?
Monte-Carlo Tree Search was the method used for AlphaGo. My understanding is that it randomly searches the state space of possible moves, where the probability of choosing a move is proportional to the perceived value of the resulting state (this…
profPlum
- 496
- 2
- 10
4
votes
2 answers
Why is the target called "target" in Monte Carlo and TD learning if it is not the true target?
I was going through Sutton's book. Using sample-based learning to estimate expectations, we have this formula:
$$
\text{new estimate} = \text{old estimate} + \alpha(\text{target} - \text{old estimate})
$$
What I don't quite understand is…
Chukwudi Ogbonna
- 125
- 5
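The short answer to the question above is that each "target" is a sampled (MC) or bootstrapped (TD) stand-in whose expectation is what relates it to the true value. Both methods share the update skeleton in the formula; they differ only in the target, as this sketch illustrates (the value table `V` and the episode format are assumptions):

```python
def mc_target(rewards_from_t, gamma=1.0):
    """Monte Carlo target: the full sampled return G_t.
    Unbiased for the true value, but high variance."""
    G = 0.0
    for r in reversed(rewards_from_t):
        G = r + gamma * G
    return G

def td_target(r, V, s_next, gamma=1.0):
    """TD(0) target: one real reward plus the CURRENT estimate at the next
    state. Lower variance, but biased while V is still inaccurate, which is
    exactly why it is only a 'target' and not the true value."""
    return r + gamma * V[s_next]

def update(V, s, target, alpha=0.1):
    # new estimate = old estimate + alpha * (target - old estimate)
    V[s] += alpha * (target - V[s])
```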