Questions tagged [dynamic-programming]
For questions related to the dynamic programming paradigm in the context of AI (and, in particular, reinforcement learning).
35 questions
11
votes
3 answers
What algorithms are considered reinforcement learning algorithms?
What are the areas/algorithms that belong to reinforcement learning?
TD(0), Q-Learning and SARSA are all temporal-difference algorithms, which belong to the reinforcement learning area, but is there more to it?
Are the dynamic programming…
Miguel Saraiva
- 797
- 1
- 5
- 15
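Since the question above contrasts TD(0), SARSA and Q-learning, a minimal tabular Q-learning episode may serve as a point of reference. This is a sketch only: the environment interface (reset/step) and the hyperparameters are assumptions, not part of the question, and a comment marks where SARSA would differ.

    import numpy as np

    def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
        """One episode of tabular Q-learning. env is assumed to expose
        reset() -> state and step(action) -> (next_state, reward, done)."""
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy behaviour policy
            if np.random.rand() < epsilon:
                action = np.random.randint(Q.shape[1])
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Q-learning (off-policy) bootstraps from the best next action;
            # SARSA (on-policy) would instead use the action actually taken next.
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
        return Q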
6
votes
1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte Carlo, and Temporal Difference (TD) Learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…
stoic-santiago
- 1,201
- 9
- 22
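Since the question above compares Dynamic Programming, Monte Carlo and TD as policy evaluation methods, a short sketch of tabular TD(0) prediction may make the role of bootstrapping (and hence of the Markov assumption) concrete. The episode/transition format here is an assumption made for illustration.

    def td0_evaluate(episodes, V, alpha=0.1, gamma=0.99):
        """Tabular TD(0) policy evaluation.
        episodes: iterable of episodes, each a list of
        (state, reward, next_state, done) transitions generated by the policy."""
        for episode in episodes:
            for state, reward, next_state, done in episode:
                # Bootstrapping: the target uses V[next_state], which only stands in
                # for the remaining return if next_state summarises the history
                # (the Markov assumption). Monte Carlo would use the full return instead.
                target = reward + (0.0 if done else gamma * V[next_state])
                V[state] += alpha * (target - V[state])
        return V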
4
votes
2 answers
What is the relation between Dynamic Programming and Reinforcement Learning?
Please forgive the simplicity of the question, as I have only recently started studying Reinforcement Learning.
I am supposed to study a system where the transition probabilities are known and I have to use Reinforcement Learning.
I am trying to understand…
Annaassymeon2
- 41
- 1
- 3
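Since the asker says the transition probabilities are known, the classical dynamic-programming answer is to apply value iteration directly to the model. A minimal sketch, with the array layout of P and R assumed for illustration:

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        """Value iteration for a known MDP.
        P: transition probabilities, shape (S, A, S); R: expected rewards, shape (S, A).
        (This array layout is an assumption made for the sketch.)"""
        S, A, _ = P.shape
        V = np.zeros(S)
        while True:
            # One-step lookahead through the known model for every state-action pair
            Q = R + gamma * (P @ V)      # shape (S, A)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)   # values and a greedy policy
            V = V_new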
4
votes
1 answer
Why is update rule of the value function different in policy evaluation and policy iteration?
In the textbook "Reinforcement Learning: An Introduction", by Richard Sutton and Andrew Barto, the pseudo code for Policy Evaluation is given as follows:
The update equation for $V(s)$ comes from the Bellman equation for $v_{\pi}(s)$ which is…
Jarvis1997
- 157
- 6
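For the question above, it may help to restate the standard iterative policy evaluation backup in the book's notation (a general remark, not a quote of the pseudocode in the question): the update averages over the actions of the policy being evaluated,
$$V(s) \leftarrow \sum_{a}\pi(a\mid s)\sum_{s',r}p(s',r\mid s,a)\big[r+\gamma V(s')\big]$$
which, for a deterministic policy such as the one carried through the policy iteration pseudocode, collapses to the single action $a=\pi(s)$:
$$V(s) \leftarrow \sum_{s',r}p\big(s',r\mid s,\pi(s)\big)\big[r+\gamma V(s')\big]$$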
3
votes
1 answer
Do we need the transition probability function when calculating the importance sampling ratio?
I am reading the book titled "Reinforcement Learning: An Introduction" (by Sutton and Barto). I am at chapter 5, which is about Monte Carlo methods, but now I am quite confused.
There is one thing I don't particularly understand. Why do we need the…
Manuel Pasieka
- 33
- 4
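A worked restatement of the point from Chapter 5 that usually resolves this: the state-transition probabilities appear in both the target-policy and behaviour-policy trajectory probabilities, so they cancel and the importance sampling ratio depends on the two policies alone:
$$\rho_{t:T-1} \;=\; \frac{\prod_{k=t}^{T-1}\pi(A_k\mid S_k)\,p(S_{k+1}\mid S_k,A_k)}{\prod_{k=t}^{T-1}b(A_k\mid S_k)\,p(S_{k+1}\mid S_k,A_k)} \;=\; \prod_{k=t}^{T-1}\frac{\pi(A_k\mid S_k)}{b(A_k\mid S_k)}$$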
3
votes
1 answer
Are policy and value iteration used only in grid world like scenarios?
I am trying to teach myself reinforcement learning. At the moment I am focusing on policy and value iteration, and I have several problems and doubts.
One of my main doubts is that I can't find many diverse examples of how…
dcr
- 57
- 5
3
votes
1 answer
What are the common techniques one could use to deal with collisions in a transposition table?
Consider an iterative deepening search using a transposition table. Whenever the transposition table is full, what are common strategies applied to replace entries in the table?
I'm aware of two strategies:
Replace the oldest entry whenever a new…
ihavenoidea
- 265
- 2
- 11
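To make the strategies mentioned above concrete, here is a minimal sketch of a depth-preferred replacement scheme (one common choice alongside "always replace" and two-tier tables). The entry layout and class interface are assumptions made for the example:

    class TranspositionTable:
        """Fixed-size transposition table with depth-preferred replacement."""
        def __init__(self, size=1 << 20):
            self.size = size
            self.table = [None] * size  # slot: (key, depth, score, flag, best_move)

        def store(self, key, depth, score, flag, best_move):
            idx = key % self.size
            entry = self.table[idx]
            # Keep whichever entry was searched to the greater depth: deeper
            # results are more expensive to recompute and usually more useful.
            if entry is None or depth >= entry[1]:
                self.table[idx] = (key, depth, score, flag, best_move)

        def probe(self, key):
            entry = self.table[key % self.size]
            # Compare the full key so an index collision is not mistaken for a hit.
            if entry is not None and entry[0] == key:
                return entry
            return None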
2
votes
2 answers
Should we feed a greater fraction of terminal states to the value network so that their values are learned first?
The basis of Q-learning is recursive (similar to dynamic programming), where only the value of the terminal state is known exactly.
Wouldn't it make sense to feed the model a greater proportion of terminal states initially, to ensure that the…
Tejas Ramdas
- 605
- 1
- 6
- 6
2
votes
1 answer
Where does the TD formula for tic-tac-toe in Sutton & Barto come from?
In section $1.5$ of the book "Reinforcement Learning: An Introduction" by Sutton and Barto they use tic-tac-toe as an example of an RL use case. They provide the following temporal difference update rule in that section:
$$V(S_{t}) \leftarrow \dots$$
mNugget
- 167
- 4
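For context, the (truncated) rule in Section 1.5 is the book's temporal-difference update for the tic-tac-toe value table,
$$V(S_{t}) \leftarrow V(S_{t}) + \alpha\big[V(S_{t+1}) - V(S_{t})\big],$$
where $\alpha$ is a small step-size parameter; there is no separate reward term in that example because the state values themselves estimate the probability of winning, with terminal positions fixed at 1 or 0.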
2
votes
0 answers
Why only a one-step lookahead in value/policy iteration?
In value iteration and policy iteration we only consider a one-step lookahead, where the lookahead uses the values from the previous iteration, and we therefore need to sweep over all states and iterate this procedure.
Why don't we do one sweep over all states but…
hugh
- 53
- 3
2
votes
0 answers
What trait of a planning problem makes reinforcement learning a well-suited solution?
Planning problems were among the first problems studied at the dawn of AI (Shakey the robot). Graph search (e.g. A*) and planning (e.g. GraphPlan) algorithms can be very efficient at generating a plan. As for problem formulation, for planning…
50k4
- 225
- 1
- 8
2
votes
1 answer
Is it possible to retrieve the optimal policy from the state value function?
One can easily retrieve the optimal policy from the action-value function, but how about obtaining it from the state-value function?
Mika
- 371
- 2
- 10
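A worked restatement of the standard answer to the question above: from the state-value function alone, recovering a greedy (optimal) policy requires a one-step lookahead through the model, whereas with action values an argmax over $a$ suffices. The usual greedy-policy definition (not quoted from the question) is
$$\pi(s) = \arg\max_{a}\sum_{s',r}p(s',r\mid s,a)\big[r+\gamma v(s')\big],$$
so without knowledge of $p(s',r\mid s,a)$ the state-value function by itself is not enough.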
1
vote
1 answer
Why are there up to $m^2$ action values when we consider the complexity of DP based on $q(x,u)$?
Please see slide 32 in the following lecture slides on DP:
https://groups.uni-paderborn.de/lea/share/lehre/reinforcementlearning/lecture_slides/built/Lecture03.pdf
Let $m$ be the size of the action space in an MDP.
Why are there up to $m^2$ action values…
DSPinfinity
- 1,223
- 4
- 10
1
vote
2 answers
Why does exhaustive search require 14 travel segment evaluations but dynamic programming only 10 for this shortest path problem?
Why does exhaustive search require 14 travel segment evaluations but dynamic programming only 10 for this shortest path problem? I need a clear explanation.
DSPinfinity
- 1,223
- 4
- 10
1
vote
0 answers
How to modify the Bellman operator for in-place iterative policy evaluation?
The iterative update rule for policy evaluation, that is, for approximating the value function of a given policy, is:
$$v^{k+1} = r_{\pi} + \gamma P_{\pi}v^{k}$$
This is the simultaneous update rule where the new values of the value function vector are…
Atharva
- 11
- 2
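Since the question above maps the simultaneous update $v^{k+1} = r_{\pi} + \gamma P_{\pi}v^{k}$ to an in-place variant, here is a minimal numerical sketch of both kinds of sweep under assumed array shapes (names follow the question's notation):

    import numpy as np

    def policy_evaluation(P_pi, r_pi, gamma=0.9, sweeps=100, in_place=True):
        """Iterative policy evaluation for a fixed policy.
        P_pi: (S, S) state-transition matrix under the policy;
        r_pi: (S,) expected one-step reward vector under the policy."""
        S = len(r_pi)
        v = np.zeros(S)
        for _ in range(sweeps):
            if in_place:
                # Gauss-Seidel-style sweep: each new value is written back
                # immediately, so later states in the same sweep already use it.
                for s in range(S):
                    v[s] = r_pi[s] + gamma * P_pi[s] @ v
            else:
                # Simultaneous (Jacobi-style) update: every state uses the old vector.
                v = r_pi + gamma * P_pi @ v
        return v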