Questions tagged [dynamic-programming]
For questions related to the dynamic programming paradigm in the context of AI (and, in particular, reinforcement learning).
35 questions
11
votes
3 answers
What algorithms are considered reinforcement learning algorithms?
What are the areas/algorithms that belong to reinforcement learning?
TD(0), Q-Learning and SARSA are all temporal-difference algorithms, which belong to the reinforcement learning area, but is there more to it?
Are the dynamic programming…
Miguel Saraiva
- 797
- 1
- 5
- 15
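Since the question above contrasts TD(0), SARSA and Q-learning, a minimal tabular Q-learning episode may serve as a point of reference. This is a sketch only: the environment interface (reset/step) and the hyperparameters are assumptions, not part of the question, and a comment marks where SARSA would differ.

    import numpy as np

    def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
        """One episode of tabular Q-learning. env is assumed to expose
        reset() -> state and step(action) -> (next_state, reward, done)."""
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy behaviour policy
            if np.random.rand() < epsilon:
                action = np.random.randint(Q.shape[1])
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Q-learning (off-policy) bootstraps from the best next action;
            # SARSA (on-policy) would instead use the action actually taken next.
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
        return Q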
6
votes
1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte Carlo, and Temporal Difference (TD) Learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…
stoic-santiago
- 1,201
- 9
- 22
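Since the question above compares Dynamic Programming, Monte Carlo and TD as policy evaluation methods, a short sketch of tabular TD(0) prediction may make the role of bootstrapping (and hence of the Markov assumption) concrete. The episode/transition format here is an assumption made for illustration.

    def td0_evaluate(episodes, V, alpha=0.1, gamma=0.99):
        """Tabular TD(0) policy evaluation.
        episodes: iterable of episodes, each a list of
        (state, reward, next_state, done) transitions generated by the policy."""
        for episode in episodes:
            for state, reward, next_state, done in episode:
                # Bootstrapping: the target uses V[next_state], which only stands in
                # for the remaining return if next_state summarises the history
                # (the Markov assumption). Monte Carlo would use the full return instead.
                target = reward + (0.0 if done else gamma * V[next_state])
                V[state] += alpha * (target - V[state])
        return V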
4
votes
2 answers
What is the relation between Dynamic Programming and Reinforcement Learning?
Please forgive the simplicity of the question, as I have only recently started studying Reinforcement Learning.
I am supposed to study a system where the transition probabilities are known and I have to use Reinforcement Learning.
I am trying to understand…
Annaassymeon2
- 41
- 1
- 3
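Since the asker says the transition probabilities are known, the classical dynamic-programming answer is to apply value iteration directly to the model. A minimal sketch, with the array layout of P and R assumed for illustration:

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        """Value iteration for a known MDP.
        P: transition probabilities, shape (S, A, S); R: expected rewards, shape (S, A).
        (This array layout is an assumption made for the sketch.)"""
        S, A, _ = P.shape
        V = np.zeros(S)
        while True:
            # One-step lookahead through the known model for every state-action pair
            Q = R + gamma * (P @ V)      # shape (S, A)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)   # values and a greedy policy
            V = V_new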
4
votes
1 answer
Why is update rule of the value function different in policy evaluation and policy iteration?
In the textbook "Reinforcement Learning: An Introduction", by Richard Sutton and Andrew Barto, the pseudo code for Policy Evaluation is given as follows:
The update equation for $V(s)$ comes from the Bellman equation for $v_{\pi}(s)$ which is…
Jarvis1997
- 157
- 6
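For the question above, it may help to restate the standard iterative policy evaluation backup in the book's notation (a general remark, not a quote of the pseudocode in the question): the update averages over the actions of the policy being evaluated,
$$V(s) \leftarrow \sum_{a}\pi(a\mid s)\sum_{s',r}p(s',r\mid s,a)\big[r+\gamma V(s')\big]$$
which, for a deterministic policy such as the one carried through the policy iteration pseudocode, collapses to the single action $a=\pi(s)$:
$$V(s) \leftarrow \sum_{s',r}p\big(s',r\mid s,\pi(s)\big)\big[r+\gamma V(s')\big]$$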
3
votes
1 answer
Do we need the transition probability function when calculating the importance sampling ratio?
I am reading the book titled "Reinforcement Learning: An Introduction" (by Sutton and Barto). I am at chapter 5, which is about Monte Carlo methods, but now I am quite confused.
There is one thing I don't particularly understand. Why do we need the…
Manuel Pasieka
- 33
- 4
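A worked restatement of the point from Chapter 5 that usually resolves this: the state-transition probabilities appear in both the target-policy and behaviour-policy trajectory probabilities, so they cancel and the importance sampling ratio depends on the two policies alone:
$$\rho_{t:T-1} \;=\; \frac{\prod_{k=t}^{T-1}\pi(A_k\mid S_k)\,p(S_{k+1}\mid S_k,A_k)}{\prod_{k=t}^{T-1}b(A_k\mid S_k)\,p(S_{k+1}\mid S_k,A_k)} \;=\; \prod_{k=t}^{T-1}\frac{\pi(A_k\mid S_k)}{b(A_k\mid S_k)}$$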
3
votes
1 answer
Are policy and value iteration used only in grid world like scenarios?
I am trying to teach myself reinforcement learning. At the moment I am focusing on policy and value iteration, and I have several problems and doubts.
One of my main doubts is that I can't find many diverse examples of how…
dcr
- 57
- 5
3
votes
1 answer
What are the common techniques one could use to deal with collisions in a transposition table?
Consider an iterative deepening search using a transposition table. Whenever the transposition table is full, what are common strategies applied to replace entries in the table?
I'm aware of two strategies:
Replace the oldest entry whenever a new…
ihavenoidea
- 265
- 2
- 11
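To make the strategies mentioned above concrete, here is a minimal sketch of a depth-preferred replacement scheme (one common choice alongside "always replace" and two-tier tables). The entry layout and class interface are assumptions made for the example:

    class TranspositionTable:
        """Fixed-size transposition table with depth-preferred replacement."""
        def __init__(self, size=1 << 20):
            self.size = size
            self.table = [None] * size  # slot: (key, depth, score, flag, best_move)

        def store(self, key, depth, score, flag, best_move):
            idx = key % self.size
            entry = self.table[idx]
            # Keep whichever entry was searched to the greater depth: deeper
            # results are more expensive to recompute and usually more useful.
            if entry is None or depth >= entry[1]:
                self.table[idx] = (key, depth, score, flag, best_move)

        def probe(self, key):
            entry = self.table[key % self.size]
            # Compare the full key so an index collision is not mistaken for a hit.
            if entry is not None and entry[0] == key:
                return entry
            return None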
2
votes
2 answers
Should we feed a greater fraction of terminal states to the value network so that their values are learned first?
The basis of Q-learning is recursive (similar to dynamic programming), where only the value of the terminal state is known exactly.
Wouldn't it make sense to feed the model a greater proportion of terminal states initially, to ensure that the…
Tejas Ramdas
- 605
- 1
- 6
- 6
2
votes
1 answer
Where does the TD formula for tic-tac-toe in Sutton & Barto come from?
In section $1.5$ of the book "Reinforcement Learning: An Introduction" by Sutton and Barto they use tic-tac-toe as an example of an RL use case. They provide the following temporal difference update rule in that section:
$$V(S_{t}) \leftarrow \dots$$
mNugget
- 167
- 4
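For context, the (truncated) rule in Section 1.5 is the book's temporal-difference update for the tic-tac-toe value table,
$$V(S_{t}) \leftarrow V(S_{t}) + \alpha\big[V(S_{t+1}) - V(S_{t})\big],$$
where $\alpha$ is a small step-size parameter; there is no separate reward term in that example because the state values themselves estimate the probability of winning, with terminal positions fixed at 1 or 0.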
2
votes
0 answers
Why only a one-step lookahead in value/policy iteration?
In value iteration and policy iteration we only consider a one-step lookahead, where the lookahead uses the values from the previous iteration, and we therefore need to sweep over all states and iterate this procedure.
Why don't we do one sweep over all states but…
hugh
- 53
- 3
2
votes
0 answers
What trait of a planning problem makes reinforcement learning a well-suited solution?
Planning problems were among the first problems studied at the dawn of AI (Shakey the robot). Graph search (e.g. A*) and planning (e.g. GraphPlan) algorithms can be very efficient at generating a plan. As for problem formulation, for planning…
50k4
- 225
- 1
- 8
2
votes
1 answer
Is it possible to retrieve the optimal policy from the state value function?
One can easily retrieve the optimal policy from the action-value function, but how about obtaining it from the state-value function?
Mika
- 371
- 2
- 10
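A worked restatement of the standard answer to the question above: from the state-value function alone, recovering a greedy (optimal) policy requires a one-step lookahead through the model, whereas with action values an argmax over $a$ suffices. The usual greedy-policy definition (not quoted from the question) is
$$\pi(s) = \arg\max_{a}\sum_{s',r}p(s',r\mid s,a)\big[r+\gamma v(s')\big],$$
so without knowledge of $p(s',r\mid s,a)$ the state-value function by itself is not enough.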
1
vote
1 answer
Why are there up to $m^2$ action values when we consider the complexity of DP based on $q(x,u)$?
Please see slide 32 in the following lecture slides on DP:
https://groups.uni-paderborn.de/lea/share/lehre/reinforcementlearning/lecture_slides/built/Lecture03.pdf
Let $m$ be the size of the action space in an MDP.
Why are there up to $m^2$ action values…
DSPinfinity
- 1,223
- 4
- 10
1
vote
2 answers
Why does exhaustive search require 14 travel segment evaluations but dynamic programming only 10 for this shortest path problem?
Why does exhaustive search require 14 travel segment evaluations but dynamic programming only 10 for this shortest path problem? I need a clear explanation.
DSPinfinity
- 1,223
- 4
- 10
1
vote
0 answers
How to modify the Bellman operator for in-place iterative policy evaluation?
The iterative update rule for policy evaluation, that is, for approximating the value function of a given policy, is:
$$v^{k+1} = r_{\pi} + \gamma P_{\pi}v^{k}$$
This is the simultaneous update rule where the new values of the value function vector are…
Atharva
- 11
- 2
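Since the question above maps the simultaneous update $v^{k+1} = r_{\pi} + \gamma P_{\pi}v^{k}$ to an in-place variant, here is a minimal numerical sketch of both kinds of sweep under assumed array shapes (names follow the question's notation):

    import numpy as np

    def policy_evaluation(P_pi, r_pi, gamma=0.9, sweeps=100, in_place=True):
        """Iterative policy evaluation for a fixed policy.
        P_pi: (S, S) state-transition matrix under the policy;
        r_pi: (S,) expected one-step reward vector under the policy."""
        S = len(r_pi)
        v = np.zeros(S)
        for _ in range(sweeps):
            if in_place:
                # Gauss-Seidel-style sweep: each new value is written back
                # immediately, so later states in the same sweep already use it.
                for s in range(S):
                    v[s] = r_pi[s] + gamma * P_pi[s] @ v
            else:
                # Simultaneous (Jacobi-style) update: every state uses the old vector.
                v = r_pi + gamma * P_pi @ v
        return v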