Questions tagged [dyna]

For questions related to the Dyna architecture in reinforcement learning.

For more information, see, e.g., https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node29.html.
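Since several of the questions below refer to the tabular Dyna-Q algorithm from Sutton & Barto (Chapter 8), here is a minimal sketch of it. It assumes a hypothetical `env` object whose `reset()` returns a discrete state and whose `step(action)` returns `(next_state, reward, done)`; names such as `planning_steps` are illustrative and not part of any library.

```python
# Minimal tabular Dyna-Q sketch (after Sutton & Barto, Chapter 8).
# Assumes a hypothetical deterministic `env` with reset()/step(action).
import random
from collections import defaultdict

def dyna_q(env, n_actions, episodes=50, planning_steps=5,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * n_actions)   # tabular action values
    model = {}                                   # learned model: (s, a) -> (r, s', done)

    def epsilon_greedy(s):
        if random.random() < epsilon:
            return random.randrange(n_actions)
        values = Q[s]
        return values.index(max(values))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = epsilon_greedy(s)
            s_next, r, done = env.step(a)

            # Direct RL: one-step Q-learning update from real experience.
            target = r + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])

            # Model learning: remember the observed transition.
            model[(s, a)] = (r, s_next, done)

            # Planning: extra updates from previously observed transitions.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps_next, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * max(Q[ps_next]))
                Q[ps][pa] += alpha * (ptarget - Q[ps][pa])

            s = s_next
    return Q
```

With `planning_steps=0` this reduces to ordinary one-step Q-learning, which is the $n = 0$ agent discussed in several of the questions below.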

7 questions
2 votes • 0 answers

How do I know if the assumption of a static environment is made?

An important property of a reinforcement learning problem is whether the environment of the agent is static, which means that nothing changes if the agent remains inactive. Different learning methods assume to varying degrees that the environment is…
1 vote • 1 answer

In Dyna-Q, why is only one step learned during the first episode?

In the explanation of Example 8.1, Sutton and Barto's book says: Without planning $(n = 0)$, each episode adds only one additional step to the policy, and so only one step (the last) has been learned so far. With planning, again only one step is learned…
Lazy Guy • 13 • 3
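This question and the next both concern Example 8.1 (the Dyna maze), where the reward is zero on all transitions except those into the goal state, which give $+1$, and the action values start at zero. Under those assumptions the book's claim follows directly from the one-step Q-learning update used by the $n = 0$ agent,

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\big[R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\big],$$

which is zero for every transition of the first episode except the final one into the goal, where $R_{t+1} = 1$; hence only that last state-action value changes.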
1 vote • 1 answer

Without planning, why does each episode only add one additional step to the policy?

In Sutton & Barto's RL book at page 165 for Example 8.1, they say: Figure 8.3 shows why the planning agents found the solution so much faster than the nonplanning agent. Shown are the policies found by the n = 0 and n = 50 agents halfway through…
DSPinfinity • 1,223 • 4 • 10
1 vote • 1 answer

If $\alpha$ decreases over time, why is Q-learning guaranteed to converge?

Q-learning is guaranteed to converge if $\alpha$ decreases over time. On page 161 of the RL book by Sutton and Barto, 2nd edition, Section 8.1, they write that Dyna-Q is guaranteed to converge if each state-action pair is selected an infinite number…
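As background for this question: the standard stochastic-approximation conditions on the step size (Sutton & Barto, Section 2.7), under which tabular Q-learning converges with probability 1 provided every state-action pair is visited infinitely often, are

$$\sum_{t=1}^{\infty} \alpha_t = \infty \qquad \text{and} \qquad \sum_{t=1}^{\infty} \alpha_t^2 < \infty,$$

which are satisfied, for example, by $\alpha_t = 1/t$ but not by a constant step size.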
1 vote • 1 answer

How is trajectory sampling different from normal (importance) sampling in reinforcement learning?

I am using Sutton and Barto's book for reinforcement learning. In Chapter 8, I am having difficulty understanding trajectory sampling. I have read the section on trajectory sampling (Sec. 8.6) twice (and a third time partially), but…
SJa • 393 • 3 • 17
1 vote • 0 answers

Eligibility traces in model-based reinforcement learning

In model-based reinforcement learning algorithms, a model of the environment is constructed so that samples can be used efficiently, as in Dyna and prioritized sweeping. Moreover, eligibility traces help the agent learn (action) value functions…
0 votes • 0 answers

Reinforcement Learning for Discrete State and Continuous Action Spaces

I was looking for a way to apply reinforcement learning to a discrete state space and a continuous action space, specifically algorithms and common methods of approaching this type of problem. I have tried applying a version of DynaQ to two environments:…
Aidan • 1