Questions tagged [dyna]

For questions related to the Dyna architecture in reinforcement learning.

For more information, see, e.g., https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node29.html.
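Since several of the questions below refer to the tabular Dyna-Q algorithm from Sutton & Barto (Chapter 8), here is a minimal sketch of it. It assumes a hypothetical `env` object whose `reset()` returns a discrete state and whose `step(action)` returns `(next_state, reward, done)`; names such as `planning_steps` are illustrative and not part of any library.

```python
# Minimal tabular Dyna-Q sketch (after Sutton & Barto, Chapter 8).
# Assumes a hypothetical deterministic `env` with reset()/step(action).
import random
from collections import defaultdict

def dyna_q(env, n_actions, episodes=50, planning_steps=5,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * n_actions)   # tabular action values
    model = {}                                   # learned model: (s, a) -> (r, s', done)

    def epsilon_greedy(s):
        if random.random() < epsilon:
            return random.randrange(n_actions)
        values = Q[s]
        return values.index(max(values))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = epsilon_greedy(s)
            s_next, r, done = env.step(a)

            # Direct RL: one-step Q-learning update from real experience.
            target = r + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])

            # Model learning: remember the observed transition.
            model[(s, a)] = (r, s_next, done)

            # Planning: extra updates from previously observed transitions.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps_next, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * max(Q[ps_next]))
                Q[ps][pa] += alpha * (ptarget - Q[ps][pa])

            s = s_next
    return Q
```

With `planning_steps=0` this reduces to ordinary one-step Q-learning, which is the $n = 0$ agent discussed in several of the questions below.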

7 questions
2 votes • 0 answers

How do I know if the assumption of a static environment is made?

An important property of a reinforcement learning problem is whether the environment of the agent is static, which means that nothing changes if the agent remains inactive. Different learning methods assume to varying degrees that the environment is…
1 vote • 1 answer

In Dyna-Q, why is only one step learned during the first episode?

In the explanation of Example 8.1, Sutton and Barto's book says: Without planning $(n = 0)$, each episode adds only one additional step to the policy, and so only one step (the last) has been learned so far. With planning, again only one step is learned…
Lazy Guy • 13 • 3
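This question and the next both concern Example 8.1 (the Dyna maze), where the reward is zero on all transitions except those into the goal state, which give $+1$, and the action values start at zero. Under those assumptions the book's claim follows directly from the one-step Q-learning update used by the $n = 0$ agent,

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\big[R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\big],$$

which is zero for every transition of the first episode except the final one into the goal, where $R_{t+1} = 1$; hence only that last state-action value changes.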
1 vote • 1 answer

Without planning, why does each episode only add one additional step to the policy?

In Sutton & Barto's RL book at page 165 for Example 8.1, they say: Figure 8.3 shows why the planning agents found the solution so much faster than the nonplanning agent. Shown are the policies found by the n = 0 and n = 50 agents halfway through…
DSPinfinity • 1,223 • 4 • 10
1 vote • 1 answer

If $\alpha$ decreases over time, why is Q-learning guaranteed to converge?

Q-learning is guaranteed to converge if $\alpha$ decreases over time. On page 161 of the RL book by Sutton and Barto, 2nd edition, Section 8.1, they write that Dyna-Q is guaranteed to converge if each state-action pair is selected an infinite number…
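As background for this question: the standard stochastic-approximation conditions on the step size (Sutton & Barto, Section 2.7), under which tabular Q-learning converges with probability 1 provided every state-action pair is visited infinitely often, are

$$\sum_{t=1}^{\infty} \alpha_t = \infty \qquad \text{and} \qquad \sum_{t=1}^{\infty} \alpha_t^2 < \infty,$$

which are satisfied, for example, by $\alpha_t = 1/t$ but not by a constant step size.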
1 vote • 1 answer

How is trajectory sampling different from normal (importance) sampling in reinforcement learning?

I am using Sutton and Barto's book for reinforcement learning. In Chapter 8, I am having difficulty understanding trajectory sampling. I have read the section on trajectory sampling (Sec. 8.6) twice (and a third time partially), but…
SJa • 393 • 3 • 17
1 vote • 0 answers

Eligibility traces in model-based reinforcement learning

In model-based reinforcement learning algorithms, a model of the environment is constructed so that samples can be used efficiently, as in Dyna and prioritized sweeping. Moreover, eligibility traces help the agent learn (action) value functions…
0 votes • 0 answers

Reinforcement Learning for Discrete State and Continuous Action Spaces

I was looking for a way to apply reinforcement learning to a discrete state space and a continuous action space, specifically algorithms and common methods of approaching this type of problem. I have tried applying a version of DynaQ to two environments:…
Aidan • 1