Questions tagged [eligibility-traces]

For questions related to the reinforcement learning technique called "eligibility traces", which combines temporal-difference and Monte Carlo methods.

16 questions
6 votes · 1 answer

Can TD($\lambda$) be used with deep reinforcement learning?

TD($\lambda$) is a way to interpolate between TD(0), which bootstraps over a single step, and TD(1), which bootstraps over the entire episode, i.e. Monte Carlo. Reading the link above, I see that an eligibility trace is kept for each state in order…
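For readers skimming the tag, here is a minimal tabular sketch of what "keeping an eligibility trace for each state" looks like in TD($\lambda$) policy evaluation; the `env` and `policy` objects are hypothetical stand-ins with a Gym-like interface, not part of the original question:

```python
import numpy as np

def td_lambda(env, policy, n_states, alpha=0.1, gamma=0.99, lam=0.9, episodes=500):
    """Tabular TD(lambda) policy evaluation with accumulating eligibility traces."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        e = np.zeros(n_states)              # one trace per state
        s = env.reset()                     # hypothetical Gym-like environment
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            # one-step TD error for the current transition
            delta = r + gamma * V[s_next] * (not done) - V[s]
            e[s] += 1.0                     # accumulating trace: bump the visited state
            V += alpha * delta * e          # every recently visited state gets credit
            e *= gamma * lam                # decay all traces
            s = s_next
    return V
```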
5 votes · 1 answer

Why not more TD($\lambda$) in actor-critic algorithms?

Is there either an empirical or theoretical reason that actor-critic algorithms with eligibility traces have not been more fully explored? I was hoping to find a paper or implementation or both for continuous tasks (not episodic) in continuous…
4 votes · 1 answer

How to apply or extend the $Q(\lambda)$ algorithm to semi-MDPs?

I want to model an SMDP in which time is discretized, the transition time between two states follows an exponential distribution, and there is no reward during the transition. What are the differences between $Q(\lambda)$…
3 votes · 1 answer

Derivation of Sutton & Barto TD(λ) Weight Update Equation with Eligibility Traces

I'm working through Sutton & Barto's Reinforcement Learning: An Introduction, 2nd edition, and trying to understand the derivation of Equation 12.7 for TD(λ) weight updates in Chapter 12, specifically when using eligibility traces. Here’s the update…
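For context, the semi-gradient TD($\lambda$) update being derived there combines an eligibility-trace vector, a one-step TD error, and a weight step; if I recall the 2nd-edition numbering correctly, Equation 12.7 is the last line:
\begin{align}
\mathbf{z}_t &= \gamma\lambda\,\mathbf{z}_{t-1} + \nabla \hat{v}(S_t,\mathbf{w}_t), \qquad \mathbf{z}_{-1} = \mathbf{0},\\
\delta_t &= R_{t+1} + \gamma\,\hat{v}(S_{t+1},\mathbf{w}_t) - \hat{v}(S_t,\mathbf{w}_t),\\
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,\delta_t\,\mathbf{z}_t.
\end{align}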
3 votes · 1 answer

Do eligibility traces and epsilon-greedy do the same task in different ways?

I understand that in reinforcement learning algorithms, such as Q-learning, we use eligibility traces in order to avoid committing to the actions with the greatest Q-values too quickly and to allow for exploration. Does $\epsilon$-greedy solve the same…
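For what it's worth, the two mechanisms touch different parts of the agent loop, which a small sketch makes concrete; the function names below are hypothetical, not from any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(Q, s, eps=0.1):
    """Epsilon-greedy: exploration happens at action-selection time."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))   # random exploratory action
    return int(np.argmax(Q[s]))                # greedy action otherwise

def trace_update(Q, E, s, a, delta, alpha=0.1, gamma=0.99, lam=0.9):
    """Eligibility traces: credit assignment happens at update time,
    spreading the TD error `delta` over recently visited (s, a) pairs."""
    E[s, a] += 1.0
    Q += alpha * delta * E
    E *= gamma * lam
```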
3 votes · 0 answers

How to implement REINFORCE with eligibility traces?

The pseudocode below is taken from Sutton and Barto's "Reinforcement Learning: An Introduction". It shows an actor-critic implementation with eligibility traces. My question is: if I set $\lambda^{\theta}=1$ and replace $\delta$ with the immediate…
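As a reference point, the per-step updates in that actor-critic-with-eligibility-traces pseudocode look roughly as follows (quoted from memory, so treat the exact form as an approximation; $I$ is the accumulated discount, multiplied by $\gamma$ after every step):
\begin{align}
\delta &\leftarrow R + \gamma\,\hat{v}(S',\mathbf{w}) - \hat{v}(S,\mathbf{w}),\\
\mathbf{z}^{\mathbf{w}} &\leftarrow \gamma\lambda^{\mathbf{w}}\,\mathbf{z}^{\mathbf{w}} + \nabla \hat{v}(S,\mathbf{w}),\\
\mathbf{z}^{\boldsymbol{\theta}} &\leftarrow \gamma\lambda^{\boldsymbol{\theta}}\,\mathbf{z}^{\boldsymbol{\theta}} + I\,\nabla \ln \pi(A \mid S,\boldsymbol{\theta}),\\
\mathbf{w} &\leftarrow \mathbf{w} + \alpha^{\mathbf{w}}\,\delta\,\mathbf{z}^{\mathbf{w}}, \qquad
\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \alpha^{\boldsymbol{\theta}}\,\delta\,\mathbf{z}^{\boldsymbol{\theta}}.
\end{align}
These are the update rules the question proposes modifying by setting $\lambda^{\theta}=1$.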
3 votes · 0 answers

Why does weighting by $\lambda$ terms that sum to 1 ensure convergence with eligibility traces?

In Sutton and Barto's book, in chapter 12, they state that if the weights sum to 1, then an equation's updates have "guaranteed convergence properties". Why does this actually ensure convergence? There is a full quotation of the relevant passage from Richard…
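For other readers, the weights in question are the coefficients $(1-\lambda)\lambda^{n-1}$ on the $n$-step returns in the $\lambda$-return; for $0 \le \lambda < 1$ they form a geometric series that sums to one,
$$(1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1} = (1-\lambda)\cdot\frac{1}{1-\lambda} = 1,$$
so the $\lambda$-return is a convex combination (a weighted average, not a scaled-up sum) of $n$-step returns, and, informally, such an average inherits the contraction/convergence guarantees that each $n$-step return target has on its own.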
3 votes · 1 answer

How do I derive the gradient with respect to the parameters of the softmax policy?

The gradient of the softmax eligibility trace is given by the following: \begin{align} \nabla_{\theta} \log(\pi_{\theta}(a|s)) &= \phi(s,a) - \mathbb E[\phi (s, \cdot)]\\ &= \phi(s,a) - \sum_{a'} \pi(a'|s) \phi(s,a') \end{align} How is this equation…
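One standard route to that identity, assuming a linear-softmax parameterisation $\pi_{\theta}(a|s) \propto \exp\big(\theta^\top \phi(s,a)\big)$, is to differentiate the log of the normalised policy:
\begin{align}
\log \pi_{\theta}(a|s) &= \theta^\top \phi(s,a) - \log \sum_{b} \exp\big(\theta^\top \phi(s,b)\big),\\
\nabla_{\theta} \log \pi_{\theta}(a|s) &= \phi(s,a) - \frac{\sum_{b} \exp\big(\theta^\top \phi(s,b)\big)\,\phi(s,b)}{\sum_{b'} \exp\big(\theta^\top \phi(s,b')\big)}\\
&= \phi(s,a) - \sum_{b} \pi_{\theta}(b|s)\,\phi(s,b) = \phi(s,a) - \mathbb{E}\big[\phi(s,\cdot)\big].
\end{align}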
2 votes · 1 answer

Watkins' Q(λ) with function approximation: why is gradient not considered when updating eligibility traces for the exploitation phase?

I'm implementing Watkins' Q(λ) algorithm with function approximation (from the 2nd edition of Sutton & Barto). I am very confused about updating the eligibility traces because, at the beginning of chapter 9.3, "Control with Function Approximation",…
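For reference, with function approximation the accumulating-trace update does contain the gradient of the action-value estimate; for linear function approximation that gradient is just the feature vector of the visited state-action pair, so "add the gradient" and "add the features" coincide (the Watkins variant additionally zeroes the trace after a non-greedy action):
$$\mathbf{z}_t = \gamma\lambda\,\mathbf{z}_{t-1} + \nabla_{\mathbf{w}}\,\hat{q}(S_t, A_t, \mathbf{w}_t), \qquad \nabla_{\mathbf{w}}\,\hat{q}(S_t, A_t, \mathbf{w}_t) = \mathbf{x}(S_t, A_t) \text{ for linear } \hat{q}.$$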
2 votes · 1 answer

How to prove the formula for the eligibility trace operator in reinforcement learning?

I don't understand how the formula in the red circle is derived. The screenshot is taken from this paper.
2 votes · 1 answer

How can the $\lambda$-return be defined recursively?

The $\lambda$-return is defined as $$G_t^\lambda = (1-\lambda)\sum_{n=1}^\infty \lambda^{n-1}G_{t:t+n}$$ where $$G_{t:t+n} = R_{t+1}+\gamma R_{t+2}+\dots +\gamma^{n-1}R_{t+n} + \gamma^n\hat{v}(S_{t+n})$$ is the $n$-step return from time $t$. How can…
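Peeling off the $n=1$ term and using $G_{t:t+n} = R_{t+1} + \gamma\,G_{t+1:t+n}$ for $n \ge 2$ gives the recursive form the question is after:
$$G_t^\lambda = R_{t+1} + \gamma\Big[(1-\lambda)\,\hat{v}(S_{t+1}) + \lambda\, G_{t+1}^\lambda\Big].$$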
1 vote · 0 answers

What is 'eligibility' in intuitive terms in TD($\lambda$) learning?

I am watching the lecture from Brown University (on Udemy), and I am in the portion on Temporal Difference Learning. In the pseudocode/algorithm of TD(1) (seen in the screenshot below), we initialise the eligibility $e(s) = 0$ for all states. Later…
1 vote · 1 answer

Applying eligibility traces to a Q-learning algorithm does not improve results (and might not function correctly)

I am trying to apply Eligibility Traces to a currently working Q-Learning algorithm. The reference code for the Q-Learning algorithm was taken from this great blog by DeepLizard, but does not include Eligibility Traces. Link to the code on Google…
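For anyone debugging a similar setup, here is a minimal tabular sketch of Watkins' Q($\lambda$), the variant that cuts all traces after an exploratory action; the `env` interface is a hypothetical Gym-style stand-in, not the DeepLizard code the question links to:

```python
import numpy as np

def eps_greedy(Q_row, eps, rng):
    """Hypothetical helper: epsilon-greedy action from one row of the Q-table."""
    if rng.random() < eps:
        return int(rng.integers(len(Q_row)))
    return int(np.argmax(Q_row))

def watkins_q_lambda(env, n_states, n_actions, alpha=0.1, gamma=0.99,
                     lam=0.9, eps=0.1, episodes=1000):
    """Tabular Watkins' Q(lambda): traces are cut after exploratory actions."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        E = np.zeros_like(Q)                       # eligibility trace per (s, a) pair
        s = env.reset()                            # hypothetical Gym-like environment
        a = eps_greedy(Q[s], eps, rng)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = eps_greedy(Q[s_next], eps, rng)
            a_star = int(np.argmax(Q[s_next]))     # greedy action in the next state
            delta = r + gamma * Q[s_next, a_star] * (not done) - Q[s, a]
            E[s, a] += 1.0                         # accumulating trace
            Q += alpha * delta * E                 # update all traced pairs
            if a_next == a_star:                   # next action is greedy: decay traces
                E *= gamma * lam
            else:                                  # exploratory action: cut all traces
                E[:] = 0.0
            s, a = s_next, a_next
    return Q
```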
1 vote · 0 answers

How is the general return-based off-policy equation derived?

I'm wondering how is the general return-based off-policy equation in Safe and efficient off-policy reinforcement learning derived $$\mathcal{R} Q(x, a):=Q(x, a)+\mathbb{E}_{\mu}\left[\sum_{t \geq 0} \gamma^{t}\left(\prod_{s=1}^{t}…
1 vote · 0 answers

Eligibility traces in model-based reinforcement learning

In model-based reinforcement learning algorithms such as Dyna and Prioritized Sweeping, a model of the environment is constructed in order to use samples efficiently. Moreover, eligibility traces help the agent learn (action) value functions…