Questions tagged [eligibility-traces]

For questions related to the reinforcement learning technique called "eligibility traces", which combines temporal-difference and Monte Carlo methods.

16 questions
6 votes · 1 answer

Can TD($\lambda$) be used with deep reinforcement learning?

TD($\lambda$) is a way to interpolate between TD(0), which bootstraps over a single step, and TD(1), which bootstraps over the entire episode, i.e. Monte Carlo. Reading the link above, I see that an eligibility trace is kept for each state in order…
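For readers skimming the tag, here is a minimal tabular sketch of what "keeping an eligibility trace for each state" looks like in TD($\lambda$) policy evaluation; the `env` and `policy` objects are hypothetical stand-ins with a Gym-like interface, not part of the original question:

```python
import numpy as np

def td_lambda(env, policy, n_states, alpha=0.1, gamma=0.99, lam=0.9, episodes=500):
    """Tabular TD(lambda) policy evaluation with accumulating eligibility traces."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        e = np.zeros(n_states)              # one trace per state
        s = env.reset()                     # hypothetical Gym-like environment
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            # one-step TD error for the current transition
            delta = r + gamma * V[s_next] * (not done) - V[s]
            e[s] += 1.0                     # accumulating trace: bump the visited state
            V += alpha * delta * e          # every recently visited state gets credit
            e *= gamma * lam                # decay all traces
            s = s_next
    return V
```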
5 votes · 1 answer

Why not more TD($\lambda$) in actor-critic algorithms?

Is there either an empirical or theoretical reason that actor-critic algorithms with eligibility traces have not been more fully explored? I was hoping to find a paper or implementation or both for continuous tasks (not episodic) in continuous…
4 votes · 1 answer

How to apply or extend the $Q(\lambda)$ algorithm to semi-MDPs?

I want to model an SMDP in which time is discretized, the transition time between two states follows an exponential distribution, and there is no reward during the transition. What are the differences between $Q(\lambda)$…
3 votes · 1 answer

Derivation of Sutton & Barto TD(λ) Weight Update Equation with Eligibility Traces

I'm working through Sutton & Barto's Reinforcement Learning: An Introduction, 2nd edition, and trying to understand the derivation of Equation 12.7 for TD(λ) weight updates in Chapter 12, specifically when using eligibility traces. Here’s the update…
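For context, the semi-gradient TD($\lambda$) update being derived there combines an eligibility-trace vector, a one-step TD error, and a weight step; if I recall the 2nd-edition numbering correctly, Equation 12.7 is the last line:
\begin{align}
\mathbf{z}_t &= \gamma\lambda\,\mathbf{z}_{t-1} + \nabla \hat{v}(S_t,\mathbf{w}_t), \qquad \mathbf{z}_{-1} = \mathbf{0},\\
\delta_t &= R_{t+1} + \gamma\,\hat{v}(S_{t+1},\mathbf{w}_t) - \hat{v}(S_t,\mathbf{w}_t),\\
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,\delta_t\,\mathbf{z}_t.
\end{align}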
3 votes · 1 answer

Do eligibility traces and epsilon-greedy do the same task in different ways?

I understand that in reinforcement learning algorithms, such as Q-learning, we use eligibility traces in order to avoid committing to the actions with the greatest Q-values too quickly and to allow for exploration. Does $\epsilon$-greedy solve the same…
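For what it's worth, the two mechanisms touch different parts of the agent loop, which a small sketch makes concrete; the function names below are hypothetical, not from any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(Q, s, eps=0.1):
    """Epsilon-greedy: exploration happens at action-selection time."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))   # random exploratory action
    return int(np.argmax(Q[s]))                # greedy action otherwise

def trace_update(Q, E, s, a, delta, alpha=0.1, gamma=0.99, lam=0.9):
    """Eligibility traces: credit assignment happens at update time,
    spreading the TD error `delta` over recently visited (s, a) pairs."""
    E[s, a] += 1.0
    Q += alpha * delta * E
    E *= gamma * lam
```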
3 votes · 0 answers

How to implement REINFORCE with eligibility traces?

The pseudocode below is taken from Sutton and Barto's "Reinforcement Learning: An Introduction". It shows an actor-critic implementation with eligibility traces. My question is: if I set $\lambda^{\theta}=1$ and replace $\delta$ with the immediate…
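As a reference point, the per-step updates in that actor-critic-with-eligibility-traces pseudocode look roughly as follows (quoted from memory, so treat the exact form as an approximation; $I$ is the accumulated discount, multiplied by $\gamma$ after every step):
\begin{align}
\delta &\leftarrow R + \gamma\,\hat{v}(S',\mathbf{w}) - \hat{v}(S,\mathbf{w}),\\
\mathbf{z}^{\mathbf{w}} &\leftarrow \gamma\lambda^{\mathbf{w}}\,\mathbf{z}^{\mathbf{w}} + \nabla \hat{v}(S,\mathbf{w}),\\
\mathbf{z}^{\boldsymbol{\theta}} &\leftarrow \gamma\lambda^{\boldsymbol{\theta}}\,\mathbf{z}^{\boldsymbol{\theta}} + I\,\nabla \ln \pi(A \mid S,\boldsymbol{\theta}),\\
\mathbf{w} &\leftarrow \mathbf{w} + \alpha^{\mathbf{w}}\,\delta\,\mathbf{z}^{\mathbf{w}}, \qquad
\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \alpha^{\boldsymbol{\theta}}\,\delta\,\mathbf{z}^{\boldsymbol{\theta}}.
\end{align}
These are the update rules the question proposes modifying by setting $\lambda^{\theta}=1$.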
3 votes · 0 answers

Why does weighting by $\lambda$ terms that sum to 1 ensure convergence with eligibility traces?

In Sutton and Barto's book, in chapter 12, they state that if the weights sum to 1, then an equation's updates have "guaranteed convergence properties". Why does this actually ensure convergence? There is a full quotation of the relevant passage from Richard…
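For other readers, the weights in question are the coefficients $(1-\lambda)\lambda^{n-1}$ on the $n$-step returns in the $\lambda$-return; for $0 \le \lambda < 1$ they form a geometric series that sums to one,
$$(1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1} = (1-\lambda)\cdot\frac{1}{1-\lambda} = 1,$$
so the $\lambda$-return is a convex combination (a weighted average, not a scaled-up sum) of $n$-step returns, and, informally, such an average inherits the contraction/convergence guarantees that each $n$-step return target has on its own.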
3 votes · 1 answer

How do I derive the gradient with respect to the parameters of the softmax policy?

The gradient of the softmax eligibility trace is given by the following: \begin{align} \nabla_{\theta} \log(\pi_{\theta}(a|s)) &= \phi(s,a) - \mathbb E[\phi (s, \cdot)]\\ &= \phi(s,a) - \sum_{a'} \pi(a'|s) \phi(s,a') \end{align} How is this equation…
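One standard route to that identity, assuming a linear-softmax parameterisation $\pi_{\theta}(a|s) \propto \exp\big(\theta^\top \phi(s,a)\big)$, is to differentiate the log of the normalised policy:
\begin{align}
\log \pi_{\theta}(a|s) &= \theta^\top \phi(s,a) - \log \sum_{b} \exp\big(\theta^\top \phi(s,b)\big),\\
\nabla_{\theta} \log \pi_{\theta}(a|s) &= \phi(s,a) - \frac{\sum_{b} \exp\big(\theta^\top \phi(s,b)\big)\,\phi(s,b)}{\sum_{b'} \exp\big(\theta^\top \phi(s,b')\big)}\\
&= \phi(s,a) - \sum_{b} \pi_{\theta}(b|s)\,\phi(s,b) = \phi(s,a) - \mathbb{E}\big[\phi(s,\cdot)\big].
\end{align}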
2 votes · 1 answer

Watkins' Q(λ) with function approximation: why is gradient not considered when updating eligibility traces for the exploitation phase?

I'm implementing Watkins' Q(λ) algorithm with function approximation (from the 2nd edition of Sutton & Barto). I am very confused about updating the eligibility traces because, at the beginning of chapter 9.3, "Control with Function Approximation",…
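For reference, with function approximation the accumulating-trace update does contain the gradient of the action-value estimate; for linear function approximation that gradient is just the feature vector of the visited state-action pair, so "add the gradient" and "add the features" coincide (the Watkins variant additionally zeroes the trace after a non-greedy action):
$$\mathbf{z}_t = \gamma\lambda\,\mathbf{z}_{t-1} + \nabla_{\mathbf{w}}\,\hat{q}(S_t, A_t, \mathbf{w}_t), \qquad \nabla_{\mathbf{w}}\,\hat{q}(S_t, A_t, \mathbf{w}_t) = \mathbf{x}(S_t, A_t) \text{ for linear } \hat{q}.$$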
2 votes · 1 answer

How to prove the formula for the eligibility trace operator in reinforcement learning?

I don't understand how the formula in the red circle is derived. The screenshot is taken from this paper.
2 votes · 1 answer

How can the $\lambda$-return be defined recursively?

The $\lambda$-return is defined as $$G_t^\lambda = (1-\lambda)\sum_{n=1}^\infty \lambda^{n-1}G_{t:t+n}$$ where $$G_{t:t+n} = R_{t+1}+\gamma R_{t+2}+\dots +\gamma^{n-1}R_{t+n} + \gamma^n\hat{v}(S_{t+n})$$ is the $n$-step return from time $t$. How can…
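Peeling off the $n=1$ term and using $G_{t:t+n} = R_{t+1} + \gamma\,G_{t+1:t+n}$ for $n \ge 2$ gives the recursive form the question is after:
$$G_t^\lambda = R_{t+1} + \gamma\Big[(1-\lambda)\,\hat{v}(S_{t+1}) + \lambda\, G_{t+1}^\lambda\Big].$$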
1 vote · 0 answers

What is 'eligibility' in intuitive terms in TD($\lambda$) learning?

I am watching the lecture from Brown University (on Udemy), and I am in the portion on Temporal Difference Learning. In the pseudocode/algorithm of TD(1) (seen in the screenshot below), we initialise the eligibility $e(s) = 0$ for all states. Later…
1 vote · 1 answer

Applying eligibility traces to a Q-learning algorithm does not improve results (and might not function correctly)

I am trying to apply Eligibility Traces to a currently working Q-Learning algorithm. The reference code for the Q-Learning algorithm was taken from this great blog by DeepLizard, but does not include Eligibility Traces. Link to the code on Google…
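For anyone debugging a similar setup, here is a minimal tabular sketch of Watkins' Q($\lambda$), the variant that cuts all traces after an exploratory action; the `env` interface is a hypothetical Gym-style stand-in, not the DeepLizard code the question links to:

```python
import numpy as np

def eps_greedy(Q_row, eps, rng):
    """Hypothetical helper: epsilon-greedy action from one row of the Q-table."""
    if rng.random() < eps:
        return int(rng.integers(len(Q_row)))
    return int(np.argmax(Q_row))

def watkins_q_lambda(env, n_states, n_actions, alpha=0.1, gamma=0.99,
                     lam=0.9, eps=0.1, episodes=1000):
    """Tabular Watkins' Q(lambda): traces are cut after exploratory actions."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        E = np.zeros_like(Q)                       # eligibility trace per (s, a) pair
        s = env.reset()                            # hypothetical Gym-like environment
        a = eps_greedy(Q[s], eps, rng)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = eps_greedy(Q[s_next], eps, rng)
            a_star = int(np.argmax(Q[s_next]))     # greedy action in the next state
            delta = r + gamma * Q[s_next, a_star] * (not done) - Q[s, a]
            E[s, a] += 1.0                         # accumulating trace
            Q += alpha * delta * E                 # update all traced pairs
            if a_next == a_star:                   # next action is greedy: decay traces
                E *= gamma * lam
            else:                                  # exploratory action: cut all traces
                E[:] = 0.0
            s, a = s_next, a_next
    return Q
```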
1 vote · 0 answers

How is the general return-based off-policy equation derived?

I'm wondering how is the general return-based off-policy equation in Safe and efficient off-policy reinforcement learning derived $$\mathcal{R} Q(x, a):=Q(x, a)+\mathbb{E}_{\mu}\left[\sum_{t \geq 0} \gamma^{t}\left(\prod_{s=1}^{t}…
1 vote · 0 answers

Eligibility traces in model-based reinforcement learning

In model-based reinforcement learning algorithms such as Dyna and Prioritized Sweeping, a model of the environment is constructed in order to use samples efficiently. Moreover, eligibility traces help the agent learn (action) value functions…