How to implement REINFORCE with eligibility traces?

Asked Jan 20 '21 at 07:43

Active Jan 20 '21 at 12:32

Viewed 340 times

The pseudocode below is taken from Sutton and Barto's "Reinforcement Learning: An Introduction". It shows an actor-critic implementation with eligibility traces. My question is: if I set $\lambda^{\theta}=1$ and replace the TD error $\delta$ with the immediate reward $R_t$, do I get a backward-view implementation of REINFORCE?
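To make the proposed modification concrete, here is a minimal numerical sketch of what I mean. It rests on several assumptions of mine: the policy parameters are held fixed for the whole episode (so the per-step gradients $\nabla \ln \pi$ can be treated as given vectors), the trace update follows the book's form $z \leftarrow \gamma\lambda z + I\,\nabla \ln \pi$ with $I \leftarrow \gamma I$, and `rewards[t]` is the reward received after the action taken at step `t`. The function names and the random data are hypothetical and only serve to compare the two accumulated updates.

```python
import numpy as np

def forward_view_update(grads, rewards, gamma):
    """Episodic REINFORCE sum: sum_t gamma^t * G_t * grad_t (step size omitted)."""
    T = len(rewards)
    total = np.zeros_like(grads[0])
    for t in range(T):
        # Discounted return from step t onward.
        G = sum(gamma ** (k - t) * rewards[k] for k in range(t, T))
        total += gamma ** t * G * grads[t]
    return total

def backward_view_update(grads, rewards, gamma, lam=1.0):
    """Trace-based sum with the TD error replaced by the immediate reward."""
    z = np.zeros_like(grads[0])   # eligibility trace for the policy parameters
    I = 1.0                       # running discount factor gamma^t
    total = np.zeros_like(grads[0])
    for g, r in zip(grads, rewards):
        z = gamma * lam * z + I * g   # accumulate the trace
        total += r * z                # reward used in place of delta
        I *= gamma
    return total

# Hypothetical data: 5 time steps, 3 policy parameters.
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 3))
rewards = rng.normal(size=5)

fwd = forward_view_update(grads, rewards, gamma=0.9)
bwd = backward_view_update(grads, rewards, gamma=0.9)
print(np.allclose(fwd, bwd))
```

Under these assumptions the two sums agree over a full episode, since both reduce to $\sum_k \gamma^k R_{k+1} \sum_{t \le k} \nabla \ln \pi_t$. If the parameters are updated at every step, as in the online pseudocode, the gradients change along the way and the match would presumably only be approximate.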

edited Jan 20 '21 at 12:32

nbro


asked Jan 20 '21 at 07:43

Javier Ventajas Hernández


0 Answers