What is the proof that "reward-to-go" reduces variance of policy gradient?

Asked Jun 10 '20 at 13:38

Active Oct 10 '20 at 15:51

Viewed 543 times

I am following OpenAI's Spinning Up tutorial, Part 3: Intro to Policy Optimization. It states that using the "reward-to-go" reduces the variance of the policy gradient estimate. While I understand the intuition behind it, I have struggled to find a proof in the literature.
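To make the comparison concrete, here is a minimal sketch of the two weightings as I understand them. It is not code from the tutorial: the function names are hypothetical, and it assumes an undiscounted, finite-length episode. In the basic estimator, every grad-log-prob term is multiplied by the full episode return; in the reward-to-go estimator, the term at time step t is multiplied only by the rewards collected from t onward.

```python
def full_return_weights(rewards):
    """Weight every time step by the total episode return R(tau)."""
    total = sum(rewards)
    return [total] * len(rewards)


def reward_to_go_weights(rewards):
    """Weight time step t by the sum of rewards from t to the end of the episode."""
    weights = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each entry is a suffix sum.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        weights[t] = running
    return weights


rewards = [1.0, 2.0, 3.0]  # illustrative per-step rewards, not real data
print(full_return_weights(rewards))   # [6.0, 6.0, 6.0]
print(reward_to_go_weights(rewards))  # [6.0, 5.0, 3.0]
```

The two weightings differ only in the rewards received before step t, which the action at step t cannot have influenced. What I am looking for is a proof that dropping those terms lowers the variance of the estimate.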

edited Oct 10 '20 at 15:51

nbro


asked Jun 10 '20 at 13:38

sirKris van Dela


0 Answers
