2

I am following the OpenAI's spinning up tutorial Part 3: Intro to Policy Optimization. It is mentioned there that the reward-to-go reduces the variance of the policy gradient. While I understand the intuition behind it, I struggle to find a proof in the literature.

nbro
  • 42,615
  • 12
  • 119
  • 217

0 Answers0