What strategies are there to reduce the variance of the policy gradient estimator of the REINFORCE algorithm?

Asked Nov 24 '22 at 00:57

Active Nov 01 '23 at 17:11


I know one possibility is to subtract a baseline, such as a running average of the rewards from past mini-batches. Another is to compute the mean and standard deviation of the trajectory returns within one mini-batch and use them to standardise the returns. A third is to use large batch sizes.
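To make the three ideas concrete, here is a minimal sketch. It assumes a toy one-step problem that is not from the question: a single Bernoulli action with policy pi(a=1) = sigmoid(theta) and reward 9 + a + Gaussian noise. The sketch holds theta fixed and measures how much each gradient estimate varies across mini-batches. All function names (`sample_batch`, `grad_baseline`, `grad_standardised`, `estimator_stats`) and constants (beta = 0.9, eps = 1e-8, the reward offsets) are hypothetical choices for illustration.

```python
import math
import random
import statistics


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_batch(theta: float, batch_size: int, rng: random.Random):
    """Sample one-step trajectories from a Bernoulli policy pi(a=1) = sigmoid(theta).

    Returns the score d/dtheta log pi(a) = a - sigmoid(theta) and the return of
    each trajectory. Rewards are 9 + a + N(0, 1) (a toy assumption): a large
    shared offset with a small action-dependent gap, where vanilla REINFORCE is noisy.
    """
    p = sigmoid(theta)
    scores, returns = [], []
    for _ in range(batch_size):
        a = 1 if rng.random() < p else 0
        scores.append(a - p)
        returns.append(9.0 + a + rng.gauss(0.0, 1.0))
    return scores, returns


def grad_plain(scores, returns):
    # Vanilla REINFORCE: batch mean of score * return.
    return statistics.fmean([s * g for s, g in zip(scores, returns)])


def grad_baseline(scores, returns, b):
    # Subtract a baseline b computed from *past* batches only, so it does not
    # depend on the current batch's actions and the estimator stays unbiased.
    return statistics.fmean([s * (g - b) for s, g in zip(scores, returns)])


def grad_standardised(scores, returns, eps=1e-8):
    # Standardise returns within the batch. This also rescales the gradient by
    # 1/std, and the batch-dependent mean introduces a small bias.
    mu, sd = statistics.fmean(returns), statistics.pstdev(returns)
    return statistics.fmean([s * (g - mu) / (sd + eps) for s, g in zip(scores, returns)])


def estimator_stats(batch_size=8, n_batches=2000, beta=0.9, seed=0):
    """Empirical variance across mini-batches of each gradient estimate at fixed theta."""
    rng = random.Random(seed)
    theta = 0.0  # policy held fixed so only estimator noise is measured
    b = 0.0  # running-average baseline over past mini-batches
    est = {"plain": [], "baseline": [], "standardised": []}
    for _ in range(n_batches):
        scores, returns = sample_batch(theta, batch_size, rng)
        est["plain"].append(grad_plain(scores, returns))
        est["baseline"].append(grad_baseline(scores, returns, b))
        est["standardised"].append(grad_standardised(scores, returns))
        # Update the running average only after it has been used on this batch.
        b = beta * b + (1 - beta) * statistics.fmean(returns)
    return {k: statistics.pvariance(v) for k, v in est.items()}


if __name__ == "__main__":
    for n in (8, 64):
        print(n, estimator_stats(batch_size=n))
```

In this toy setting, both centring methods cut the variance far below that of the plain estimator, and moving from 8 to 64 samples per batch shrinks the plain estimator's variance by about 1/N. The standardised variance is not on the same scale as the others, because standardising divides by the batch standard deviation.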

What is considered the most effective? What other methods are there?

edited Dec 11 '22 at 12:49 by nbro

asked by postnubilaphoebus


0 Answers