In the original paper, the objective of PPO is as follows:

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$

where $r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}$ and $\hat{A}_t$ is the advantage estimate at timestep $t$.

My question is: how does this objective behave in a sparse-reward setting, i.e., where a reward is only given after a whole sequence of actions has been taken? In that case we don't have $\hat{A}_{t}$ defined for every $t$.
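To make the setting I mean concrete, here is a minimal sketch (all names and values are hypothetical) of a sparse-reward episode, together with the GAE-style advantage computation described in the paper, assuming placeholder value estimates:

```python
import numpy as np

# Hypothetical sparse-reward episode of length T: reward is zero everywhere
# except at the final step, i.e. the agent is only rewarded after a whole
# sequence of actions.
T = 5
gamma, lam = 0.99, 0.95
rewards = np.zeros(T)
rewards[-1] = 1.0              # single reward at the end of the episode
values = np.zeros(T + 1)       # placeholder V(s_t) estimates; V(s_T) = 0 at termination

# GAE advantage estimates as described in the PPO paper:
#   delta_t   = r_t + gamma * V(s_{t+1}) - V(s_t)
#   A_hat_t   = delta_t + (gamma * lam) * A_hat_{t+1}
advantages = np.zeros(T)
gae = 0.0
for t in reversed(range(T)):
    delta = rewards[t] + gamma * values[t + 1] - values[t]
    gae = delta + gamma * lam * gae
    advantages[t] = gae

print(advantages)  # advantage estimates for each timestep of the episode
```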