Why is the actor-critic algorithm limited to using on-policy data?

Question

Why is the actor-critic algorithm limited to using on-policy data? Or can we use the actor-critic algorithm with off-policy data?

score 1 · Answer 1 · edited Feb 15 '19 at 19:43


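It's because, in the actor-critic algorithm, the objective function is an expectation over trajectories $\tau$ sampled from the current policy, so the gradient estimate is only unbiased when the data come from that same policy. If we want to use off-policy data, we have to resort to importance sampling: each sample is reweighted by the ratio of its probability under the current policy to its probability under the policy that generated the data.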
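To make the reweighting concrete, here is a minimal sketch of my own (not taken from any particular actor-critic implementation). It assumes a one-step problem with three actions, and the reward values, the two policies, and the helper name `estimate_value` are all made up for illustration. In a full trajectory setting the weight would be a product of per-step ratios, which this sketch does not cover.

```python
import numpy as np


def estimate_value(rewards, actions, target_probs, behavior_probs=None):
    """Monte Carlo estimate of E_{a ~ target}[r(a)] from sampled data.

    If behavior_probs is None, the samples are averaged as-is, which is
    only correct when they were drawn from the target policy itself.
    Otherwise each sample is reweighted by target(a) / behavior(a).
    """
    rewards = np.asarray(rewards, dtype=float)
    if behavior_probs is None:
        return float(rewards.mean())
    target_probs = np.asarray(target_probs, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)
    weights = target_probs[actions] / behavior_probs[actions]
    return float((weights * rewards).mean())


# Hypothetical setup: deterministic reward per action and two policies.
reward_per_action = np.array([1.0, 0.0, 5.0])
target = np.array([0.7, 0.2, 0.1])    # policy we want to evaluate
behavior = np.array([0.2, 0.3, 0.5])  # policy that generated the data

# Exact expected reward under the target policy.
true_value = float(target @ reward_per_action)

# Off-policy data: actions are drawn from the behavior policy.
rng = np.random.default_rng(0)
actions = rng.choice(3, size=200_000, p=behavior)
rewards = reward_per_action[actions]

naive = estimate_value(rewards, actions, target)                # ignores the mismatch
corrected = estimate_value(rewards, actions, target, behavior)  # importance-weighted

print(f"true={true_value:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

Here the plain average converges to the expected reward under the behavior policy rather than the target policy, while the importance-weighted average recovers the target value. The price is higher variance when the two policies differ a lot, since the ratios can become large.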

edited Feb 15 '19 at 19:43 by nbro

answered Jan 08 '19 at 02:33 by apuffin
