
I have a question about whether the Deterministic Policy Gradient (DPG) algorithm, in its basic form, is policy-based or actor-critic. I have been searching for the answer for a while: in some cases it is described as policy-based (like in a post here a while ago), whereas in others it is not explicitly called actor-critic, but is said to use an actor-critic framework to optimize the policy.

I know that actor-critic methods are essentially policy-based methods augmented with a critic to improve learning efficiency and stability.

I'm asking this question as well because I'm trying to make a classification of RL algorithms for a project of my own. I attach the table; please point out any mistakes I might have made or things that could be improved:

[Image: classification table of RL algorithms]

1 Answer


Indeed, DPG is essentially a policy-based method: it directly optimizes a policy (the actor) by taking the gradient of the expected return with respect to the actor's parameters, and the policy is deterministic rather than a probability distribution from which actions are sampled. DPG also includes a critic, typically implemented as a Q-function approximator, and this critic is essential: the actor improves by following the deterministic policy gradient, which is computed by differentiating through the critic's Q-function. Since the critic reduces the variance of policy updates and makes learning more sample-efficient than pure policy-based methods that rely only on rewards from the environment, DPG belongs to the actor-critic sub-category of the policy-based category in your classification scheme.
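For concreteness, here is a minimal sketch (assuming PyTorch; the small `Actor` and `Critic` MLPs and the `dpg_update` helper are hypothetical, not from any particular library) of a single DPG-style update. The critic is regressed toward a TD target, and the actor is updated by ascending $\nabla_\theta J(\theta) = \mathbb{E}_s\!\left[\nabla_a Q_w(s,a)\big|_{a=\mu_\theta(s)} \nabla_\theta \mu_\theta(s)\right]$, obtained here by backpropagating the critic's output through the deterministic actor.

```python
# Minimal DPG-style update sketch (assumes PyTorch; Actor/Critic are
# hypothetical small networks used only for illustration).
import torch
import torch.nn as nn

class Actor(nn.Module):                      # deterministic policy mu_theta(s)
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):                     # action-value function Q_w(s, a)
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def dpg_update(actor, critic, actor_opt, critic_opt, batch, gamma=0.99):
    s, a, r, s_next, done = batch            # tensors, e.g. from a replay buffer

    # Critic step: regress Q_w(s, a) toward the one-step TD target.
    with torch.no_grad():
        q_next = critic(s_next, actor(s_next))
        target = r + gamma * (1 - done) * q_next
    critic_loss = nn.functional.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor step: ascend the deterministic policy gradient by
    # backpropagating -Q_w(s, mu_theta(s)) through the actor's parameters.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

The critic step alone would be ordinary TD learning of a value function; it is the actor step, which differentiates the critic's output with respect to the actor's parameters, that makes the method an actor-critic form of policy optimization.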

cinch