0

Why are policy gradient methods popular in reinforcement learning when there exists a dual LP formulation in terms of occupation measures that can be solved easily?

nbro
  • 42,615
  • 12
  • 119
  • 217

1 Answers1

0

Policy gradient methods are popular in reinforcement learning because they are fast and easy to implement. Additionally, policy gradient methods often work well for simple problems. for example, if the Hellinger distance between two measures is small.

Dual LP methods may be more accurate if the problem is not too simple. However, they can be more difficult to implement and may require more computational resources. they are preferable if you have more information about the underlying distribution of the data.

Faizy
  • 1,144
  • 1
  • 8
  • 30