Why are policy gradients popular in RL when there exists a dual LP formulation in terms of occupation measures that can be solved easily?

Question

Why are policy gradient methods popular in reinforcement learning when there exists a dual LP formulation in terms of occupation measures that can be solved easily?

score 0 · Answer 1 · answered Oct 16 '22 at 21:59

Policy gradient methods are popular in reinforcement learning because they are fast and easy to implement. They also often work well on simple problems, for example when the Hellinger distance between two measures is small.
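To give a sense of how little code a policy gradient update needs, here is a minimal sketch on a single-state problem (a two-armed bandit) with a softmax policy. This is an illustration only: the function names, the rewards, the learning rate, and the step count are all made up for the example, and it uses the exact gradient of the expected reward, whereas a practical method such as REINFORCE would estimate that gradient from sampled trajectories.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient_bandit(rewards, lr=0.5, steps=200):
    """Gradient ascent on J(theta) = sum_a pi_theta(a) * r(a).

    Hypothetical helper for illustration: one state, known rewards,
    exact gradient (no sampling).
    """
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        pi = softmax(theta)
        expected = sum(p * r for p, r in zip(pi, rewards))
        # For a softmax policy, dJ/dtheta_a = pi(a) * (r(a) - J).
        theta = [t + lr * p * (r - expected)
                 for t, p, r in zip(theta, pi, rewards)]
    return softmax(theta)


# Assumed toy rewards: arm 0 pays 1.0, arm 1 pays 0.0.
final_policy = policy_gradient_bandit([1.0, 0.0])
print(final_policy)
```

The policy ends up putting most of its probability on the better arm. Note that the update never needs a transition model, only rewards for the actions it tries, which is part of why this family of methods is convenient.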

Dual LP methods may be more accurate when the problem is not too simple. However, they can be harder to implement and may require more computational resources. They are preferable when you have more information about the underlying distribution of the data.
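For reference, the dual LP is commonly written as follows for a discounted, finite MDP. The notation is an assumption on my part, since the question does not fix any: $\mu(s,a)$ is the normalized occupation measure, $\rho$ the initial state distribution, $P$ the transition kernel, $r$ the reward, and $\gamma \in [0,1)$ the discount factor.

```latex
\begin{aligned}
\max_{\mu \ge 0} \quad & \sum_{s,a} \mu(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{a} \mu(s',a)
  = (1-\gamma)\,\rho(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \mu(s,a)
  \quad \text{for all } s', \\[4pt]
\text{with policy} \quad & \pi(a \mid s) = \frac{\mu(s,a)}{\sum_{a'} \mu(s,a')}
  \quad \text{whenever } \sum_{a'} \mu(s,a') > 0 .
\end{aligned}
```

Written this way, the LP has one variable per state-action pair and one equality constraint per state, and the constraints use $P$ explicitly. That is one way to read the points above about computational resources and about needing more information on the underlying distribution.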

