
Why is it hard to prove the convergence of the DQN algorithm? We know that tabular Q-learning converges to the optimal Q-values, and that convergence has also been proved for Q-learning with a linear function approximator (under suitable conditions).
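For reference, the tabular update whose convergence is well established (Watkins & Dayan, 1992) is

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha_t\big[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\big],$$

which converges to $Q^*$ provided every state-action pair is visited infinitely often and the step sizes satisfy $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$.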

The main differences between DQN and Q-learning with a linear approximator are the use of a deep neural network, the experience replay memory, and the target network. Which of these components causes the issue, and why?
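For concreteness, here is a minimal sketch of a single DQN update showing all three components, assuming PyTorch; the names `q_net`, `target_net`, and `replay` are illustrative only, not taken from any particular implementation:

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions, gamma = 4, 2, 0.99

def make_net():
    # Component 1: a deep neural network as the Q-function approximator.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

q_net = make_net()
target_net = make_net()                        # component 3: the target network
target_net.load_state_dict(q_net.state_dict())  # periodically re-synced from q_net
replay = deque(maxlen=10_000)                  # component 2: experience replay memory
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(batch_size=32):
    # Sample an off-policy minibatch of (s, a, r, s', done) transitions.
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target computed with the frozen target network.
        target = r + gamma * target_net(s2).max(1).values * (1.0 - done)
    loss = F.smooth_l1_loss(q, target)  # the target is NOT differentiated (semi-gradient)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```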

nbro

1 Answer


It is hard to prove because the claim is not true: the DQN algorithm is not guaranteed to converge. Intuitively, DQN combines function approximation, bootstrapping, and off-policy training, the so-called "deadly triad" (Sutton & Barto, 2018), and this combination can make the value estimates diverge. For a proof, and a modified algorithm, C-DQN, that does converge, see the paper by Wang and Ueda.
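For concreteness, the objective that DQN minimizes at each step is

$$L(\theta) = \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\Big[\big(r + \gamma \max_{a'} Q_{\theta^-}(s',a') - Q_\theta(s,a)\big)^2\Big],$$

where $\mathcal{D}$ is the replay memory and $\theta^-$ are the frozen target-network parameters. Because the bootstrapped target is held fixed rather than differentiated, each update is a semi-gradient step: it is not the gradient of any single fixed objective, so the standard convergence arguments for stochastic gradient descent do not apply.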