
Why is it hard to prove the convergence of the DQN algorithm? We know that tabular Q-learning converges to the optimal Q-values, and that convergence has also been proved for Q-learning with a linear function approximator (under suitable conditions).
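For reference, the tabular update whose convergence is well established (Watkins & Dayan, 1992) is

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha_t\big[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\big],$$

which converges to $Q^*$ provided every state-action pair is visited infinitely often and the step sizes satisfy $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$.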

The main differences between DQN and Q-learning with a linear approximator are the use of a deep neural network, the experience replay memory, and the target network. Which of these components causes the issue, and why?
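For concreteness, here is a minimal sketch of a single DQN update showing all three components, assuming PyTorch; the names `q_net`, `target_net`, and `replay` are illustrative only, not taken from any particular implementation:

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions, gamma = 4, 2, 0.99

def make_net():
    # Component 1: a deep neural network as the Q-function approximator.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

q_net = make_net()
target_net = make_net()                        # component 3: the target network
target_net.load_state_dict(q_net.state_dict())  # periodically re-synced from q_net
replay = deque(maxlen=10_000)                  # component 2: experience replay memory
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(batch_size=32):
    # Sample an off-policy minibatch of (s, a, r, s', done) transitions.
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target computed with the frozen target network.
        target = r + gamma * target_net(s2).max(1).values * (1.0 - done)
    loss = F.smooth_l1_loss(q, target)  # the target is NOT differentiated (semi-gradient)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```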

nbro

1 Answer


It is hard to prove because the claim is not true: the DQN algorithm is not guaranteed to converge. Intuitively, DQN combines function approximation, bootstrapping, and off-policy training, the so-called "deadly triad" (Sutton & Barto, 2018), and this combination can make the value estimates diverge. For a proof, and a modified algorithm, C-DQN, that does converge, see the paper by Wang and Ueda.
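For concreteness, the objective that DQN minimizes at each step is

$$L(\theta) = \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\Big[\big(r + \gamma \max_{a'} Q_{\theta^-}(s',a') - Q_\theta(s,a)\big)^2\Big],$$

where $\mathcal{D}$ is the replay memory and $\theta^-$ are the frozen target-network parameters. Because the bootstrapped target is held fixed rather than differentiated, each update is a semi-gradient step: it is not the gradient of any single fixed objective, so the standard convergence arguments for stochastic gradient descent do not apply.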