What exactly is non-delusional Q-learning?

Asked Oct 15 '22 at 20:38

Active Dec 30 '22 at 15:38

Viewed 101 times

Problems occur when we combine Q-learning with a function approximator.
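To make the setting concrete, here is a minimal sketch of the kind of update I have in mind, assuming a linear approximator Q(s, a) = theta · phi(s, a). The names `q_values`, `q_learning_step`, `phi_sa`, and `next_features` are my own illustrative choices, not taken from the paper, and the numbers are made up.

```python
import numpy as np

def q_values(theta, features):
    """Linear approximator: one Q-value per action.

    features has shape (num_actions, num_features); row a is phi(s, a).
    """
    return features @ theta

def q_learning_step(theta, phi_sa, reward, next_features, alpha=0.1, gamma=0.9):
    """One Q-learning update for a single transition (s, a, r, s')."""
    # Bootstrapped target: the max is taken over the actions in s' only,
    # without checking what the same theta implies in other states.
    target = reward + gamma * np.max(q_values(theta, next_features))
    td_error = target - phi_sa @ theta
    # Semi-gradient step: move theta along phi(s, a) in proportion to the error.
    return theta + alpha * td_error * phi_sa

# Hypothetical toy transition with 2 features and 2 actions in the next state.
theta = np.zeros(2)
phi_sa = np.array([1.0, 0.0])
next_features = np.array([[0.0, 1.0], [1.0, 1.0]])
theta = q_learning_step(theta, phi_sa, reward=1.0, next_features=next_features)
```

The `np.max` in the target is the step I am asking about: it picks the best action in each next state independently, and I do not see how that interacts with the limits of the approximator.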

What exactly are delusional bias and non-delusional Q-learning? I am referring to the NeurIPS 2018 best paper, "Non-delusional Q-learning and value-iteration".

I have trouble understanding the terms "policy commitments" and "consistency". What do they refer to?

PS: a related post

edited Dec 30 '22 at 15:38

nbro

asked Oct 15 '22 at 20:38

wrek

0 Answers