
Problems occur when we combine Q-learning with a function approximator.
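
For concreteness, here is a minimal sketch of that combination: one-step Q-learning with a linear approximator Q(s, a) = w · φ(s, a). The feature map `phi`, the helper names, and the toy numbers are hypothetical and only illustrate the standard semi-gradient update. Note that the target takes max over a' of Q(s', a') independently at the next state. Nothing checks whether the greedy choices made this way at different states could all come from one greedy policy expressible by the linear class. The paper calls the error that arises when they cannot "delusional bias".

```python
# Minimal sketch: semi-gradient Q-learning with a linear function approximator,
# Q(s, a) = w . phi(s, a). `phi` and the toy values are hypothetical.

def q_value(w, features):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features))

def q_learning_update(w, phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step on the weight vector w.

    The bootstrap target uses max_b Q(s_next, b), chosen independently at
    s_next, with no check that the implied greedy action choices are
    jointly realizable by a single policy of this linear class.
    """
    target = r + gamma * max(q_value(w, phi(s_next, b)) for b in actions)
    td_error = target - q_value(w, phi(s, a))
    grad = phi(s, a)  # gradient of a linear Q w.r.t. w is phi(s, a)
    return [wi + alpha * td_error * gi for wi, gi in zip(w, grad)]

def phi(s, a):
    """Hypothetical 2-dimensional features for a toy problem."""
    return [float(s), 1.0 if a == 1 else 0.0]

w = q_learning_update([0.0, 0.0], phi, s=1, a=1, r=1.0, s_next=2, actions=[0, 1])
print(w)  # [0.1, 0.1]
```

Starting from zero weights, the target is just the reward 1.0, so the TD error is 1.0 and each weight moves by alpha times its feature value.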

What exactly are delusional bias and non-delusional Q-learning? I am referring to the NeurIPS 2018 best paper "Non-delusional Q-learning and value-iteration".

I have trouble understanding the terms "policy commitments" and "consistency". What do they mean?

PS: a related post

nbro
wrek
