In my QR-DQN application, the learned quantiles for a state s and action a follow the blue line in the figure. The method estimates expected values well and trains effectively. However, I know that in my problem the return distribution for (s, a) is a two-point distribution with probability mass only at {-2, +2}. I would therefore expect the quantiles to take the form of the green line.
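
To make that expectation concrete, here is a minimal sketch of the exact quantiles of a two-point distribution on {-2, +2}, evaluated at the midpoint quantile levels (2i - 1) / (2N) that QR-DQN uses. The probability `p_high` of the +2 outcome and the number of quantiles `n_quantiles` are hypothetical values chosen for illustration; neither is given above.

```python
import numpy as np

def two_point_quantiles(p_high, n_quantiles, low=-2.0, high=2.0):
    """Exact quantiles of a two-point distribution at the midpoint levels."""
    # Midpoint quantile levels tau_i = (2i - 1) / (2N) for i = 1..N
    taus = (2 * np.arange(1, n_quantiles + 1) - 1) / (2 * n_quantiles)
    # Inverse CDF of the two-point distribution:
    # `low` while tau <= P(low), `high` afterwards
    p_low = 1.0 - p_high
    return taus, np.where(taus <= p_low, low, high)

# Hypothetical setting: both outcomes equally likely, 8 quantiles
taus, q = two_point_quantiles(p_high=0.5, n_quantiles=8)
print(q)         # step from -2 to +2 halfway through the quantile levels
print(q.mean())  # mean of the quantiles, i.e. the implied expected return
```

Under these assumptions the quantile curve is a single step, which is the shape I expect the green line to have; the position of the step depends only on the probability of the -2 outcome.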

Is my expectation of the resulting quantiles correct? If so, why does QR-DQN behave this way?

[Figure: quantiles for (s, a); blue line = quantiles learned by QR-DQN, green line = expected quantiles]

nbro
amavrits

0 Answers