How to estimate the error during training in deep reinforcement learning

Question

How do I calculate the error during the training phase for deep reinforcement learning models?

As far as I know, deep reinforcement learning is not supervised learning. So how can the model know whether it predicts right or wrong? In the literature, I find that the "actual" Q-value is calculated, but that sounds as if it makes the whole idea behind deep RL obsolete. How could I even calculate or know the real Q-value if no world model already exists?

score 1 · Answer 1 · answered Feb 19 '20 at 13:44

Yes, reinforcement learning is very different from supervised learning. The policy (what you call a model) does not know whether it's predicting right or wrong, or whether it's taking the correct action. In RL there is no concept of "the right action"; everything is evaluated through the reward function.

Also, there is no way to compute the ground-truth Q-values. If you had them, there would be no need to do RL.
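To make this concrete: in value-based methods such as Q-learning or DQN, the "actual" Q-value that the literature refers to is usually a bootstrapped target, r + gamma * max over a' of Q(s', a'), built from the observed reward and the network's own current estimate for the next state. It is a moving estimate, not ground truth. The following is a minimal sketch of that idea in plain Python; the function names `td_targets` and `td_loss` and the toy numbers are made up for illustration and do not come from any particular library.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step bootstrapped targets: r + gamma * max_a' Q(s', a').

    next_q_values[i] holds the current Q estimates for every action in
    the next state of transition i. Terminal transitions get no bootstrap.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def td_loss(q_taken, targets):
    """Mean squared difference between Q(s, a) for the taken actions and the targets."""
    return sum((q - t) ** 2 for q, t in zip(q_taken, targets)) / len(targets)


# Toy batch of two transitions (hypothetical numbers); the second one is terminal.
targets = td_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [1.0, 3.0]],
    dones=[False, True],
    gamma=0.9,
)
loss = td_loss(q_taken=[2.8, 1.0], targets=targets)
```

A loss like this measures how self-consistent the Q estimates are, not how close they are to the true Q-values, so a small value does not by itself mean the agent behaves well.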

In RL you should not think as you would in supervised learning: there is no error metric measured against ground-truth labels. Everything is evaluated on how much accumulated reward the agent receives over an episode.
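As a rough illustration of tracking accumulated reward, the sketch below sums the rewards of one episode (optionally discounted) and smooths the per-episode totals with a trailing average, since individual episodes tend to be noisy. The helper names `episode_return` and `moving_average` and the sample numbers are hypothetical; how you collect the rewards depends on your environment.

```python
def episode_return(rewards, gamma=1.0):
    """Accumulated (optionally discounted) reward of one episode."""
    g = 0.0
    # Work backwards so each step adds its reward plus the discounted future.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def moving_average(values, window):
    """Trailing average over the last `window` values (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical per-step rewards from three training episodes.
episodes = [[1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
returns = [episode_return(ep) for ep in episodes]
smoothed = moving_average(returns, window=2)
```

If the smoothed curve trends upward over training, the agent is improving by the only measure RL actually defines.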
