Some RL literature uses terms such as 'Bellman backup' and 'Bellman error'. What do these terms refer to?
1 Answer
A Bellman backup is an application of a Bellman operator. For example, the step
$$ V(x)\leftarrow \alpha(R + \mathbf{E}[V(x')]) + (1-\alpha)V(x) $$
is a Bellman backup with learning rate $\alpha$, where $R$ is the reward and $x'$ is the next state.
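To make the update concrete, here is a minimal sketch in Python. The three-state chain, the transition table `P`, the reward list `R`, and the function name `bellman_backup` are all assumptions made up for illustration; like the formula above, the sketch uses no discount factor, so it relies on an absorbing zero-reward state to keep the values finite.

```python
def bellman_backup(V, P, R, alpha):
    """Return the value table after one synchronous backup over all states.

    Implements V(x) <- alpha * (R(x) + E[V(x')]) + (1 - alpha) * V(x),
    where E[V(x')] is taken under the transition probabilities P[x].
    """
    n = len(V)
    new_V = []
    for x in range(n):
        expected_next = sum(P[x][y] * V[y] for y in range(n))  # E[V(x')]
        target = R[x] + expected_next
        new_V.append(alpha * target + (1 - alpha) * V[x])
    return new_V


# Hypothetical chain 0 -> 1 -> 2, where state 2 is absorbing with zero reward.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
R = [1.0, 2.0, 0.0]

V = [0.0, 0.0, 0.0]
for _ in range(200):
    V = bellman_backup(V, P, R, alpha=0.5)
```

Repeating the backup drives `V` toward the fixed point of the Bellman operator, which for this toy chain is the reward-to-go from each state: 3, 2, and 0.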
A Bellman error is
$$ d(V(x), R + \mathbf{E}[V(x')]) $$
for some measure of discrepancy $d$, usually the squared difference $d(x, y) = (x-y)^2$.
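As a rough illustration, the Bellman error with the squared difference can be computed per state as below. The names `bellman_error`, `P`, and `R` and the three-state chain are assumptions for this sketch, not part of any library.

```python
def bellman_error(V, P, R, x):
    """Squared Bellman error at state x: (V(x) - (R(x) + E[V(x')]))**2."""
    expected_next = sum(P[x][y] * V[y] for y in range(len(V)))  # E[V(x')]
    target = R[x] + expected_next
    return (V[x] - target) ** 2


# Hypothetical chain 0 -> 1 -> 2, where state 2 is absorbing with zero reward.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
R = [1.0, 2.0, 0.0]

# An all-zero estimate disagrees with its own one-step target.
errors_initial = [bellman_error([0.0, 0.0, 0.0], P, R, x) for x in range(3)]

# The true values (reward-to-go) satisfy the Bellman equation exactly.
errors_true = [bellman_error([3.0, 2.0, 0.0], P, R, x) for x in range(3)]
```

The error is zero exactly when the estimate is consistent with its own one-step lookahead, which is why it is a common training objective for value functions.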
harwiltz