Some RL literature uses terms such as 'Bellman backup' and 'Bellman error'. What do these terms refer to?

nbro
user529295

1 Answer

A Bellman backup is an application of a Bellman operator. For example, the step

$$ V(x)\leftarrow \alpha(R + \mathbf{E}[V(x')]) + (1-\alpha)V(x) $$

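is a Bellman backup for some learning rate $\alpha$. Here $R$ is the immediate reward and $x'$ is the successor state; with $\alpha = 1$ the old estimate is replaced entirely by the target $R + \mathbf{E}[V(x')]$.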
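To make the update concrete, here is a minimal sketch of one such backup over a small tabular problem. The function name `bellman_backup`, the two-state chain, and the representation of the expectation as a transition matrix `P` with expected rewards `R` are all assumptions made for illustration; they are not part of the answer above.

```python
def bellman_backup(V, R, P, alpha):
    """Apply V(x) <- alpha * (R + E[V(x')]) + (1 - alpha) * V(x) to every state.

    V: list of current value estimates, one per state
    R: list of expected immediate rewards, one per state (assumed known)
    P: transition matrix, P[x][y] = probability of moving from x to y
    alpha: learning rate
    """
    n = len(V)
    new_V = []
    for x in range(n):
        # E[V(x')] under the transition probabilities out of state x
        expected_next = sum(P[x][y] * V[y] for y in range(n))
        target = R[x] + expected_next
        new_V.append(alpha * target + (1 - alpha) * V[x])
    return new_V


# Hypothetical two-state chain: state 1 is absorbing and pays no reward.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]
V = [0.0, 0.0]

V_new = bellman_backup(V, R, P, alpha=0.5)
```

Starting from all zeros, the target for state 0 is just its reward, so one backup with $\alpha = 0.5$ moves $V(0)$ halfway towards it. In sample-based methods the expectation is typically replaced by a single observed successor state rather than a known `P`.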

A Bellman error is

$$ d(V(x), R + \mathbf{E}[V(x')]) $$

for some metric $d$, usually $d(x, y) = (x-y)^2$.
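As a rough illustration, the squared Bellman error can be computed per state as follows. The helper `bellman_error` and the two-state chain are hypothetical, and the expectation is again assumed to be available through a known transition matrix `P` and expected rewards `R`.

```python
def bellman_error(V, R, P):
    """Squared Bellman error d(V(x), R + E[V(x')]) with d(a, b) = (a - b)^2.

    V: list of value estimates, one per state
    R: list of expected immediate rewards, one per state (assumed known)
    P: transition matrix, P[x][y] = probability of moving from x to y
    """
    n = len(V)
    errors = []
    for x in range(n):
        # Backup target for state x
        target = R[x] + sum(P[x][y] * V[y] for y in range(n))
        errors.append((V[x] - target) ** 2)
    return errors


# Hypothetical two-state chain: state 1 is absorbing and pays no reward.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]

errors_at_zero = bellman_error([0.0, 0.0], R, P)
errors_at_fixed_point = bellman_error([2.0, 0.0], R, P)
```

The error vanishes exactly when $V$ already equals its own backup target, which in this toy chain happens at $V = (2, 0)$.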

harwiltz