
I was reading this paper, https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf, which gives the following algorithm for deep Q-learning with experience replay:

[image: Algorithm 1, Deep Q-learning with Experience Replay]

On line 12, where the algorithm sets the value of y_j, the second case reads:

$r_j + \gamma \max_{a'}Q(\phi_{j+1},a';\theta)$

I'm confused as to what a' refers to and where it comes from.

(Edit) Why is it a on line 7:

$a_t = \max_a Q^*(\phi(s_t),a;\theta)$

but a' on line 12?

Can someone please explain it to me?

nbro
Ness

1 Answer


$r_j + \gamma \max_{a'}Q(\phi_{j+1},a';\theta)$
I'm confused as to what $a'$ refers to and where it comes from.

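Here $a'$ is a "dummy" variable: it ranges over all possible actions, and $\max_{a'}$ picks the largest Q-value among them. It does not come from anywhere in the algorithm; the prime only distinguishes it from the action that was actually taken.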

In practice, that corresponds to the axis (or dim) argument of a max reduction in numpy/pytorch/tensorflow: you take the maximum over the action dimension of the network's Q-value output.
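To make the link to `axis` concrete, here is a minimal NumPy sketch of the target computation on line 12. The array names (`q_next`, `rewards`, `terminal`), the numbers, and the value of `gamma` are made up for illustration; in a real agent `q_next` would be the network's output for the sampled minibatch.

```python
import numpy as np

gamma = 0.99  # discount factor (illustrative value, an assumption)

# Hypothetical Q(phi_{j+1}, a'; theta) for a minibatch of 3 transitions
# and 4 actions: one row per transition j, one column per action a'.
q_next = np.array([
    [1.0, 2.5, 0.3, -1.0],
    [0.0, 0.1, 0.2, 0.4],
    [3.0, 3.0, 1.0, 2.0],
])
rewards = np.array([1.0, 0.0, -1.0])        # r_j
terminal = np.array([False, False, True])   # True if phi_{j+1} is terminal

# "max over a'" is just a max over the action axis (axis=1).
max_q_next = q_next.max(axis=1)

# y_j = r_j for terminal phi_{j+1}, else r_j + gamma * max_a' Q(...).
targets = np.where(terminal, rewards, rewards + gamma * max_q_next)
```

Note that `a'` never appears as a variable in the code: it only exists as the index of the columns being reduced.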

$a_t = \max_a Q^*(\phi(s_t),a;\theta)$
Why on the line 7 it's $a$

I'd say this is sloppy math notation (or just a typo) on the authors' part.
It should be argmax, not max, because $a_t$ is an action rather than a Q-value: $$a_t = \arg \max_a Q^*(\phi(s_t),a;\theta)$$
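The difference between the two operators is easy to see in a small NumPy sketch. The Q-values below are invented for a single state with four actions, purely as an illustration.

```python
import numpy as np

# Hypothetical Q(phi(s_t), a; theta) for one state and 4 actions.
q_values = np.array([0.2, 1.7, -0.5, 0.9])

# max_a Q(...) returns a value: the largest Q-value.
best_value = q_values.max()

# argmax_a Q(...) returns an action: the index of the largest Q-value.
# This is what line 7 needs, since a_t is the action to execute.
best_action = int(q_values.argmax())
```

Here `best_value` is a Q-value and `best_action` is an action index, so only the latter makes sense as $a_t$.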

Kostya