5

Why are the state-value and action-value functions sometimes written in lowercase letters and other times in capitals? For instance, why does the Q-learning algorithm (page 131 of Sutton and Barto's book, among other places) use capitals, $Q(S, A)$, while the Bellman equation uses $q(s,a)$?

nbro
d56

2 Answers

4

In the Sutton and Barto book, $q(s,a)$ denotes the true expected value of taking action $a$ in state $s$, whereas capital $Q(s,a)$ denotes an estimate of $q(s,a)$. However, there is likely to be a lot of inconsistency in the literature, as each author has their own preference on how to denote things. I would encourage you to consider whether the quantity you are reading denotes an estimate or the true value.
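To make the estimate-versus-true-value distinction concrete, here is a minimal sketch under stated assumptions. The one-state, two-action problem, the reward noise, and the names `TRUE_Q`, `sample_reward` and `estimate_q` are all hypothetical and not taken from the book. The dictionary `TRUE_Q` plays the role of lowercase $q$, and the learned dictionary `Q` plays the role of capital $Q$.

```python
import random

# Hypothetical one-state problem with two actions. The true action values
# q(a) are known here only because we define the reward distributions.
TRUE_Q = {"left": 1.0, "right": 2.0}


def sample_reward(action, rng):
    # Reward is the true value plus zero-mean Gaussian noise (an assumption).
    return TRUE_Q[action] + rng.gauss(0.0, 1.0)


def estimate_q(num_samples=5000, seed=0):
    rng = random.Random(seed)
    Q = {a: 0.0 for a in TRUE_Q}      # capital Q: the learned estimate
    counts = {a: 0 for a in TRUE_Q}
    for _ in range(num_samples):
        action = rng.choice(list(TRUE_Q))
        reward = sample_reward(action, rng)
        counts[action] += 1
        # Incremental sample-average update of the estimate.
        Q[action] += (reward - Q[action]) / counts[action]
    return Q


Q = estimate_q()
for a in TRUE_Q:
    print(a, "true q =", TRUE_Q[a], "estimate Q =", round(Q[a], 3))
```

The estimate `Q` depends on the sampled rewards and changes with the seed, while `TRUE_Q` is fixed by the problem definition.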

David
1

Ordinary variables vs. random variables

The difference is whether you're talking about an ordinary variable or a random variable.

For instance, the q-function (lowercase) is an expected value (i.e. not a random variable), conditioned on a specific state-action pair: $$ q(s,a)\ =\ \mathbb{E}_t\left\{ R_t+\gamma R_{t+1} + \gamma^2R_{t+2}+\dots\,\Big|\, S_t=s, A_t=a \right\} $$ In some cases, however, authors abuse notation slightly by feeding a random variable into the q-function, e.g. $q(S_t,a)$, $q(s,A_t)$ or even $q(S_t,A_t)$, thereby undoing some or all of the conditioning in the definition of the q-function as an expected value.

Feeding a random variable into a function like the q-function results in an output that is a random variable in its own right. It is for this reason that some authors choose to give the function itself an uppercase letter as well.
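As a rough illustration of this point, the sketch below evaluates a lowercase `q` once at a fixed state and once at a randomly drawn state. The table `q_table`, the two states, and the uniform state distribution in `sample_state` are assumptions made up for the example.

```python
import random

# Hypothetical true action values for two states and a single action.
q_table = {("s0", "a"): 1.0, ("s1", "a"): 3.0}


def q(s, a):
    # Lowercase q: a deterministic function of its arguments.
    return q_table[(s, a)]


def sample_state(rng):
    # S_t: a random variable, assumed uniform over the two states.
    return rng.choice(["s0", "s1"])


rng = random.Random(0)

# q(s, a) with a fixed state: the same number on every evaluation.
fixed = [q("s0", "a") for _ in range(5)]

# q(S_t, a) with a random state: the output varies from draw to draw.
draws = [q(sample_state(rng), "a") for _ in range(10000)]
mean_draw = sum(draws) / len(draws)

print("q(s0, a):", fixed)
print("values taken by q(S_t, a):", sorted(set(draws)))
print("sample mean of q(S_t, a):", round(mean_draw, 3))
```

With a fixed state the output never changes, whereas with a sampled state the output takes different values across draws and only its average settles down, which is what makes $q(S_t,a)$ a random variable.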

My advice would be to ask yourself: is this a random variable? Beyond that, I would interpret upper/lowercase as no more than a hint to the reader.

Kris