
I've been looking online for a while for a source that explains these computations, but I can't find what $|\mathcal{A}(s)|$ means. I guess $\mathcal{A}$ is the action set, but I'm not sure about that notation:

$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a)+(1-\varepsilon) \max _{a} Q^{\pi}(s, a)$$

Here is the source of the formula.

I also want to clarify that I understand the idea behind the $\varepsilon$-greedy approach and the motivation behind on-policy methods. I just had trouble understanding this notation (and a few other minor things). The author omitted some steps, so the explanation felt like it jumped ahead, which is why I didn't get the notation. I'd be more than glad if someone could point me towards a better source where this is explained in detail.

nbro
Metrician

1 Answer


The expression $|\mathcal{A}(s)|$ means:

  • $|\quad|$: the size of

  • $\mathcal{A}(s)$: the set of actions available in state $s$

or more simply the number of actions allowed in the state.
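The action set is written $\mathcal{A}(s)$ rather than just $\mathcal{A}$ because it may depend on the state. As a minimal sketch, assuming a made-up environment where the states `"corridor"` and `"junction"` and their action lists are purely hypothetical, $|\mathcal{A}(s)|$ is just the length of that state's action list:

```python
# Hypothetical state-dependent action sets A(s); states and actions are invented
# purely for illustration.
ACTIONS = {
    "corridor": ["left", "right"],
    "junction": ["left", "right", "up", "down"],
}


def num_actions(state):
    """Return |A(s)|, the number of actions allowed in `state`."""
    return len(ACTIONS[state])


print(num_actions("corridor"))  # 2
print(num_actions("junction"))  # 4
```

In this sketch the exploratory share $\varepsilon / |\mathcal{A}(s)|$ would be larger in the corridor than at the junction, since there are fewer actions to spread $\varepsilon$ over.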

This makes sense in the given formula because $\frac{\varepsilon}{|\mathcal{A}(s)|}$ is then the probability of taking each exploratory action in an $\varepsilon$-greedy policy. The overall expression is the expected return when following that policy, summing the expected results from the exploratory actions and the greedy action.
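One way to see where the two terms come from: under $\varepsilon$-greedy, every action in $\mathcal{A}(s)$ gets probability $\frac{\varepsilon}{|\mathcal{A}(s)|}$, and the greedy action additionally gets $1-\varepsilon$. Weighting each $Q^{\pi}(s,a)$ by those probabilities, $\sum_a \pi(a \mid s)\, Q^{\pi}(s,a)$, gives exactly the formula in the question. The sketch below checks this numerically; the function names and the Q-values are hypothetical, chosen only for illustration.

```python
import math


def epsilon_greedy_probs(q, epsilon):
    """pi(a|s) for an epsilon-greedy policy; q maps each action in A(s) to Q(s, a)."""
    n = len(q)  # |A(s)|
    greedy = max(q, key=q.get)  # ties go to the first maximal action
    probs = {a: epsilon / n for a in q}  # exploratory share for every action
    probs[greedy] += 1.0 - epsilon  # extra mass on the greedy action
    return probs


def formula_value(q, epsilon):
    """epsilon / |A(s)| * sum_a Q(s, a) + (1 - epsilon) * max_a Q(s, a)."""
    n = len(q)
    return epsilon / n * sum(q.values()) + (1.0 - epsilon) * max(q.values())


def weighted_value(q, epsilon):
    """sum_a pi(a|s) * Q(s, a) using the explicit epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q, epsilon)
    return sum(probs[a] * q[a] for a in q)


q = {"left": 1.0, "right": 3.0, "up": 0.0, "down": 2.0}  # hypothetical Q(s, a)
eps = 0.2
print(math.isclose(formula_value(q, eps), weighted_value(q, eps)))  # True
```

Note that the greedy action also appears inside the $\sum_a$ term, so its total probability is $1-\varepsilon+\frac{\varepsilon}{|\mathcal{A}(s)|}$, not just $1-\varepsilon$.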

Neil Slater