What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?

Question

I've been looking online for a while for a source that explains these computations but I can't find anywhere what does the $|A(s)|$ mean. I guess $A$ is the action set but I'm not sure about that notation:

$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a)+(1-\varepsilon) \max _{a} Q^{\pi}(s, a)$$

Here is the source of the formula.

I also want to clarify that I understand the idea behind the $\epsilon$-greedy approach and the motivation behind the on-policy methods. I just had a problem understanding this notation (and also some other minor things). The author there omitted some stuff, so I feel like there was a continuity jump, which is why I didn't get the notation, etc. I'd be more than glad if I can be pointed towards a better source where this is detailed.

Neil Slater · Accepted Answer · 2020-07-14T20:44:55.347

This expression: $|\mathcal{A}(s)|$ means

$|\quad|$ the size of
$\mathcal{A}(s)$ the set of actions in state $s$

or more simply the number of actions allowed in the state.

This makes sense in the given formula because $\frac{\epsilon}{|\mathcal{A}(s)|}$ is then the probability of taking each exploratory action in an $\epsilon$-greedy policy. The overall expression is the expected return when following that policy, summing expected results from the exploratory and greedy action.

What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?

1 Answers1