The perplexity of the $i^{th}$ token in the $k^{th}$ sequence is
$$ P_{ki} = \frac{1}{p(t_{ki})} $$
The aggregated perplexity of the $k^{th}$ sequence is then
$$ P_{k} = \left(\prod_{i=1}^N P_{ki}\right)^{1/N} = \left(\prod_{i=1}^N \frac{1}{p(t_{ki})} \right)^{1/N} $$
which is the geometric mean of the token perplexities. This makes sense, as we are essentially taking the $N^{th}$ root of the multiplicative inverse of the probability that the model assigns to the whole sequence, i.e. a length-normalized inverse probability.
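To check my understanding, here is a minimal Python sketch of this per-sequence computation (the log-space form is just the same geometric mean written for numerical stability; `token_probs` is a hypothetical list of the model's per-token probabilities):

```python
import math

def sequence_perplexity(token_probs):
    """Perplexity of one sequence: the geometric mean of the inverse
    token probabilities, computed in log space for numerical stability."""
    n = len(token_probs)
    # (prod_i 1/p_i)^(1/N) == exp(-(1/N) * sum_i log p_i)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A 3-token sequence: (1 / (0.5 * 0.25 * 0.125))^(1/3) == 64^(1/3)
print(sequence_perplexity([0.5, 0.25, 0.125]))  # ~= 4.0
```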
Now my question is how to aggregate the perplexities of several sequences. From various places, including the Hugging Face tutorial, the prescription seems to be to take the arithmetic mean of the per-sequence perplexities:
$$ P = \frac{1}{m} \sum_{k=1}^m P_k $$
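In code, as I understand the prescription, this amounts to (reusing `sequence_perplexity` from the sketch above; the probabilities are made up for illustration):

```python
def mean_perplexity(per_sequence_token_probs):
    """Arithmetic mean of the per-sequence perplexities P_k."""
    pks = [sequence_perplexity(probs) for probs in per_sequence_token_probs]
    return sum(pks) / len(pks)

# Two made-up sequences: P_1 = 4.0, P_2 = 1/0.8 = 1.25
print(mean_perplexity([[0.5, 0.25, 0.125], [0.8, 0.8, 0.8]]))  # ~= 2.625
```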
I do not quite understand what it means to take the arithmetic average of inverse probabilities. What is this actually capturing?