In Appendix F of the MuZero paper, the authors explain that they represent values and rewards as vectors.

This means that the neural networks don't output the scalars directly. Instead, they output a probability distribution that is later converted back to a scalar.
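To make the conversion concrete, here is a minimal NumPy sketch of how I understand it. It assumes an integer support from -300 to 300 and a "two-hot" encoding, where the probability mass is split between the two integers neighbouring the scalar. The function names and the `support_size` parameter are my own, not taken from the paper, and as far as I can tell the paper also rescales the targets before encoding, which I leave out here.

```python
import numpy as np

def scalar_to_support(x, support_size=300):
    """Encode a scalar as a distribution over the integers
    -support_size..support_size (hypothetical helper, not from the paper)."""
    # Values outside the support are clipped to its edges.
    x = float(np.clip(x, -support_size, support_size))
    low = int(np.floor(x))
    high_weight = x - low  # fraction of the mass given to the upper neighbour
    probs = np.zeros(2 * support_size + 1)
    probs[low + support_size] = 1.0 - high_weight
    if high_weight > 0:
        probs[low + 1 + support_size] = high_weight
    return probs

def support_to_scalar(probs, support_size=300):
    """Decode by taking the expected value under the distribution."""
    support = np.arange(-support_size, support_size + 1)
    return float(np.dot(probs, support))

encoded = scalar_to_support(3.7)       # mass 0.3 on bin "3", 0.7 on bin "4"
decoded = support_to_scalar(encoded)   # recovers roughly 3.7
```

In training, the network would output logits over the 601 bins and be fitted to `encoded` with a cross-entropy loss instead of a squared error on the scalar.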
I wonder why it's done this way. Let's say they want to support a reward/value range of [-60000, 60000]. They could have the network output a scalar y and then compute tanh(y) * 60000, or even output the actual reward or value directly.
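For comparison, the scalar alternative I have in mind would look roughly like this. It is only a sketch: `MAX_ABS_VALUE`, `scalar_head`, and `squared_error` are names I made up, and `y` stands for the raw output of the network's last layer.

```python
import numpy as np

MAX_ABS_VALUE = 60000.0  # assumed bound on |reward| or |value|

def scalar_head(y):
    """Squash the raw network output y into [-MAX_ABS_VALUE, MAX_ABS_VALUE]."""
    return np.tanh(y) * MAX_ABS_VALUE

def squared_error(y, target):
    """Plain regression loss on the scalar prediction."""
    return (scalar_head(y) - target) ** 2
```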
What's the advantage of representing scalars as vectors?