In LLMs today, temperature is implemented by scaling the logits before the softmax function at the end of the neural network: the logits are divided by the temperature, so higher temperatures flatten the output distribution and lower temperatures sharpen it.
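For illustration, here is a minimal sketch of that temperature-scaled softmax (NumPy; the function name and example logits are just my own illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide the logits by the temperature before normalizing;
    # higher temperature flattens the distribution, lower sharpens it.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# Same logits, different temperatures
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=0.5))  # sharper
print(softmax_with_temperature(logits, temperature=2.0))  # flatter
```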
In physics, temperature increases the motion of atoms, as in Brownian motion, where every particle undergoes small random fluctuations.
I wonder whether this has ever been implemented in neural networks: instead of adjusting the softmax, every parameter could be slightly randomized so that the network produces a different output each time (see the sketch below). I understand that this is probably not practical in large LLMs like ChatGPT, because it would require generating billions of random numbers and modifying billions of parameters on every run. Also, for quantized parameters, the random changes would either be too large or not change the quantized value at all.
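Concretely, what I have in mind is something like this rough sketch, assuming a PyTorch model; `perturb_parameters` and `noise_scale` are hypothetical names of my own, not an existing API:

```python
import torch

@torch.no_grad()
def perturb_parameters(model, noise_scale=1e-3):
    # Add small Gaussian noise to every parameter, analogous to thermal
    # fluctuations; noise_scale would play the role of "temperature".
    for param in model.parameters():
        param.add_(torch.randn_like(param) * noise_scale)

# Hypothetical usage: perturb a copy of the model before each generation,
# so repeated runs on the same prompt produce different outputs.
# noisy_model = copy.deepcopy(model)
# perturb_parameters(noisy_model, noise_scale=1e-3)
# output = noisy_model.generate(prompt_ids)
```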
Has this been done before? Was temperature ever implemented as Brownian-motion-style parameter noise, e.g. before softmax temperature scaling became standard?