
This is a topic I have been debating with my colleagues for some time now; perhaps you could also voice your opinion on it.

Artificial neural networks use random weight initialization within a certain value range. These initial parameters are drawn from a pseudorandom number generator (e.g. following a Gaussian distribution), and so far this has been sufficient.
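
To make the setup concrete, here is a minimal sketch of the kind of pseudorandom Gaussian weight initialization described above; the layer sizes and the scale of 0.01 are illustrative values, not taken from any particular model:

```python
import numpy as np

# Initialize a dense layer's weight matrix with a pseudorandom Gaussian
# draw, as typical deep learning frameworks do. The PRNG here is NumPy's
# default (PCG64), seeded for reproducibility -- not a true RNG.
rng = np.random.default_rng(seed=42)
fan_in, fan_out = 784, 256
weights = rng.normal(loc=0.0, scale=0.01, size=(fan_in, fan_out))

print(weights.shape)  # (784, 256)
```

With ~200,000 samples, the empirical standard deviation lands very close to the requested 0.01; the "quality" of these numbers is entirely determined by the underlying PRNG.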

With a sufficiently large sample, pseudorandom numbers can be statistically shown to be, in fact, not truly random. With a huge neural network like GPT-3, which has roughly 175 billion trainable parameters, I suspect that applying the same statistical tests to its initial weights would likewise clearly indicate that these parameters are pseudorandom.
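
To illustrate what such statistical testing looks like, here is a sketch of the simplest test from the NIST randomness test suite, the "monobit" frequency test. This is only one crude test; distinguishing a modern PRNG from a true RNG in practice requires far larger batteries of tests, and a good PRNG is expected to pass this one:

```python
import math
import numpy as np

def monobit_p_value(bits: np.ndarray) -> float:
    """NIST monobit frequency test: p-value under the null hypothesis
    that the bit stream is truly random."""
    # Map bits {0, 1} -> {-1, +1}, sum, and normalize by sqrt(n).
    s = abs(int(np.sum(2 * bits - 1))) / math.sqrt(bits.size)
    # Two-sided p-value; small p means "looks non-random".
    return math.erfc(s / math.sqrt(2))

rng = np.random.default_rng(seed=0)
bits = rng.integers(0, 2, size=1_000_000)
p = monobit_p_value(bits)
print(round(p, 3))  # a good PRNG should usually yield p > 0.01
```

A degenerate stream such as all zeros fails immediately (tiny p-value), while a perfectly balanced stream scores p = 1.0; the point of the question is that with enough samples, subtler regularities of a PRNG also become detectable.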

With a model of this size, could the repeatable structures that pseudorandomness imposes on the initial weights, at least in theory, affect the fitting procedure enough that the completed model is measurably different (generalization- or performance-wise)? In other words, could the quality of randomness affect the fitting of huge neural networks?

Aki Koivu

1 Answer


I think theoretically yes: using a true random number generator (TRNG) instead of a pseudorandom one should yield better values for anything that relies on randomness (if speed is not a concern).

However, I think there is too much going on when training a neural network for this to make a big impact. One aspect to consider is that the range of the initial random weights is 'smoothed out' by initialization techniques such as Xavier (for sigmoid and tanh) and He (for ReLU).
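
As a sketch of what that smoothing means: both schemes rescale the spread of the initial weights according to the layer's fan-in/fan-out, so the variance of activations stays stable regardless of the fine-grained pattern of the individual draws. The layer sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
fan_in, fan_out = 512, 256

# Xavier/Glorot initialization (suited to sigmoid/tanh):
# variance = 2 / (fan_in + fan_out)
xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                    size=(fan_in, fan_out))

# He initialization (suited to ReLU): variance = 2 / fan_in
he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

print(round(float(xavier.std()), 3), round(float(he.std()), 3))
```

The scale factor, not the particular pseudorandom sequence, is what these techniques control; any residual structure in the raw draws is squeezed into a narrow, carefully chosen band.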

Another important thing to consider is that randomness is not present only in weight initialization: it also enters through data shuffling, stochastic optimization, and techniques such as dropout (source).

Randomness being present throughout the entire training process, and not only in weight initialization, should reduce the concern about not having true random numbers from the get-go. Also, while using a hardware-based random number generator to initialize the weights seems feasible, I think it would be too costly an I/O operation to perform in the middle of training.
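
To make the point concrete, here is a sketch of two of those other sources of pseudorandomness, mini-batch shuffling and dropout masks; the array sizes and keep probability are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=123)

n_samples, n_features = 8, 4
data = np.arange(n_samples * n_features, dtype=float).reshape(n_samples,
                                                              n_features)

# 1. Shuffling the training set each epoch draws from the PRNG.
order = rng.permutation(n_samples)
shuffled = data[order]

# 2. Dropout draws a fresh pseudorandom mask on every forward pass
#    (inverted dropout: rescale by keep_prob so expectations match).
keep_prob = 0.8
mask = rng.random(data.shape) < keep_prob
dropped = data * mask / keep_prob

print(order.shape, dropped.shape)
```

Each epoch and each forward pass consumes fresh pseudorandom numbers, so the PRNG's output is folded into training millions of times over, not just once at initialization.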

talles