
All features of my input dataset, which will be used to train a simple multi-layer neural network, are in the range $[-1,+1]$, and the output of the network is a single number, again in the range $[-1,+1]$.

Is it required (or recommended) to normalize my inputs into the $[0,1]$ range, or can I feed the input layer the $[-1,+1]$ values directly?

Also, should I initialize the weights and biases to $0$, or to random values drawn from a standard normal distribution?

Chait
Bikay

1 Answer


Generally, inputs between -1 and 1 are ideal, though you can get away with a wider range. For example, if you standardize features with the z-score, some values will fall outside this range, sometimes by quite a bit (I've seen around -30 for significantly distant outliers), but even then you will be fine.
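For reference, a minimal z-score standardization sketch in NumPy (the feature matrix here is made up for illustration):

```python
import numpy as np

# Made-up feature matrix: rows are samples, columns are features.
X = np.array([[2.0, 100.0],
              [4.0, 150.0],
              [6.0, 500.0]])

# Z-score: subtract the per-feature mean, divide by the per-feature std.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled)  # mostly within [-1.5, 1.5] here; heavy outliers can land far outside
```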

As an experiment, try training on a toy problem like MNIST with and without feature scaling. You might be surprised that even raw pixel values in the range 0 to 255 work just fine.
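If you do want to scale, the usual MNIST rescaling is a single division (sketched here on a random stand-in for the real images):

```python
import numpy as np

# Stand-in for a batch of MNIST images: integers in [0, 255], flattened to 784 pixels.
batch = np.random.randint(0, 256, size=(32, 784))

# Rescale to [0.0, 1.0] before feeding the network.
batch_scaled = batch.astype(np.float32) / 255.0
```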

You start running into problems when you deal with really large numbers: any learning rate small enough to keep training stable will simply not be large enough to move the weights far enough to learn in any reasonable amount of time.
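Here is a contrived NumPy sketch of that failure mode (hypothetical data, plain gradient descent on a linear model): the largest stable learning rate is dictated by the huge-scale feature, and at that rate the other weight barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two features on wildly different scales, as in unscaled real-world data.
x1 = rng.normal(size=1000)           # magnitude ~1
x2 = rng.normal(size=1000) * 1e6     # magnitude ~1e6
y = 3.0 * x1 + 2e-6 * x2             # both features contribute comparably to y

w = np.zeros(2)
lr = 1e-12  # roughly the largest rate that stays stable given x2's scale

for _ in range(1000):
    err = w[0] * x1 + w[1] * x2 - y
    # Squared-error gradient (up to a constant factor) for each weight.
    w -= lr * np.array([np.mean(err * x1), np.mean(err * x2)])

print(w)  # w[1] is learned almost immediately; w[0] has barely moved from 0.0
```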

David Hoelzer