I'm working with a fully connected neural network with input 32x32x3.
The architecture is: a dense layer with 32 units + ReLU activation, then a dense layer with 64 units + ReLU activation, followed by a dense layer with 32 units + ReLU activation, and finally a dense layer of 10 neurons with softmax activation.
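For reference, this is roughly how I have it set up (a Keras sketch; the layer sizes are the ones above, everything else is just illustrative, and `weight_init` stands for the initializer the homework question asks about):

```python
import tensorflow as tf
from tensorflow.keras import layers, initializers

def build_model(weight_init):
    # weight_init is e.g. initializers.RandomUniform(0.0, 1.0)
    #                 or  initializers.RandomUniform(-1.0, 0.0)
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        layers.Flatten(),  # 32*32*3 = 3072 input features
        layers.Dense(32, activation="relu", kernel_initializer=weight_init),
        layers.Dense(64, activation="relu", kernel_initializer=weight_init),
        layers.Dense(32, activation="relu", kernel_initializer=weight_init),
        layers.Dense(10, activation="softmax", kernel_initializer=weight_init),
    ])
```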
I have a homework question:
If we had to choose between initializing the weights from Uniform(0,1) and from Uniform(-1,0), which one would you expect to work best, and why?
After searching the internet and asking ChatGPT, I have ended up with two conclusions that contradict each other:
1- All my inputs are positive, so if I choose Uniform(0,1), every pre-activation z = W·x + b will be > 0, and ReLU then acts as a purely linear (identity) function. Therefore Uniform(0,1) is the worse choice.
2- All my inputs are positive, so if I choose Uniform(-1,0), every pre-activation z will be < 0, and most of my neurons will be deactivated (output exactly 0) after ReLU. Therefore Uniform(-1,0) is the worse choice. (A small numerical sketch illustrating both cases follows this list.)
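To make the two arguments concrete, here is the sketch I mentioned (plain NumPy; the zero biases, inputs drawn uniformly from [0,1], and the 3072→32 first layer are my own assumptions, not part of the homework):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: 32*32*3 = 3072 positive inputs feeding the first dense layer of 32 units.
x = rng.uniform(0.0, 1.0, size=3072)  # all inputs positive (e.g. normalized pixels)

for name, low, high in [("Uniform(0,1)", 0.0, 1.0), ("Uniform(-1,0)", -1.0, 0.0)]:
    W = rng.uniform(low, high, size=(32, 3072))  # weight initialization under test
    b = np.zeros(32)                             # assuming zero biases at initialization
    z = W @ x + b                                # pre-activations of the first layer
    a = np.maximum(z, 0.0)                       # ReLU
    print(f"{name}: fraction of z > 0 = {np.mean(z > 0):.2f}, "
          f"fraction of dead units = {np.mean(a == 0):.2f}")
```

Since every term in W·x has the same sign as the weights, Uniform(0,1) makes every z positive (so ReLU behaves like the identity) and Uniform(-1,0) makes every z negative (so every unit outputs 0), which is exactly the tension between the two arguments above.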
I don't know which of these (if either) is correct, and I would appreciate help clarifying it.