
I was reading LeCun's "Efficient BackProp" and the author repeatedly stressed the importance of centering the input patterns around 0, and used this to justify the choice of the tanh sigmoid. But if tanh is so good, how come ReLU is the most popular activation in modern NNs (which is even odder given that the paper doesn't mention ReLU at all)?

nbro

1 Answer


For a discussion of the advantages of ReLU, see the paper by Glorot et al. (2011), "Deep Sparse Rectifier Neural Networks". "Efficient BackProp" is a 1998 paper; at the time, rectifiers were uncommon and sigmoid-type activations were the standard choice.
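
As an illustration of what the two papers are weighing (this is my own sketch, not taken from either paper): tanh gives the zero-centered outputs that "Efficient BackProp" argues for, while ReLU gives non-negative outputs but does not saturate for positive inputs, so its gradient stays at 1 there.

```python
import numpy as np

x = np.linspace(-5, 5, 11)

# tanh: outputs in (-1, 1), roughly zero-mean for symmetric inputs,
# but the gradient vanishes for large |x| (saturation).
tanh_out = np.tanh(x)
tanh_grad = 1.0 - tanh_out ** 2

# ReLU: outputs in [0, inf), never negative (not zero-centered),
# but the gradient is exactly 1 for all positive inputs.
relu_out = np.maximum(0.0, x)
relu_grad = (x > 0).astype(float)

print("mean output  - tanh:", tanh_out.mean(), " ReLU:", relu_out.mean())
print("grad at x=4  - tanh:", tanh_grad[x == 4][0], " ReLU:", relu_grad[x == 4][0])
```

The non-saturating gradient is one of the main reasons ReLU ended up working so well in deep networks, despite violating the zero-centering advice.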

Martino