
The sigmoid, tanh, and ReLU are popular and useful activation functions in the literature.

The following excerpt, taken from p. 4 of Neural Networks and Neural Language Models, says that tanh has a couple of interesting properties:

For example, the tanh function has the nice properties of being smoothly differentiable and mapping outlier values toward the mean.

A function is said to be differentiable if it is differentiable at every point of its domain. The domain of tanh is $\mathbb{R}$, and $\tanh(x) = \dfrac{e^x-e^{-x}}{e^x+e^{-x}}$ is differentiable on $\mathbb{R}$.
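Indeed, applying the quotient rule gives a derivative that exists for every real $x$:

$$\tanh'(x) = 1 - \tanh^2(x)$$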

But what is meant by "smoothly differentiable" in the case of the tanh activation function?

hanugm

1 Answer


A smooth function is usually defined to be a function that is $n$-times continuously differentiable, which means that $f$, $f'$, $\dots$, $f^{(n - 1)}$ are all differentiable and $f^{(n)}$ is continuous. Such functions are also called $C^n$ functions.

It can be a bit of a vague term; some people might even stretch the definition and say any continuous function is smooth (though I'd be a little surprised if I saw that in use, personally). Other people use "smooth" to mean infinitely differentiable, i.e. $C^\infty$: for example, $f(x) = e^x$ can be differentiated as many times as you like.
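As a concrete example of the distinction, $f(x) = x\lvert x \rvert$ is $C^1$ but not $C^2$: its derivative is $f'(x) = 2\lvert x \rvert$, which is continuous everywhere but not differentiable at $0$. By contrast, $\tanh$ and $e^x$ are $C^\infty$.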

I guess what the author is trying to point out is that the ReLU function isn't differentiable at $x = 0$. Even if you use the "trick"¹ of treating ReLU as differentiable everywhere, you would still get a derivative that is discontinuous:

$$\mathrm{ReLU}'(x) = \begin{cases} 1 & x \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

So, it's fair to say that ReLU isn't smooth in the same sense as the $\tanh$ function, which has a continuous derivative (and, in fact, you could carry on and consider the higher derivatives, which all exist and are continuous too).
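If it helps, here's a minimal numerical sketch of that contrast (just an illustration assuming NumPy; the helper names are mine): the derivative of $\tanh$ passes through $x = 0$ without any jump, while the conventional ReLU "derivative" jumps from $0$ to $1$.

```python
import numpy as np

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, defined and continuous for all real x
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    # the convention used above: 1 for x >= 0, 0 otherwise
    return np.where(x >= 0, 1.0, 0.0)

xs = np.array([-1e-3, -1e-6, 0.0, 1e-6, 1e-3])
print(tanh_grad(xs))  # all values are essentially 1 -> no jump around 0
print(relu_grad(xs))  # [0. 0. 1. 1. 1.] -> jumps at 0
```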


¹ If this doesn't sound familiar, see p. 188 of Deep Learning by Goodfellow et al. We can get around the fact that the ReLU function isn't differentiable at zero by just pretending it has a well-defined derivative (zero or one) at that point. A little dishonest, perhaps, but it works very well.
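If you want to see that convention in practice, here's a quick check (this assumes PyTorch is installed; as far as I know it picks a gradient of zero at exactly $x = 0$, but the point is only that some fixed value gets chosen):

```python
import torch

# ReLU gradient at exactly x = 0: the framework simply picks a value.
x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # something like tensor([0., 0., 1.])
```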

htl