
For example, AFAIK, the pooling layer in a CNN is not differentiable, but it can still be used because it does not learn anything. Is that always true?

RedRus

1 Answer


It is not possible to backpropagate gradients through a layer with non-differentiable functions. However, the pooling layer function is differentiable*, and usually trivially so.

For example:

  • If an average pooling layer has inputs $z$ and outputs $a$, and each output is the average of 4 inputs, then $\frac{da}{dz} = 0.25$ (if the pooling windows overlap it gets a little more complicated, but you just add the contributions up where they overlap).

  • A max pooling layer has $\frac{da}{dz} = 1$ for the maximum $z$ and $\frac{da}{dz} = 0$ for all the others (a short sketch of both cases follows this list).
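
As a minimal sketch of those two local gradients (NumPy, with made-up toy values for a single non-overlapping 2x2 window; none of this is from the answer itself):

```python
import numpy as np

# One non-overlapping 2x2 pooling window with made-up values.
z = np.array([[1.0, 3.0],
              [2.0, 5.0]])

# Average pooling: a = mean(z), so da/dz = 1/4 = 0.25 for every input.
a_avg = z.mean()
grad_avg = np.full_like(z, 1.0 / z.size)      # [[0.25, 0.25], [0.25, 0.25]]

# Max pooling: a = max(z), so da/dz = 1 at the maximum input and 0 elsewhere.
a_max = z.max()
grad_max = (z == a_max).astype(z.dtype)       # [[0., 0.], [0., 1.]]

print(a_avg, grad_avg, sep="\n")
print(a_max, grad_max, sep="\n")
```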

A pooling layer usually has no learnable parameters, but if you know the gradient of the loss with respect to its outputs, you can assign the gradient correctly to its inputs using the chain rule. That is essentially all that backpropagation is: the chain rule applied to the functions of a neural network.
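
To illustrate that chain-rule routing, here is a sketch of a hand-rolled 2x2 max-pooling forward/backward pass in NumPy. The function names, shapes, and the all-ones upstream gradient are my own illustrative choices, not anything from the answer:

```python
import numpy as np

def maxpool2x2_forward(z):
    """Non-overlapping 2x2 max pooling over an (H, W) array with even H and W."""
    H, W = z.shape
    # Group the input into (H/2, W/2) windows of shape (2, 2).
    windows = z.reshape(H // 2, 2, W // 2, 2).transpose(0, 2, 1, 3)
    a = windows.max(axis=(2, 3))
    return a, windows

def maxpool2x2_backward(windows, a, upstream):
    """Chain rule: dL/dz = dL/da * da/dz, with da/dz = 1 at each window's max, 0 elsewhere.
    (If a window has tied maxima, this sketch routes the gradient to all of them.)"""
    mask = (windows == a[..., None, None]).astype(windows.dtype)
    dz_windows = mask * upstream[..., None, None]
    H2, W2 = a.shape
    # Undo the windowing to get the gradient in the original (H, W) layout.
    return dz_windows.transpose(0, 2, 1, 3).reshape(H2 * 2, W2 * 2)

z = np.arange(16, dtype=float).reshape(4, 4)
a, windows = maxpool2x2_forward(z)
upstream = np.ones_like(a)                 # pretend dL/da = 1 everywhere
dz = maxpool2x2_backward(windows, a, upstream)
print(dz)                                  # gradient reaches only each window's maximum
```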

To answer your question more directly:

Can a non-differentiable layer be used in a neural network, if it is not learned?

No.

There is one exception: if the layer appears directly after the input then, since it has no parameters to learn and you generally do not care about the gradient of the input data, you can have a non-differentiable function there. However, this is just the same as transforming your input data in some non-differentiable way and training the NN on that transformed data instead.
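A minimal sketch of that equivalence (NumPy; the binarisation step, the random toy data, and the single linear layer are all hypothetical choices for illustration):

```python
import numpy as np

# Hypothetical non-differentiable "layer" applied directly to the input:
# hard binarisation via sign(). No gradient ever needs to flow through it,
# because nothing upstream of it is learned.
def binarise(x):
    return np.sign(x)

# Made-up toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8,))

# Applying the non-differentiable layer in the forward pass is the same as
# transforming the data once and training on the transformed data.
X_transformed = binarise(X)

# A single linear layer trained by gradient descent; gradients only reach
# w and b, never binarise().
w = np.zeros(3)
b = 0.0
for _ in range(100):
    pred = X_transformed @ w + b
    err = pred - y
    w -= 0.1 * (X_transformed.T @ err) / len(y)   # dL/dw via the chain rule
    b -= 0.1 * err.mean()                         # dL/db
```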


* Technically there are some discontinuities in the gradient of a max function (where any two inputs are equal). However, this is not a problem in practice, as the gradients are well behaved close to these values. Exactly when you can safely ignore such discontinuities is probably the topic of another question.

Neil Slater