
I have read about the universal approximation theorem. So, why do we need more than 1 layer? Is it somehow computationally efficient to add layers instead of more neurons in the hidden layer?

nbro
Prakhar

1 Answer


This is akin to asking "Why do we need more than one instance of sine to represent any repeating function?" or "Why can't we represent any polynomial with an equivalent polynomial of just the first degree?" There are many problems, I'd even say most, that require more than one layer to solve well, because the higher-dimensional relationships cannot be represented compactly by just one layer. This is not to say that the theorem is wrong, but consider the applied aspects: a single hidden layer can approximate any continuous function, but the required width can grow without bound, whereas the same function might be approximated by a deep network with only a few dozen neurons.
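As a concrete illustration of that trade-off, here is a minimal NumPy sketch (based on the well-known "sawtooth" depth-separation construction, not anything from the question itself): a tiny 3-unit ReLU layer computes a tent map, and composing it `depth` times produces a function with exponentially many linear pieces. A one-hidden-layer ReLU network would need a number of neurons proportional to the number of pieces to represent the same function.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def hat(x):
    # One tiny ReLU layer (3 units) computing the tent map on [0, 1]:
    # hat(x) = 2x for x <= 1/2, and 2 - 2x for x > 1/2.
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def deep_sawtooth(x, depth):
    # Stacking the same 3-unit layer `depth` times yields a sawtooth
    # with 2^depth linear pieces: exponentially many "bumps" from
    # only linearly many neurons.
    for _ in range(depth):
        x = hat(x)
    return x
```

For example, `deep_sawtooth` with `depth=10` uses about 30 ReLU units in total but oscillates 512 times on [0, 1]; matching that with a single hidden layer takes on the order of a thousand units.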

However, this is not to say that many deep networks could not be replaced by simpler networks with fewer layers or neurons that perform at least as well, or perhaps even better. There is active research into how to do this in general.

Ultimately, non-trivial problems in this space currently require an empirical approach, because there is no general solution to "learning."

David Hoelzer