I'd like to ask how we know that neural networks start by learning small, basic features or "parts" of the data and then use them to build up more complex features as we go through the layers. I've heard this a lot and seen it in videos like this one by 3Blue1Brown on neural networks for digit recognition. It says that the neurons in the first layer learn to detect small edges, and that the neurons in the second layer pick up more complex patterns like circles... But I can't figure out, based on pure maths, how that is possible.
2 Answers
We know this experimentally: you can inspect what each layer has learned by probing the trained network, for example by running gradient ascent on the *input* to maximise the activation of a chosen neuron and seeing what image emerges (a sketch of this is below). For more detail, watch this lecture: https://www.youtube.com/watch?v=6wcs6szJWMY&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv&index=12 — it covers many methods for understanding exactly what your model is doing at a given layer, and what features it has learnt.
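Here is a minimal sketch of that gradient-ascent idea in PyTorch. The model choice (`vgg16`), the layer index, the channel, and the hyperparameters are all illustrative assumptions, not a specific method from the lecture; real feature-visualisation code usually adds input normalisation and regularisation on top of this.

```python
# Minimal activation-maximisation sketch (assumptions: vgg16, layer 2, channel 0).
import torch
from torchvision.models import vgg16, VGG16_Weights

model = vgg16(weights=VGG16_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)  # freeze the network; we only optimise the input

activation = {}
def hook(module, inp, out):
    activation["feat"] = out

# Hook an early conv layer; a deeper index would target more complex features.
layer_index = 2   # assumption: an early conv layer in model.features
channel = 0       # assumption: which feature map to maximise
model.features[layer_index].register_forward_hook(hook)

# Start from random noise and ascend the gradient of the channel's mean activation.
img = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    model(img)
    loss = -activation["feat"][0, channel].mean()  # negative: ascend, not descend
    loss.backward()
    optimizer.step()

# `img` now approximates the input pattern that most excites that channel;
# for early layers this typically looks like an edge or a colour blob.
```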
The network architecture is relevant to this question.
Convolutional neural network architectures enforce this building-up of features, because neurons in the earliest layers have access to only a small patch of input pixels. Neurons in deeper layers are connected (indirectly) to more and more pixels, so it makes sense that they identify larger and larger features; the sketch below shows how this "receptive field" grows layer by layer. Many of the visual examples available online that show a progression from a curve, to a circle, to a part of an animal, to a whole animal are based on convolutional networks, as are the beautiful examples from the lecture linked in the other answer.
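To make the "more and more pixels" claim concrete, here is a small sketch applying the standard receptive-field recurrence to a toy stack of convolution and pooling layers. The layer specification is made up purely for illustration.

```python
# Theoretical receptive-field growth through a (hypothetical) conv stack.
# Recurrence: rf_out = rf_in + (kernel - 1) * jump_in;  jump_out = jump_in * stride,
# where 'jump' is the input-pixel spacing between adjacent neurons in a layer.
layers = [
    ("conv1", 3, 1),  # (name, kernel size, stride)
    ("conv2", 3, 1),
    ("pool1", 2, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("pool2", 2, 2),
]

rf, jump = 1, 1
for name, kernel, stride in layers:
    rf += (kernel - 1) * jump
    jump *= stride
    print(f"{name}: receptive field {rf}x{rf} pixels")

# Prints 3x3, 5x5, 6x6, 10x10, 14x14, 16x16: each neuron 'sees' a growing patch,
# so early layers can only detect small features like edges.
```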
That said, increasing complexity with each layer does hold in general, including for dense architectures like the 3Blue1Brown one. It's just that there it is a more abstract 'increase in nonlinearity' rather than an increase in spatial feature size. Depending on the task the network is learning, the earlier layers will be more 'basic', but their neurons may still use large areas of the input.
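One way to see this more abstract increase in nonlinearity is to count how many linear pieces a small ReLU network's output has after each layer. The toy 1-D network below uses random weights purely for illustration; the exact counts will vary with the seed, but the number of kinks tends to grow with depth.

```python
# Toy illustration: piecewise-linear complexity of a random 1-D ReLU net vs. depth.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 8, 4
x = np.linspace(-3, 3, 10_000).reshape(-1, 1)

h = x
for layer in range(depth):
    W = rng.standard_normal((h.shape[1], width))
    b = rng.standard_normal(width)
    h = np.maximum(h @ W + b, 0.0)             # dense layer + ReLU
    out = h @ rng.standard_normal((width, 1))  # fresh random scalar readout, for inspection only

    # Count kinks: grid points where the piecewise-linear slope changes.
    slope = np.diff(out[:, 0]) / np.diff(x[:, 0])
    kinks = np.sum(~np.isclose(np.diff(slope), 0.0, atol=1e-6))
    print(f"after layer {layer + 1}: ~{kinks} slope changes (linear-piece boundaries)")
```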