
In image classification we are usually told that the main reason for using CNNs is that densely connected NNs cannot handle so many parameters (a 1000 × 1000 image has 10^6 inputs, so every unit in the first dense layer needs 10^6 weights). My question is: is there any other reason why CNNs are used over DNNs (densely connected NNs)?

Basically, if we had infinite resources, would DNNs trump CNNs, or are CNNs inherently well suited to image classification, as RNNs are to speech? Answers based on either mathematics or experience in the field are appreciated.

3 Answers


The key idea here is parameter sharing (weight sharing) across different regions of the image.

Take a simple example: a binary (black-and-white) image of the letter 'F'. It is a combination of multiple patterns, here vertical lines and horizontal lines. These patterns are defined by the relation between the intensities of contiguous cells (pixels), and that relation is captured by a weight matrix.

Also, to identify multiple horizontal lines, we don't need multiple sets of nodes in a dense hidden layer, each trying to identify a different horizontal line in the image. The pattern is the same; it just appears in different locations. This is where weight sharing comes into the picture.
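To put rough numbers on what weight sharing saves, here is a back-of-the-envelope count. The hidden-layer size, kernel size and number of kernels are my own illustrative assumptions, not values from any particular network:

```python
# Compare weight counts for one dense layer and one convolutional layer.
# All sizes below are illustrative assumptions.

height, width = 1000, 1000   # grayscale input image
hidden_units = 1000          # assumed size of the first dense hidden layer
kernel_size = 3              # assumed 3x3 kernel
num_kernels = 64             # assumed number of kernels (feature maps)

# Dense: every hidden unit has its own weight for every pixel.
dense_weights = height * width * hidden_units

# Convolution: each kernel's weights are reused at every image location.
conv_weights = kernel_size * kernel_size * num_kernels

print(dense_weights)  # 1000000000
print(conv_weights)   # 576
```

Biases are ignored on both sides; they do not change the order of magnitude.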

In the 1st hidden layer, the horizontal-line pattern is encoded in a weight matrix (learnt during training and used in testing). Place it over a small grid of the image and check for the pattern's presence. As this matrix is slid across the image and tested at each position, the presence of horizontal lines is marked at the various locations. This weight matrix is called a kernel.

Combining the above points, a kernel provides a way of sharing parameters / weights across image locations while looking at contiguous cells to identify patterns. A dense layer in place of kernels could in principle learn the same thing eventually, but it starts from random weights with no such structure built in and has to learn the pattern separately at every location. Since a more efficient way was identified, it is the one being used.

Next, to identify vertical lines, another kernel is needed, and it is slid across the image in the same way.

Suppose we then have a dense layer as the 2nd hidden layer. This layer looks at which combination of patterns is present ('p' horizontal lines and 'q' vertical lines in the case of 'F') and learns which combinations identify the output.

Just to compare with traditional programming: kernels are like regular expressions, and dense layers are like loops. Just sharing my thoughts; any better explanation is welcome.
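A minimal NumPy sketch of that sliding step, assuming a hand-drawn 7x5 'F' and a hand-written 1x3 kernel (in a real CNN the kernel values would be learnt, and `slide` is just a helper name I made up):

```python
import numpy as np

# 7x5 binary image of the letter 'F' (1 = ink, 0 = background).
F = np.array([
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
])

# Hand-written kernel that responds to a horizontal run of three ink pixels.
horizontal_kernel = np.array([[1, 1, 1]])

def slide(image, kernel):
    """Place the kernel at every valid position and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

response = slide(F, horizontal_kernel)

# A response equal to the kernel sum means the full pattern is present there.
hits = response == horizontal_kernel.sum()
print(hits)
```

The same three weights fire on the top bar (row 0) and on the middle bar (row 3), which is the weight sharing described above.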
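As a rough sketch of how the two stages fit together: two hand-written kernels produce pattern counts, and a tiny dense unit combines them. The weights and bias are hand-picked assumptions standing in for learnt values, and the names `count_hits`, `features` and `looks_like_F` are hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_hits(image, kernel):
    """Slide the kernel over the image and count positions where it fully matches."""
    windows = sliding_window_view(image, kernel.shape)
    response = (windows * kernel).sum(axis=(-2, -1))
    return int((response == kernel.sum()).sum())

F = np.array([
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
])

# A plain vertical bar, used as a non-'F' example.
bar = np.zeros((7, 5), dtype=int)
bar[:, 2] = 1

horizontal = np.ones((1, 3), dtype=int)  # horizontal-line kernel
vertical = np.ones((3, 1), dtype=int)    # vertical-line kernel

def features(image):
    """Kernel stage: one count per pattern."""
    return np.array([count_hits(image, horizontal), count_hits(image, vertical)])

# Dense stage: hand-picked weights and bias (assumed, not learnt).
weights = np.array([1.0, 1.0])
bias = -7.5

def looks_like_F(image):
    return float(weights @ features(image) + bias) > 0

print(features(F), looks_like_F(F))      # [5 5] True
print(features(bar), looks_like_F(bar))  # [0 5] False
```

A real network would keep the full feature maps rather than collapsing them to counts; the counts just keep the example small.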

solver149

Convolutional neural networks can capture more of the spatial features than a densely connected network. Consider this: in any given real-world image, the pixel values of neighboring cells do not vary much. But when the image is passed to a densely connected neural network for training, the spatial relation between neighboring pixels is lost, because every other cell can heavily influence each unit. In a convolutional network, by contrast, the convolution operation preserves local information. This is called local connectivity.
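One way to see local connectivity in a small experiment: change a single corner pixel and check which outputs move. The 8x8 image, the 3x3 averaging kernel and the random dense weights below are all assumptions made for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.random((8, 8))

kernel = np.ones((3, 3)) / 9.0           # assumed 3x3 averaging kernel
dense_w = rng.random((36, 64)) + 0.1     # assumed dense layer with 6*6 = 36 outputs

def conv_valid(img, k):
    """Each output depends only on the 3x3 window under the kernel."""
    windows = sliding_window_view(img, k.shape)
    return (windows * k).sum(axis=(-2, -1))

def dense(img, w):
    """Each output depends on every pixel of the flattened image."""
    return w @ img.ravel()

# Perturb a single pixel in the top-left corner.
perturbed = image.copy()
perturbed[0, 0] += 1.0

conv_changed = ~np.isclose(conv_valid(image, kernel), conv_valid(perturbed, kernel))
dense_changed = ~np.isclose(dense(image, dense_w), dense(perturbed, dense_w))

print(conv_changed.sum())   # 1: only the window that covers pixel (0, 0)
print(dense_changed.sum())  # 36: every dense output moves
```

With the convolution, only the output whose window contains the changed pixel reacts; with the dense layer, all of them do.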

Pradeep BV

That is not the actual reason. Convolution layers are inspired by cells in the visual system; this derives from the work of Hubel and Wiesel. For more information, look up the Hubel-Wiesel experiments.