
I need help in understanding something basic.

In this video, Andrew Ng says, essentially, that convolutional layers are better than fully connected (FC) layers because they use fewer parameters.

But I'm having trouble seeing when FC layers would/could ever be used for what convolutional layers are used for, specifically, feature detection.

I always read that FC layers are used in the final, classification stages of a CNN, but could they ever be used for the feature detection part?

In other words, is it even possible for a "feature" to be deciphered when the filter size is the same as the entire image?

If not, it's hard for me to understand Andrew Ng's comparison: there are no parameter-reduction "savings" if we would never use an FC "filter" in place of a convolutional layer in the first place.

A semi-related question: Can multi-layer perceptrons (which I understand to be fully connected neural networks) be used for feature detection? If so, how do image-sized "filters" make any sense?

1 Answer

First of all, fully connected (FC) neural networks (NNs) are universal function approximators. This means that, in theory, there must exist some NN of appropriate size that is capable of doing whatever a CNN can do. The difference is that such a standard NN would have to be considerably larger than a CNN with the same capabilities. Each input pixel would probably have to feed into a few dozen to a few hundred nodes in the first hidden layer, each trained to detect some part of a distinct feature at its respective image location. Consecutive hidden layers would then have to combine, for any given location, the parts extracted by that first layer, and finally act as the true feature detectors, detecting features as compositions of the values in the first layer.
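To make the representability claim concrete, here is a minimal NumPy/SciPy sketch (the 6x6 image and the random 3x3 kernel are purely illustrative choices of mine) showing that a small convolution can be written as a single FC layer whose weight matrix is mostly zeros and simply repeats the same 9 kernel weights at every output position:

```python
import numpy as np
from scipy.signal import correlate2d

H, W = 6, 6                      # tiny single-channel "image" (illustrative size)
rng = np.random.default_rng(0)
image = rng.standard_normal((H, W))
kernel = rng.standard_normal((3, 3))

# The convolution (cross-correlation, 'valid' padding) computed the usual way.
conv_out = correlate2d(image, kernel, mode="valid")

# The same operation written as one fully connected layer: each row of the dense
# weight matrix contains the 9 kernel weights placed at one output position and
# zeros everywhere else -- the repetition of those 9 values is the weight sharing.
out_h, out_w = H - 2, W - 2
dense = np.zeros((out_h * out_w, H * W))
for i in range(out_h):
    for j in range(out_w):
        row = np.zeros((H, W))
        row[i:i + 3, j:j + 3] = kernel
        dense[i * out_w + j] = row.ravel()

fc_out = (dense @ image.ravel()).reshape(out_h, out_w)
print(np.allclose(conv_out, fc_out))   # True: the FC layer reproduces the conv layer
```

The dense matrix here has 16 x 36 = 576 entries but only 9 distinct non-zero values; a generic FC layer would have to learn all 576 of them independently.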

So, for each position in the input image, all the filters have to be learned separately. This absence of weight sharing (the filters are not shared across input locations, but have to be trained for each location individually) is what makes a standard NN so inefficient for feature extraction.
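To get a rough sense of the scale of that inefficiency, here is a short PyTorch sketch (the 32x32 RGB input and 16 feature maps are numbers I picked for illustration, not from the video) comparing the parameter counts of a small convolutional layer and an FC layer producing an output of the same shape:

```python
import torch.nn as nn

# Illustrative sizes: a 3x32x32 RGB input mapped to 16 feature maps of size 32x32.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
fc = nn.Linear(in_features=3 * 32 * 32, out_features=16 * 32 * 32)

conv_params = sum(p.numel() for p in conv.parameters())
fc_params = sum(p.numel() for p in fc.parameters())

print(conv_params)  # 448: 16 * (3*3*3) weights + 16 biases, reused at every position
print(fc_params)    # 50,348,032: one weight per (input value, output unit) pair
```

The convolutional layer gets away with 448 parameters because the same 3x3x3 filters are slid over every position, whereas the FC layer has to pay for every connection separately.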

However, in theory, a standard NN could also be trained to solve an image classification task, thereby learning to extract features. Actually, I would not be surprised if someone had already tested that on the MNIST dataset.
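If you want to try it yourself, a minimal sketch of such an experiment could look like the following, assuming Keras/TensorFlow and its built-in MNIST loader (the layer sizes and number of epochs are arbitrary):

```python
import tensorflow as tf

# A plain fully connected network on MNIST: each 28x28 image is flattened to 784
# inputs, so no spatial structure or weight sharing is used anywhere.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```

A plain FC network like this does learn to classify MNIST reasonably well, which supports the point: it is possible, just far less parameter-efficient than a CNN.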
