I trained a simple model to recognize handwritten digits from the MNIST dataset. Here it is:

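# Imports assume the Keras API bundled with TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten

model = Sequential([
    # Single convolutional layer; the number of filters is what I varied
    Conv2D(filters=1, kernel_size=(3, 1), padding='valid', strides=1, input_shape=(28, 28, 1)),
    Flatten(),
    Dense(10, activation='softmax')])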

I experimented with varying the number of filters in the convolutional layer while keeping the other parameters constant (learning rate = 0.0001, number of episodes = 2000, training batch size = 512). I used 1, 2, 4, 8, and 16 filters, and the model accuracy was 92-93% for each of them.
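As a rough sanity check on how the model size scales across these runs, the parameter counts can be worked out by hand. This is only a sketch: the helper names `conv_output_size` and `count_parameters` are made up for illustration, and it assumes the architecture above (`padding='valid'`, stride 1, one bias per filter and per output class).

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for padding='valid'."""
    return (size - kernel) // stride + 1


def count_parameters(filters, input_shape=(28, 28, 1), kernel_size=(3, 1), classes=10):
    """Return (conv_params, dense_params) for the Conv2D -> Flatten -> Dense model."""
    height, width, channels = input_shape
    out_h = conv_output_size(height, kernel_size[0])
    out_w = conv_output_size(width, kernel_size[1])
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    conv_params = kernel_size[0] * kernel_size[1] * channels * filters + filters
    # Flatten turns the (out_h, out_w, filters) volume into one long vector.
    flat = out_h * out_w * filters
    # Dense layer: one weight per (input, class) pair plus one bias per class.
    dense_params = flat * classes + classes
    return conv_params, dense_params


for f in (1, 2, 4, 8, 16):
    conv, dense = count_parameters(f)
    print(f"filters={f:2d}  conv={conv:3d}  dense={dense:6d}  total={conv + dense}")
```

Almost all of the parameters sit in the dense layer, and that count grows linearly with the number of filters.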

From my understanding, during training the filters may learn to recognize various types of edges in the image (e.g., vertical, horizontal, round). This experiment made me wonder whether any of the filters end up being duplicates, that is, having the same or similar weights. Is there anything that prevents this?

mark mark

1 Answer

No, nothing really prevents the weights from being the same. In practice, though, they almost always end up different, because distinct filters make the model more expressive (i.e. more powerful), so gradient descent tends to push them apart. If a model has $n$ features but 2 of them are the same, then the model effectively has $n-1$ features, which is less expressive than a model with $n$ distinct features and therefore usually reaches a higher loss.
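To make the "$n$ features collapse to $n-1$" argument concrete, here is a minimal sketch with made-up numbers. It assumes the next layer combines features linearly, as your `Dense` layer does, and `linear_output` is just an illustrative helper.

```python
import math


def linear_output(features, weights):
    """Weighted sum of features, as a single dense unit would compute (bias omitted)."""
    return sum(f * w for f, w in zip(features, weights))


# Three features, where feature 1 is an exact duplicate of feature 0.
features = [0.5, 0.5, -1.2]
weights = [0.8, -0.3, 2.0]

# Equivalent model with the duplicate removed: its weight is folded into feature 0.
merged_features = [0.5, -1.2]
merged_weights = [0.8 + (-0.3), 2.0]

full = linear_output(features, weights)
merged = linear_output(merged_features, merged_weights)
print(full, merged, math.isclose(full, merged))
```

Whatever weights the next layer assigns to the two copies, only their sum matters, so the duplicate adds parameters without adding anything the model can represent.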

But even if the weights are different, some of them can be very similar. If you visualize the filters of the first convolutional layer in a network that has a large number of them (e.g. 100), you will see that some of them learn to detect roughly the same edges (same orientation and placement). These features are so similar that they are effectively redundant in the model and do not add to its predictive power.
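One way to check this on your own model, besides plotting the filters, is to compare them pairwise. The sketch below uses cosine similarity on flattened kernels. The threshold of 0.95, the helper names, and the example kernels are all assumptions for illustration, not values from a trained network.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened filters (ignores overall scale)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero filter as similar to nothing
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(filters, threshold=0.95):
    """Return (i, j, similarity) for every pair of filters at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(filters), 2):
        sim = cosine_similarity(a, b)
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


# Hypothetical flattened 3x1 kernels: the first two are nearly the same edge detector.
example_filters = [
    [1.0, 0.0, -1.0],
    [0.9, 0.1, -1.1],
    [1.0, -2.0, 1.0],
]
print(near_duplicate_pairs(example_filters))
```

With Keras, the kernels of the first layer should be available from `model.layers[0].get_weights()[0]`, an array of shape (kernel height, kernel width, input channels, filters), so each filter would need to be sliced out and flattened before being passed in.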

There's actually an entire field of research on identifying redundant features and pruning them. LeCun et al. show in "Optimal Brain Damage" that pruning out redundant features not only makes the model smaller and faster at inference, but can even improve the model's accuracy.
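For a flavor of how that works: Optimal Brain Damage scores each individual weight $w_k$ by a saliency $s_k = \frac{1}{2} h_{kk} w_k^2$, where $h_{kk}$ is the corresponding diagonal entry of the Hessian of the loss, and removes the weights with the lowest saliency. Note that it prunes single weights rather than whole filters. The sketch below is a simplified illustration, not the paper's full procedure: the Hessian diagonal is supplied as made-up numbers instead of being computed, the retraining that normally follows pruning is left out, and the function names are my own.

```python
def obd_saliencies(weights, hessian_diagonal):
    """Saliency s_k = 0.5 * h_kk * w_k^2 for each weight."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diagonal)]


def prune_lowest_saliency(weights, hessian_diagonal, num_to_prune):
    """Zero out the num_to_prune weights with the smallest saliency."""
    saliencies = obd_saliencies(weights, hessian_diagonal)
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    to_prune = set(order[:num_to_prune])
    return [0.0 if k in to_prune else w for k, w in enumerate(weights)]


# Made-up values: the largest weight (1.5) sits in a very flat direction of the loss.
weights = [0.9, -0.05, 0.4, 1.5]
hessian_diagonal = [2.0, 1.0, 0.1, 0.001]

pruned = prune_lowest_saliency(weights, hessian_diagonal, 2)
print(pruned)
```

In this example the largest weight is one of the two removed, because the loss is nearly flat in that direction. Pruning by magnitude alone would have dropped the 0.4 weight instead.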

For more information, here is a blog post that gives a high-level overview of one of the pruning methods.

user3667125