
I'm fine-tuning two different CNNs for an image classification task:

  • The first CNN uses a ResNet101 backbone, and the second uses a MobileNetV2 backbone. Both are pre-trained on ImageNet.
  • I use the same classification head for both models: a dense layer with 1024 neurons, followed by another dense layer with 1024 neurons, then a Dropout(0.3) layer, and finally a softmax layer with 2 output neurons (for 2 classes).
  • I use the same dataset (~500 images) for both models.
  • No image augmentation (for experimentation purposes).
  • I train both models for the same number of epochs (20 epochs).
  • I use the same optimizer and learning rate (5e-6).
  • I freeze the entire backbone and only train the head.
  • The code is written in TensorFlow (tf.keras).
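For reference, the setup above can be sketched roughly as follows. This is a minimal reconstruction, not my exact code: the ReLU activations on the dense layers, the Adam optimizer, the 224×224 input size, and the sparse-categorical loss are assumptions, since I only listed the layer sizes and the learning rate. `weights=None` is used here just to keep the sketch download-free; my actual runs use `weights="imagenet"`.

```python
import tensorflow as tf

def build_model(backbone_fn):
    # Backbone from tf.keras.applications; my real setup uses
    # weights="imagenet" (weights=None here only avoids the download).
    backbone = backbone_fn(include_top=False, weights=None,
                           input_shape=(224, 224, 3), pooling="avg")
    backbone.trainable = False  # freeze the entire backbone

    inputs = tf.keras.Input(shape=(224, 224, 3))
    # Note: each tf.keras.applications family has its own
    # preprocess_input scaling, which is not shown here.
    x = backbone(inputs, training=False)  # keep BatchNorm in inference mode
    x = tf.keras.layers.Dense(1024, activation="relu")(x)  # activation assumed
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-6),
                  loss="sparse_categorical_crossentropy",  # loss assumed
                  metrics=["accuracy"])
    return model

resnet_model = build_model(tf.keras.applications.ResNet101)
mobilenet_model = build_model(tf.keras.applications.MobileNetV2)
```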

However, I noticed something odd: the ResNet101 model barely reaches 55% accuracy (close to chance level for 2 classes), while the MobileNetV2 model achieves around 90% accuracy under the exact same setup. Since ResNet101 is deeper and generally more powerful than MobileNetV2, I expected it to perform better, but the opposite happened.

My questions are:

1- Why does ResNet101 perform so poorly compared to MobileNetV2 in this setup?

2- Are there specific considerations when fine-tuning deeper networks like ResNet101 compared to lighter models like MobileNetV2?

