I'm fine-tuning two different CNNs for an image classification task:
- The first CNN uses a ResNet101 backbone, and the second uses a MobileNetV2 backbone. Both are pre-trained on ImageNet.
- I use the same classification head for both models: two dense layers with 1024 neurons each, then a Dropout(0.3) layer, and finally a softmax output layer with 2 neurons (for the 2 classes).
- I use the same dataset (~500 images).
- No image augmentation (for experimentation purposes).
- I train both models for the same number of epochs (20).
- I use the same optimizer and learning rate (5e-6).
- I freeze the entire backbone and only train the head.
- The code is written in TensorFlow (tf.keras).
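
For concreteness, here is a minimal sketch of my setup in tf.keras. Some details are simplified or assumed (224×224 inputs, ReLU activations in the dense layers, Adam, global average pooling, and sparse categorical cross-entropy; my actual code differs only in minor ways):

```python
import tensorflow as tf

NUM_CLASSES = 2  # two classes, as described above


def build_model(backbone_fn, preprocess_fn):
    """Frozen ImageNet backbone + the dense head described above.

    backbone_fn: a tf.keras.applications constructor (e.g. ResNet101).
    preprocess_fn: the matching preprocess_input for that backbone.
    """
    inputs = tf.keras.Input(shape=(224, 224, 3))
    # Each tf.keras.applications backbone expects its own preprocessing.
    x = preprocess_fn(inputs)
    backbone = backbone_fn(include_top=False, weights="imagenet", pooling="avg")
    backbone.trainable = False       # freeze the entire backbone
    x = backbone(x, training=False)  # keep BatchNorm in inference mode
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-6),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


resnet_model = build_model(
    tf.keras.applications.ResNet101,
    tf.keras.applications.resnet.preprocess_input,
)
mobilenet_model = build_model(
    tf.keras.applications.MobileNetV2,
    tf.keras.applications.mobilenet_v2.preprocess_input,
)
```

Both models are then trained with the same `model.fit(...)` call on the same data.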
However, I noticed something odd: the ResNet101 model barely reaches 55% accuracy, while the MobileNetV2 model achieves around 90% under the exact same setup. Since ResNet101 is deeper and generally more powerful than MobileNetV2, I expected it to perform better, but the opposite happened.
My questions are:
1- Why does ResNet101 perform so poorly compared to MobileNetV2 in this setup?
2- Are there specific considerations when fine-tuning deeper networks like ResNet101 compared to lighter models like MobileNetV2?