
In the paper "ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery", the authors talk about using:

  1. Feature Pyramid Networks (as the architecture)
  2. EfficientNet-B2 (as the backbone)

Performance Measures on the Validation Set. The RF model that only inputs data from the visible Landsat 8 bands achieved the lowest performance on the validation set, but the incorporation of auxiliary predictors substantially improved its performance. All of the CNN models outperformed the RF models. The best performing model, which we call ForestNet, used an FPN architecture with an EfficientNet-B2 backbone. The use of SDA provided large performance gains on the validation set, and land cover pre-training and incorporating auxiliary predictors each led to additional performance improvements.

What's the difference between architectures and backbones? I can't find much online. Specifically, what are their respective purposes? From a high-level perspective, what would integrating the two look like?

hanugm

3 Answers


The vocabulary can be confusing, but here EfficientNet-B2 is the backbone: a convolutional network that takes the satellite image as input and extracts feature maps at several spatial scales. The Feature Pyramid Network (FPN) is the architecture built on top of it: it consumes the backbone's multi-scale feature maps, merges them into a feature pyramid, and produces the final classification. One network is effectively plugged into the other.

So the "backbone" is the front half of the model, a general-purpose feature extractor, while the "architecture" describes how the whole model is put together: which backbone is used, how its features are combined, and how the prediction head is attached.

This pairing of terms is standard in computer vision, where detection and segmentation papers routinely describe a model as "architecture X with backbone Y" (e.g., RetinaNet with a ResNet-50 backbone), though it may be unfamiliar if you work outside that sub-field.
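To make the split concrete, here is a toy NumPy sketch, not the actual ForestNet code: `toy_backbone`, `fpn_classify`, and the pooling-based "features" are invented for illustration. The backbone turns an image into feature maps at several scales, and the FPN-style architecture merges those scales into a single prediction.

```python
import numpy as np

def toy_backbone(image):
    """Stand-in for EfficientNet-B2: produce coarser and coarser
    feature maps by 2x2 average-pooling the image at 3 scales."""
    feats = []
    x = image
    for _ in range(3):
        h, w = x.shape[0] // 2, x.shape[1] // 2
        # 2x2 average pooling halves each spatial dimension
        x = x[:h * 2, :w * 2].reshape(h, 2, w, 2).mean(axis=(1, 3))
        feats.append(x)
    return feats

def fpn_classify(feats, weights):
    """Stand-in for the FPN + classification head: pool each scale
    to one scalar, then apply a linear classifier over the scales."""
    pooled = np.array([f.mean() for f in feats])  # one scalar per scale
    logits = weights @ pooled                     # (n_classes, 3) @ (3,)
    return int(np.argmax(logits))

image = np.random.rand(64, 64)
weights = np.random.rand(4, 3)       # 4 hypothetical driver classes
feats = toy_backbone(image)
print([f.shape for f in feats])      # [(32, 32), (16, 16), (8, 8)]
print(fpn_classify(feats, weights))
```

The point of the sketch is the interface: the architecture never touches raw pixels, only the multi-scale features the backbone hands it, which is why backbones are interchangeable.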

user3667125

The architecture is the entire model.

The backbone is the first group of layers. It captures low-level features that generalize well across tasks. In practice the backbone is usually pretrained, and its weights are often frozen (not updated) during fine-tuning; this makes sense because it has already learned something general.

Two other common terms are neck and head.

The neck is the set of intermediate layers between the backbone and the head. It is commonly used as a transition stage, for example in multi-task learning.

The head is simply the output layers for a specific task.
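A minimal sketch of the backbone/neck/head split. The layer sizes are invented, and a plain linear+ReLU `Linear` class stands in for real convolutional layers; the `frozen` flag marks which weights a training loop would skip updating.

```python
import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Toy layer: linear map followed by ReLU."""
    def __init__(self, n_in, n_out, frozen=False):
        self.w = rng.standard_normal((n_out, n_in)) * 0.1
        self.frozen = frozen   # frozen layers skip weight updates

    def __call__(self, x):
        return np.maximum(self.w @ x, 0.0)

backbone = [Linear(16, 8, frozen=True)]  # pretrained, weights frozen
neck     = [Linear(8, 8)]                # transition/adapter layers
head     = [Linear(8, 3)]                # 3-way task-specific output

def forward(x):
    for layer in backbone + neck + head:
        x = layer(x)
    return x

def trainable_layers():
    return [l for l in backbone + neck + head if not l.frozen]

x = rng.standard_normal(16)
print(forward(x).shape)          # (3,)
print(len(trainable_layers()))   # 2: only the neck and head update
```

During fine-tuning, only the layers returned by `trainable_layers()` would receive gradient updates, which is exactly the "frozen backbone" setup described above.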


I've taken an NVIDIA course on their portal, and it presented ResNet, VGG, and GoogLeNet as model backbones, and DetectNet_V2, FasterRCNN, SSD, and UNET as model architectures, so I think it is common terminology.
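In NVIDIA's TAO framework an architecture such as FasterRCNN can indeed be paired with several different backbones such as ResNet or VGG. A toy sketch of that pluggable relationship (the function bodies are made up for illustration; real backbones return feature tensors, not strings):

```python
# A registry of interchangeable feature extractors (backbones).
BACKBONES = {
    "resnet":    lambda image: f"resnet features of {image}",
    "vgg":       lambda image: f"vgg features of {image}",
    "googlenet": lambda image: f"googlenet features of {image}",
}

def faster_rcnn(image, backbone="resnet"):
    """Toy 'architecture': delegates feature extraction to a
    pluggable backbone, then runs its own detection head."""
    features = BACKBONES[backbone](image)
    return f"boxes predicted from {features}"

print(faster_rcnn("img.png", backbone="vgg"))
```

This is why config files in such frameworks typically name the architecture and the backbone as two separate fields.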