I'm building a model for facial expression recognition, and I want to use transfer learning. From what I understand, there are different stages involved: the first is feature extraction and the second is fine-tuning. I want to understand more about these two stages and the difference between them. Must we use both in the same training?
3 Answers
Typically, in transfer learning, you have 2-3 stages:
Pre-training: pre-train some base model $M_\text{base}$ on some "general" dataset $A$; note that you may not necessarily need to train $M_\text{base}$ yourself, as it may already be available, e.g. on the web. During this phase, general features or representations of the data are learned, which can "bootstrap" the learning task on your specific dataset.
Training: you replace the last layers of $M_\text{base}$ (i.e. the classifier/regression part) with new layers suited to your task, and you might freeze the initial layers (e.g. the convolutional layers), which are assumed to contain general features that are also useful for your task. Let's call this model $M_\text{main}$. At this point, you train the partially frozen $M_\text{main}$ on your dataset $B$.
Fine-tuning: after training, you could unfreeze some of the frozen layers in $M_\text{main}$, especially the ones closest to your new classifier, then train again, usually with a smaller learning rate.
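To make stages 2 and 3 concrete, here is a minimal sketch, assuming a TensorFlow/Keras setup, an ImageNet-pretrained MobileNetV2 standing in for $M_\text{base}$, a 7-class facial-expression label set, and hypothetical `train_ds`/`val_ds` datasets (none of these specifics come from the question; any backbone and dataset would do):

```python
import tensorflow as tf

# Stage 1 is already done for us: MobileNetV2 with ImageNet weights plays the
# role of M_base (an assumption; any pre-trained backbone would work here).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg")

# Stage 2 (training): freeze the base and train only a new classification head
# (7 classes as a stand-in for a facial-expression label set).
base.trainable = False
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep batch-norm statistics frozen
outputs = tf.keras.layers.Dense(7, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # hypothetical datasets

# Stage 3 (fine-tuning): unfreeze the layers closest to the new head and
# re-train with a much smaller learning rate.
base.trainable = True
for layer in base.layers[:-30]:   # keep the earlier, more generic layers frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```

Recompiling with a learning rate a couple of orders of magnitude smaller is the usual precaution so that fine-tuning adjusts, rather than destroys, the pre-trained features.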
In all 3 stages, one could say that we're extracting features (because we're learning weights), but some people refer to the pre-training phase as the feature extraction phase. I've also seen people call the training stage the fine-tuning stage (a previous version of this answer did so as well). In the end, these terms can be used inconsistently, so the important thing is that you understand what's going on and keep the context in mind.
You can find more information about this topic here. Note that there may be other more sophisticated or simply different approaches to transfer learning.
The difference between the two approaches (feature extraction vs fine-tuning) is well explained here: Fine Tuning vs Joint Training vs Feature Extraction
Also, this paper evaluates the performance one can hope to achieve with two sequence models (ELMo and BERT) under each approach: To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
It has been a while since the question was asked, but I came across this article. It helped me understand the topic. From the article:
Feature-based methods involve using the intermediate representations or features from a pre-trained model as additional inputs to a task-specific model.
Fine-tuning, on the other hand, involves modifying and retraining a pre-trained model to adapt it to a specific task.
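To illustrate the feature-based route, here is a minimal sketch, assuming a TensorFlow/Keras backbone and scikit-learn; the names `X_train`, `y_train`, `X_test`, `y_test` are hypothetical arrays, not something defined in the article. The pre-trained model is used only to produce fixed embeddings, and a separate task-specific classifier is trained on top of them:

```python
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

# Frozen pre-trained backbone used purely as a feature extractor.
extractor = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg")
extractor.trainable = False

def extract_features(images):
    """Map raw images of shape (N, 224, 224, 3) to fixed 1280-d embeddings."""
    images = tf.keras.applications.mobilenet_v2.preprocess_input(images)
    return extractor.predict(images, verbose=0)

# The backbone's weights are never updated; only the small downstream model learns.
# clf = LogisticRegression(max_iter=1000).fit(extract_features(X_train), y_train)
# print(clf.score(extract_features(X_test), y_test))
```

The contrast with fine-tuning is that here the pre-trained weights stay fixed for good, and all task-specific learning happens in the separate model that consumes the extracted features.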