import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Maltehb/danish-bert-botxo")
model = AutoModelForSequenceClassification.from_pretrained("Maltehb/danish-bert-botxo")

# Text to classify
text = "Det er en god dag"

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt")

# Forward pass through the model
outputs = model(**inputs)

# Get predicted probabilities for each class
probs = torch.softmax(outputs.logits, dim=1).detach().numpy()

print(probs)
# Predicted label
predicted_label = "positive" if probs[0][1] > probs[0][0] else "negative"

print("positive prob:", probs[0][1])
print("negative prob:", probs[0][0])
print(f"The sentiment of the text '{text}' is {predicted_label}.")

Output:

$ python temp.py 
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at Maltehb/danish-bert-botxo and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[[0.43838173 0.5616182 ]]
positive prob: 0.5616182
negative prob: 0.43838173
The sentiment of the text 'Det er en god dag' is positive.

I am surprised to read that "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at Maltehb/danish-bert-botxo and are newly initialized: ['classifier.bias', 'classifier.weight']".

If I understood it correctly, Maltehb/danish-bert-botxo is an already trained model, so I shouldn't have to train it again. So why were some of the weights not initialized from the model checkpoint?

Sebastian Nielsen

1 Answer


This is because you are using a BertForSequenceClassification, while the model you should be loading is a BertForPreTraining.

The difference is that BertForSequenceClassification has a classifier head while BertForPreTraining does not. Hence, even though the BERT backbone uses the learned weights from "danish-bert-botxo", there are no weights for the classifier head in that checkpoint, so the classifier bias and weight remain randomly initialized.
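
You can check this directly: passing output_loading_info=True to from_pretrained makes transformers return the model together with a dict whose "missing_keys" entry lists exactly the weights that were not found in the checkpoint. A minimal sketch, using the model name from the question:

from transformers import AutoModelForSequenceClassification

# Ask from_pretrained to also report which weights it could not load
model, loading_info = AutoModelForSequenceClassification.from_pretrained(
    "Maltehb/danish-bert-botxo", output_loading_info=True
)

# Weights absent from the checkpoint, hence freshly initialized
print(loading_info["missing_keys"])
# Expected to contain 'classifier.weight' and 'classifier.bias'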

So replace AutoModelForSequenceClassification with AutoModelForPreTraining.
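
A minimal sketch of that replacement (note that BertForPreTraining exposes only the pre-training heads, masked-language modelling and next-sentence prediction, so it will load cleanly but will not produce sentiment scores):

import torch
from transformers import AutoTokenizer, AutoModelForPreTraining

tokenizer = AutoTokenizer.from_pretrained("Maltehb/danish-bert-botxo")
model = AutoModelForPreTraining.from_pretrained("Maltehb/danish-bert-botxo")

inputs = tokenizer("Det er en god dag", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# prediction_logits: per-token masked-language-model scores
# seq_relationship_logits: next-sentence-prediction scores
print(outputs.prediction_logits.shape)
print(outputs.seq_relationship_logits.shape)

If you actually want sentiment predictions, keep AutoModelForSequenceClassification but fine-tune the randomly initialized classifier head on labelled Danish sentiment data first, exactly as the warning suggests.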

Lelouch