What is the mathematical definition of "multinomial model" in machine learning? I would be happy with a good definition plus an example.
2 Answers
A multinomial model is the statistical term for modeling the counts from $n$ repeated independent draws, each of which lands in one of $K$ categories according to a probability vector $p$ (e.g. the output of a softmax classifier, or regressor to be precise).
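Concretely, if $x_i$ denotes the number of the $n$ draws that land in category $i$ with probability $p_i$, the multinomial probability mass function is
$$P(X_1=x_1,\dots,X_K=x_K)=\frac{n!}{x_1!\cdots x_K!}\,p_1^{x_1}\cdots p_K^{x_K},\qquad \sum_{i=1}^K x_i=n,\quad \sum_{i=1}^K p_i=1.$$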
Update after @David's comment: The terms categorical and multinomial distribution are often used interchangeably (though this is technically incorrect), since in practice what is unknown is the probability vector $p$, not the sample size $n$. Also, for ML models at inference time $n$ is usually known (e.g. $n=1$ for predicting the next sample). So the learning problem is to map features $X$ to a probability vector $p$ over $K$ categories, e.g. through a softmax last layer.
This is analogous to the binomial case: when dealing with binomial outcomes (counts), practitioners refer to the binomial and Bernoulli problem setups interchangeably, because there too $n$ is usually known/fixed, whereas the probability $p(y=1\mid X)$ is unknown and has to be learned, e.g. through a sigmoid (logistic) regression setup.
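A minimal sketch of this mapping in plain numpy, with hypothetical weights W, bias b, and feature vector x standing in for a learned model:

import numpy as np

def softmax(z):
    # Shift by the max for numerical stability, then normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical learned parameters for K=3 categories and d=4 features
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # weights, one row per category
b = np.zeros(3)               # biases

x = rng.normal(size=4)        # a single feature vector X (n=1 at inference)
p = softmax(W @ x + b)        # probability vector p over the K categories
print(p, p.sum())             # entries are positive and sum to 1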
See also https://discourse.mc-stan.org/t/multilevel-categorical-multinomial-model-terms-and-priors/5294
https://beginningwithml.wordpress.com/2018/06/22/3-4-softmax-regression/
The multinomial model, generalizing the binomial model, is a probabilistic model primarily used for categorical data where each observation belongs to one of $k>2$ discrete categories; it is applied in tasks like text classification and topic modeling. A common such model from before the era of LLMs is the Multinomial Naive Bayes (MNB) classifier, where each document, represented as a $k$-dimensional Bag-of-Words (BOW) feature vector, is assumed to follow the multinomial distribution: each observed word is a trial, the vocabulary corresponds to the $k$ possible categories, and the conditional probabilities $p_i$ of each observed word given a class label are treated as mutually independent. MNB is thus better at capturing term-frequency information than models such as the Bernoulli model, where only the presence or absence of words is taken into account; this can be critical for classification accuracy.
Suppose we want to classify an email with a BOW representation whose vocabulary is ["win", "money", "free"] (only $k=3$ discrete categories for any word observation) and whose class labels are Spam and Not-Spam. From training data we know the estimated conditional probabilities of each vocabulary word given the email class, say, $$P(\text{win}\mid \text{Spam})=0.3,\quad P(\text{money}\mid \text{Spam})=0.5,\quad P(\text{free}\mid \text{Spam})=0.2$$ $$P(\text{win}\mid \text{Not-Spam})=0.1,\quad P(\text{money}\mid \text{Not-Spam})=0.2,\quad P(\text{free}\mid \text{Not-Spam})=0.7$$
Now we test an incoming email with BOW feature $X=(\text{win}: 2, \text{money}: 1, \text{free}: 1)$. First we use the multinomial distribution formula to compute the likelihood for each class label as follows: $$P(X\mid \text{Spam})=\frac{4!}{2!\,1!\,1!}(0.3)^2(0.5)^1(0.2)^1$$ $$P(X\mid \text{Not-Spam})=\frac{4!}{2!\,1!\,1!}(0.1)^2(0.2)^1(0.7)^1$$
Finally, we combine the likelihoods computed above with the class priors to determine which class the email most likely belongs to, according to Bayes' theorem.
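Plugging in the numbers: $$P(X\mid \text{Spam})=12\times(0.3)^2(0.5)(0.2)=0.108,\qquad P(X\mid \text{Not-Spam})=12\times(0.1)^2(0.2)(0.7)=0.0168,$$ so assuming equal class priors (as the code below does via class_prior=[0.5, 0.5]) the email is classified as Spam.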
Since MNB is a very popular text classification ML model, sklearn has a MultinomialNB class in its naive_bayes module. You can try the sample multinomial model below.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Expanded example dataset (add more rows to increase model training potential)
data = {
    'text': [
        'Win a free iPhone now',
        'Congratulations, you won a prize',
        'Meeting at 3 PM today',
        'Lunch at the usual spot',
        'Claim your free gift now',
        'Free money for you today',
        'Get your prize now',
        'This is not spam, just a meeting reminder',
        'Check out the new product',
        'Free vacation offer, click here'
    ],
    'label': ['spam', 'spam', 'not-spam', 'not-spam', 'spam', 'spam', 'spam', 'not-spam', 'not-spam', 'spam']
}
df = pd.DataFrame(data)
df['label'] = df['label'].map({'not-spam': 0, 'spam': 1})

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    df['text'], df['label'], test_size=0.2, random_state=42
)

# Convert text to a Bag-of-Words representation
vectorizer = CountVectorizer()
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)

# Initialize and train the Multinomial Naive Bayes model with equal class priors
model = MultinomialNB(class_prior=[0.5, 0.5])
model.fit(X_train_bow, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_bow)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

# Example new messages
new_messages = [
    'Win cash prizes now',
    'Are we meeting tomorrow?'
]

# Vectorize new messages
new_messages_bow = vectorizer.transform(new_messages)

# Predict
predictions = model.predict(new_messages_bow)

# Display predictions for new messages
for msg, pred in zip(new_messages, predictions):
    label = 'Spam' if pred == 1 else 'Not-Spam'
    print(f"Message: '{msg}' => Prediction: {label}")
The expected output is:
Accuracy: 1.0
Confusion Matrix:
[[1 0]
[0 1]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
Message: 'Win cash prizes now' => Prediction: Spam
Message: 'Are we meeting tomorrow?' => Prediction: Not-Spam
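To connect this back to the multinomial formula, here is a short sketch that reproduces the hand-computed likelihoods of the email example with scipy.stats.multinomial (using the illustrative probabilities from above, not probabilities learned by the sklearn model):

from scipy.stats import multinomial

# Counts for X = (win: 2, money: 1, free: 1), i.e. n = 4 word draws
x = [2, 1, 1]

# Class-conditional word probabilities from the worked example
p_spam = [0.3, 0.5, 0.2]
p_not_spam = [0.1, 0.2, 0.7]

print(multinomial.pmf(x, n=4, p=p_spam))      # 0.108
print(multinomial.pmf(x, n=4, p=p_not_spam))  # 0.0168

With equal priors the posterior ratio equals this likelihood ratio, matching the Spam prediction.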