One Softmax or two separate logistic regressions for the task of classifying pictures as a/b and c/d

Question

Simply put, question 11 in chapter 4 of Aurélien Géron's book "Hands-on Machine Learning" asks:

Suppose you want to classify pictures as outdoor/indoor and daytime/nighttime. Should you implement two logistic regression classifiers or one softmax regression classifier?

He gives the following answer in the Jupyter notebook that accompanies the book (the answer quoted below is exactly as he gave it in the book, and it is complete):

If you want to classify pictures as outdoor/indoor and daytime/nighttime, since these are not exclusive classes (i.e., all four combinations are possible) you should train two Logistic Regression classifiers.
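A minimal sketch of what "two Logistic Regression classifiers" means in practice, assuming each picture has already been turned into a numeric feature vector. The toy data, the helper names (`fit_logistic`, `predict_proba`, `y_outdoor`, `y_daytime`) and the plain gradient-descent training loop are illustrative assumptions, not the book's code. Each classifier is fitted on its own binary label, so nothing stops both from predicting "yes" for the same picture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Hypothetical helper: binary logistic regression fitted by batch gradient descent on the mean log loss."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y) / len(y)  # gradient of the mean log loss
        w -= lr * grad
    return w

def predict_proba(w, X):
    """Hypothetical helper: P(label = 1 | x) for a fitted weight vector."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return sigmoid(Xb @ w)

# Hypothetical toy data: feature 0 drives outdoor/indoor, feature 1 drives daytime/nighttime.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_outdoor = (X[:, 0] > 0).astype(float)
y_daytime = (X[:, 1] > 0).astype(float)

# Two independent classifiers, one per binary question.
w_outdoor = fit_logistic(X, y_outdoor)
w_daytime = fit_logistic(X, y_daytime)

x_new = np.array([[1.5, -1.5]])  # should look "outdoor" and "nighttime"
print(predict_proba(w_outdoor, x_new), predict_proba(w_daytime, x_new))
```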

And I am not sure I get it. My answer was immediate: one softmax! I was surprised that the answer was different, and I don't know if I understand it.

What I thought:

Isn't the idea of softmax regression multiclass classification? When he says "all four combinations are possible", doesn't he mean outdoor daytime, outdoor nighttime, indoor daytime, and indoor nighttime? If those four classes are possible and I want to classify each picture as one of those four, this is exactly the task I would expect a softmax regression to handle better than two separate logistic regressions.

Moreover, I think softmax regression allows joint modeling instead of treating each classification task independently, as two logistic regressions would. Considering all classes together and learning the relationships between the features can capture dependencies between them, right?

About parameter estimation: isn't estimating a single set of parameters that maps inputs to the probability of each class more efficient than training two separate models?

Generalization: couldn't softmax generalize better to unseen combinations of outdoor/indoor and daytime/nighttime, since it learns a unified decision boundary?

Am I wrong? Are there more things to consider, or will two separate logistic regressions always be better than one softmax model?

score 3 · Accepted Answer · answered Apr 24 '24 at 19:56

Softmax gives you a probability distribution, so its outputs are positive and sum to 1... If you want "outdoor" AND "daytime", you want a $1$ on the "outdoor" neuron and a $1$ on the "daytime" neuron... Clearly, that's not possible with a single softmax, since the outputs would then have to sum to 2.
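A small NumPy illustration of the point above, assuming hypothetical logits for just the two "positive" labels (outdoor, daytime). A softmax over them forces the labels to compete for probability mass, while independent sigmoids (what two logistic regressions use) let both be close to 1 at once:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits for the two labels: [outdoor, daytime], both strongly "on".
logits = np.array([5.0, 5.0])

print(softmax(logits))  # [0.5 0.5]: the outputs must sum to 1, so neither can reach 1
print(sigmoid(logits))  # each about 0.993: both labels can be "on" together
```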

What you can do is take the Cartesian product of your outputs, giving an output layer like:

neuron 1: outdoor and daytime
neuron 2: outdoor and night time
neuron 3: indoor and daytime
neuron 4: indoor and night time
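A minimal sketch of the Cartesian-product encoding, assuming the four joint classes are ordered as in the list above. The helper names (`encode`, `decode`, `marginals`) and the example probabilities are hypothetical. It also shows that the per-task probabilities can be recovered from a 4-way softmax output by summing over the other task:

```python
import itertools
import numpy as np

place = ["outdoor", "indoor"]
time_of_day = ["daytime", "nighttime"]

# One joint class per (place, time) pair, in the same order as the list above.
joint_classes = list(itertools.product(place, time_of_day))

def encode(p, t):
    """Hypothetical helper: (place, time) pair -> joint class index 0..3."""
    return joint_classes.index((p, t))

def decode(k):
    """Hypothetical helper: joint class index -> (place, time) pair."""
    return joint_classes[k]

def marginals(joint_probs):
    """Hypothetical helper: per-task probabilities from a 4-way softmax output."""
    grid = np.asarray(joint_probs).reshape(len(place), len(time_of_day))
    return grid.sum(axis=1), grid.sum(axis=0)  # P(place), P(time of day)

probs = np.array([0.6, 0.1, 0.2, 0.1])  # hypothetical softmax output for one picture
p_place, p_time = marginals(probs)
print(decode(int(np.argmax(probs))))  # ('outdoor', 'daytime')
print(p_place, p_time)                # [0.7 0.3] [0.8 0.2]
```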

However, this solution works only because the example is very small. If instead you had 10 options for the first classification and 10 for the second, two separate softmax layers would need 20 neurons in total, whereas this "joint" representation would need 100...
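The arithmetic behind that scaling argument, as a short sketch. The 512-dimensional feature size is a hypothetical assumption used only to turn output counts into parameter counts for a linear output layer (weights plus one bias per output):

```python
def output_units(n_first, n_second):
    """Output neurons: two separate softmax heads vs one joint (Cartesian-product) softmax."""
    separate = n_first + n_second
    joint = n_first * n_second
    return separate, joint

def linear_params(n_features, n_outputs):
    """Weights plus biases of a linear output layer with n_outputs units."""
    return (n_features + 1) * n_outputs

print(output_units(2, 2))    # (4, 4): no difference for outdoor/indoor x daytime/nighttime
print(output_units(10, 10))  # (20, 100)
# Assuming 512 input features (hypothetical):
print(linear_params(512, 20), linear_params(512, 100))  # 10260 51300
```

The sum grows linearly with the number of options per task, while the product grows multiplicatively, and every extra task multiplies the joint count again.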

score 2 · Answer 2 · answered Apr 24 '24 at 21:32

It is possible to have an output for every combination of independent classes but, as another poster has noted, this scales poorly. It is also possible to have a network that splits into two sets of outputs, each softmax-ed separately. This reduces some overhead compared to two separate networks.
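A minimal NumPy sketch of such a network, assuming a single shared ReLU hidden layer and one 2-way softmax head per task. The layer sizes, random weights and input features are hypothetical, and only the (untrained) forward pass is shown:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # row-wise, numerically stable
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_features, n_hidden = 64, 32  # hypothetical sizes

# Shared trunk: one hidden layer used by both tasks.
W_shared = rng.normal(scale=0.1, size=(n_features, n_hidden))
b_shared = np.zeros(n_hidden)

# Two heads, each a softmax over its own classes only.
W_place, b_place = rng.normal(scale=0.1, size=(n_hidden, 2)), np.zeros(2)  # outdoor / indoor
W_time, b_time = rng.normal(scale=0.1, size=(n_hidden, 2)), np.zeros(2)    # daytime / nighttime

def forward(X):
    h = np.maximum(0.0, X @ W_shared + b_shared)  # ReLU features shared by both heads
    return softmax(h @ W_place + b_place), softmax(h @ W_time + b_time)

X = rng.normal(size=(5, n_features))  # 5 hypothetical image feature vectors
p_place, p_time = forward(X)
print(p_place.sum(axis=1), p_time.sum(axis=1))  # every row of each head sums to 1
```

Each head only normalizes over its own classes, so "outdoor" and "daytime" can both receive high probability. In training, the two heads' cross-entropy losses would typically be added and backpropagated through the shared layer.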

Whether either approach generalizes well depends on the data. The first few hidden layers may perform some transformation that makes indoor/outdoor and daytime/nighttime separable, so that only the decision boundaries need to be drawn by the final weights/activations. This is the best-case scenario. The worst-case scenario is that no common transformation separates all of these classes and the network has to split right at the input layer, in which case one may as well use two independent networks.
