Previously I learned that using softmax as the output layer, coupled with the log-likelihood cost function (the same as nll_loss in PyTorch), can solve the learning slowdown problem.
However, while working through the PyTorch MNIST tutorial, I was confused about why the combination of log_softmax as the output layer and nll_loss (the negative log-likelihood loss) as the loss function was used (L26 and L34).
In my experiments, log_softmax + nll_loss reached 99% test accuracy, while softmax + nll_loss only reached 97%.
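For reference, here is a minimal sketch of the two output-layer variants I compared (the layer sizes and model structure are placeholders, not the exact tutorial code; only the final activation differs):

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, use_log_softmax=True):
        super().__init__()
        # placeholder layer sizes for MNIST (784 inputs, 10 classes)
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
        self.use_log_softmax = use_log_softmax

    def forward(self, x):
        x = F.relu(self.fc1(x.view(-1, 784)))
        x = self.fc2(x)
        if self.use_log_softmax:
            # variant 1: log_softmax output, as in the tutorial
            return F.log_softmax(x, dim=1)
        # variant 2: plain softmax output (what I tried instead)
        return F.softmax(x, dim=1)

# In both variants the loss was computed the same way:
#   loss = F.nll_loss(output, target)
```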
What is the advantage of log_softmax over softmax? How can we explain the performance gap between them? Is log_softmax + nll_loss always better than softmax + nll_loss?
