Section 1.2.1.6 (Maximum likelihood estimation) of *Probabilistic Machine Learning: An Introduction* by Kevin P. Murphy says the following:
When fitting probabilistic models, it is common to use the negative log probability as our loss function: $$\ell(y, f(\mathbf{x}; \boldsymbol{\theta})) = -\log p(y \mid f(\mathbf{x}; \boldsymbol{\theta})) \tag{1.13}$$ The reasons for this are explained in Section 5.1.6.1, but the intuition is that a good model (with low loss) is one that assigns a high probability to the true output $y$ for each corresponding input $\mathbf{x}$.
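To make the quoted formula concrete, here is a minimal numeric sketch of Eq. (1.13); the function name `nll` and the probability values are my own choices, not from the book:

```python
import math

def nll(p_true):
    """Loss -log p(y | f(x; theta)), where p_true is the probability
    the model assigns to the true output y."""
    return -math.log(p_true)

# Evaluate the loss for a few probabilities a model might assign
# to the true output y given its input x.
for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p(y | x) = {p:4.2f}  ->  loss = {nll(p):.3f}")
```

This prints losses of roughly 0.010, 0.105, 0.693, 2.303, and 4.605, respectively.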
How does "a good model (with low loss) is one that assigns a high probability to the true output $y$ for each corresponding input $\mathbf{x}$" relate to $\ell(y, f(\mathbf{x}; \boldsymbol{\theta})) = -\log p(y \mid f(\mathbf{x}; \boldsymbol{\theta}))$?