
There is one problem that has been bugging me for quite a long time: the non-convex loss surface (multiple minima, e.g. as shown here) of neural networks that use a quadratic loss function.

Question: Why is a “common” AI problem usually non-convex with multiple minima, even though we use e.g. a quadratic loss function (which in lectures is usually drawn as a simple, convex, quadratic function such as $x^2$)?

My guess:

Is it because we feed the loss function with our highly non-linear model output, so that the resulting total loss surface is highly non-linear/non-convex? Specifically, the quadratic loss only approximates the infinitesimally small neighbourhood around a specific point (minibatch) as quadratic. Is that guess correct? It would imply that highly non-linear / very deep and complex models have a highly non-linear loss surface, while shallower models have fewer minima and a one-layer network has a convex loss surface?
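As a small sanity check on the convex end of this guess, here is a sketch (with hypothetical one-parameter model and arbitrary sample values `x = 2`, `y = 1`) confirming numerically that for a purely linear model $f(x) = wx$, the quadratic loss $(y - wx)^2$ is convex in $w$, via the midpoint inequality:

```python
# Sketch: the quadratic loss of a linear model f(x) = w*x is a parabola
# in w, hence convex. We confirm the midpoint convexity condition
# loss(midpoint) <= average of endpoint losses at many random pairs.
# The sample values x=2.0, y=1.0 are arbitrary choices for illustration.
import random

def linear_loss(w, x=2.0, y=1.0):
    return (y - w * x) ** 2

random.seed(0)
for _ in range(1000):
    w1, w2 = random.uniform(-5, 5), random.uniform(-5, 5)
    mid = linear_loss((w1 + w2) / 2)
    avg = (linear_loss(w1) + linear_loss(w2)) / 2
    assert mid <= avg + 1e-12   # convexity: midpoint lies below the chord

print("quadratic loss of a linear model is convex in w")
```

Once a non-linearity such as ReLU is composed inside the loss, this midpoint inequality can fail, which is exactly the non-convexity asked about.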

horsti

1 Answer


If I understood your question correctly: the quadratic loss function is not always convex. Its convexity depends on the function it is composed with. For example, consider a very basic NN that takes some input $x$ and returns \begin{equation} f(x)=\mathrm{relu}(w_1\,\mathrm{relu}(w_0x+b_0)+b_1). \end{equation} Using the quadratic loss, or $\ell_2$, means minimizing the objective $\sum_i(x_i-f(x_i))^2$; but even when examining only one of the terms (say $i=0$), namely $(x_0-f(x_0))^2$, we see that it is not convex at all, as the following plot shows:

[plot of $(x_0-f(x_0))^2$ over the weights, showing a non-convex surface]
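The non-convexity can also be verified numerically without a plot. A minimal sketch, assuming $b_0 = b_1 = 0$ and the single data point $x_0 = 1$ (values chosen for illustration): two parameter settings both achieve zero loss, yet their midpoint does not, which violates the defining inequality of a convex function.

```python
# Single-sample quadratic loss of the answer's network, with the
# assumed simplifications b0 = b1 = 0 and data point x0 = 1:
#   L(w0, w1) = (x0 - relu(w1 * relu(w0 * x0)))^2
# A convex function satisfies L(midpoint) <= average of endpoint losses;
# we exhibit two points where that fails.

def relu(z):
    return max(z, 0.0)

def loss(w0, w1, x0=1.0):
    f = relu(w1 * relu(w0 * x0))
    return (x0 - f) ** 2

# Two parameter settings that both fit x0 exactly (loss = 0) ...
a = (2.0, 0.5)   # f(1) = relu(0.5 * relu(2)) = 1
b = (0.5, 2.0)   # f(1) = relu(2 * relu(0.5)) = 1

# ... but their midpoint (1.25, 1.25) does not:
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

la, lb, lm = loss(*a), loss(*b), loss(*mid)
print(la, lb, lm)   # 0.0 0.0 0.31640625

# Convexity would require lm <= (la + lb) / 2 = 0; instead lm > 0,
# so this loss surface has (at least) two separate minima.
```

The two zero-loss points are two distinct minima, and the positive loss between them is exactly the "barrier" that makes the surface non-convex.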

Hadar Sharvit