What are the differences between loss surfaces that "derive" from different observations?

Question

If I understand correctly, each observation within a dataset creates a different loss surface on which we want to find the global minimum.

How different are those surfaces from one another? Would it be correct to say that they differ like, for example, these two parabolas: f(x) = 5x^2 + 4x + 2 versus f(x) = 5x^2 + x + 8, which can be seen as the same parabola located in a different place of the xy plane?

Thank you

score 2 · Accepted Answer · answered Aug 10 '23 at 11:47

Let's consider linear regression. You have a dataset $D$ composed of $(x, y)$ pairs, and you assume they are linearly related, so you model the problem with linear regression: $$ y = wx+b $$ Now you want to find the $w$ and $b$ that best describe your data, so you choose a loss function, say MSE, and minimize it: $$ L(w, b) = \sum_{(x,y)\in D} (y - (wx+b))^2 $$

As you can see, if you consider a single sample, this is a parabola (a quadratic in $w$ and $b$). However, the fact that it is a parabola is determined by the loss function and your model, not by the particular sample.
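To make the single-sample case concrete, here is a short expansion. The per-sample notation $\ell_i$ and the indexed sample $(x_i, y_i)$ are introduced here for illustration and are not part of the formulas above; $b$ is held fixed so the loss is viewed as a function of $w$ only, and $x_i \neq 0$ is assumed.

$$ \ell_i(w, b) = (y_i - (w x_i + b))^2 = x_i^2\, w^2 + 2 x_i (b - y_i)\, w + (b - y_i)^2 $$

$$ \frac{\partial \ell_i}{\partial w} = 0 \quad\Longrightarrow\quad w = \frac{y_i - b}{x_i} $$

So each sample sets both the curvature of its parabola ($x_i^2$) and the position of its vertex ($(y_i - b)/x_i$). Unlike the two parabolas in the question, which share the leading coefficient 5, per-sample losses generally differ in width as well as in location.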

You then minimize the average loss over the dataset (the $1/n$ factor is dropped because it does not affect the minimizer), so, loosely speaking, you take "the average parabola across the dataset".

However, nothing ties the points in your dataset to one another, since they are assumed to be independent, so their individual loss functions can be arbitrarily far apart (they are still parabolas, but possibly very far from each other).
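A minimal numerical sketch of this point, under simplifying assumptions: the toy dataset below is made up for illustration, and the bias is fixed to $b = 0$ (so the model is $y = wx$) to keep each per-sample loss a one-dimensional parabola in $w$. The helper name `per_sample_coeffs` is hypothetical.

```python
import numpy as np

# Hypothetical toy dataset (made up for illustration): y is roughly 2*x plus noise.
x = np.array([0.5, 1.0, 2.0, 3.0])
y = np.array([1.4, 1.7, 4.3, 5.8])


def per_sample_coeffs(xi, yi):
    """Coefficients (a, c1, c0) of (yi - w*xi)^2 = a*w^2 + c1*w + c0, with b = 0."""
    return xi**2, -2.0 * xi * yi, yi**2


# One row of parabola coefficients per observation.
coeffs = np.array([per_sample_coeffs(xi, yi) for xi, yi in zip(x, y)])

# Vertex of each per-sample parabola: w_i = -c1 / (2a) = yi / xi.
per_sample_minima = -coeffs[:, 1] / (2.0 * coeffs[:, 0])

# Summing parabolas gives another parabola; its vertex is the least-squares slope.
a_sum, c1_sum, _ = coeffs.sum(axis=0)
w_total = -c1_sum / (2.0 * a_sum)

print("curvatures (x_i^2):", coeffs[:, 0])
print("per-sample minima:", per_sample_minima)
print("minimum of summed loss:", w_total)
```

Each observation yields a parabola with its own curvature and its own minimizer, and none of them has to coincide with the minimizer of the summed loss, which is what the optimization actually targets.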
