
I have noticed that many introductory materials on neural networks use linear regression to predict house prices as the standard first example. This seems to be a common practice.

But why use neural networks for a problem that already has an explicit solution, such as linear regression using the least squares criterion?

Linear regression has a well-defined formula for the best-fitting line (https://en.wikipedia.org/wiki/Simple_linear_regression#Example), so what is the benefit of applying neural networks in this case?

nbro

3 Answers


The explicit solution, the famous $\hat\beta_{OLS} = (X^TX)^{-1}X^Ty$ in statistics, is a piece of linear algebra that says what the best fit is, given a certain model. It says nothing, however, about how good that fit is. The relationship between the variables and the outcome could be curved (nonlinear); there could be interactions between the variables, meaning the impact one variable has on the outcome depends on the values of other variables; the interactions themselves could be curved (interactions between nonlinear transformations of the variables). If you just stick a straight line (a hyperplane, for multiple linear regression) through the data, the fit might be quite poor and leave plenty of room for improvement.
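As a minimal illustration of this point, here is a hedged NumPy sketch (the quadratic data-generating process, sample size, and variable names are all invented for the example): the closed-form solution gives the best possible straight line, yet the fit remains poor because the true relationship is curved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the true relationship is curved (quadratic), not linear.
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 0.5, size=200)

# Closed-form OLS fit of a straight line: beta_hat = (X^T X)^{-1} X^T y
X = np.column_stack([np.ones_like(x), x])      # intercept + slope only
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # the "best" line, given this model

residual_mse = np.mean((y - X @ beta_hat) ** 2)
print(beta_hat, residual_mse)  # optimal within the linear family, but the error stays large
```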

There is an interesting paper arguing that neural networks are just polynomial regressions, a particular form of linear regression that can be solved with the same matrix solution, $\hat\beta_{OLS} = (X^TX)^{-1}X^Ty$. However, that requires explicit engineering of the polynomial features: deciding which relationships are curved, which variables interact, and which interactions involve curvature. For better or for worse, neural networks do not require this. You just throw a bunch of neurons at the problem, with practitioners varying in how thoughtful they are about the exact architecture, and let the network figure out the relationship.
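For comparison, a small sketch of the hand-engineered alternative, again with made-up data: once the quadratic term is placed into the design matrix by hand, the very same closed-form solution fits the curve well. The point is that someone had to decide to add $x^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 0.5, size=200)

# Hand-engineered polynomial features: *we* decide the relationship is quadratic.
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# The same closed-form solution now fits the curve well,
# because the curvature was built into the design matrix by hand.
beta_hat = np.linalg.solve(X_poly.T @ X_poly, X_poly.T @ y)
print(beta_hat)  # roughly recovers (1.0, 0.5, 2.0)
```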

To answer your question explicitly: neural networks offer a possible improvement in performance compared to more basic forms of linear regression (without interactions or nonlinear transformations of the features), because their high capacity allows flexible modeling that the rigidity of a hyperplane does not. If you start allowing for complicated linear models involving interactions and curvature in the features, then linear regression becomes more competitive, as Cheng et al. (2018) argue.

REFERENCE

Cheng, Xi, et al. "Polynomial regression as an alternative to neural nets." arXiv preprint arXiv:1806.06850 (2018).

Dave

I know that when I teach students about fully connected layers, I always introduce the topic with linear regression. There are a few reasons:

  • The general function applied in each dense neuron is $\sigma(wx+b)$. While the nonlinearity introduced by whichever activation is chosen for $\sigma$ makes this nonlinear, the heart of the learning is the parameters $w$ and $b$ (the slope and intercept of linear regression, often written $\theta_1$ and $\theta_0$). This provides an excellent, simplified example for working through the gradient calculation with a loss such as MSE (see the sketch after this list).

  • The use of linear regression also provides an excellent opportunity to reinforce the matrix-composition idea: any series of linear or affine transformations collapses to a single affine transformation. This helps to develop the intuition for why a nonlinear activation is necessary (also demonstrated in the sketch below).

With these main points introduced and understood, it's easier for learners to grasp the "why" of the design and the functioning of a dense layer without needing to rigorously work through an example by hand.
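For concreteness, here is a hedged NumPy sketch of both bullet points; the data, learning rate, and layer sizes are arbitrary choices for illustration. The first part runs plain gradient descent on $w$ and $b$ under MSE (with $\sigma$ taken as the identity, i.e. ordinary linear regression); the second checks numerically that stacking affine layers without a nonlinearity adds nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Point 1: gradient descent on w, b for a single "neuron" with identity activation.
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, size=100)   # hypothetical data

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    y_hat = w * x + b                        # sigma is the identity here
    grad_w = 2 * np.mean((y_hat - y) * x)    # d(MSE)/dw
    grad_b = 2 * np.mean(y_hat - y)          # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b
print(w, b)  # approaches (3.0, 1.0)

# Point 2: affine layers without a nonlinearity collapse to one affine map.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
v = rng.normal(size=3)
two_layers = W2 @ (W1 @ v + b1) + b2
one_layer = (W2 @ W1) @ v + (W2 @ b1 + b2)
print(np.allclose(two_layers, one_layer))  # True: no extra expressive power without sigma
```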

David Hoelzer

It's just evidence that sometimes parallels are taken too far. Neural networks and linear regression don't really share much.


The only grounded parallel between a NN and LR is interpreting everything before the final layer of the NN as a learned preprocessing of the input, so that the last layer is effectively a linear regression on learned features.

The fact that LR has a closed form is more a matter of luck than a rule. Are you interested in a statistic other than the mean (such as the median)? Bye-bye, closed form. Are you interested in a constrained weight space? Bye-bye, closed form.
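As a hedged sketch of that point (the heavy-tailed noise, learning rate, and iteration count are arbitrary choices): switching the loss from squared error to absolute error, i.e. estimating a conditional median instead of a mean, already forces you to iterate, here with a crude subgradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 2.0 * x + 1.0 + rng.standard_t(df=2, size=200)   # hypothetical heavy-tailed noise
X = np.column_stack([np.ones_like(x), x])

# Squared error (mean): one line of linear algebra.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Absolute error (median): no closed form, so we iterate (plain subgradient descent).
beta_lad, lr = np.zeros(2), 0.01
for _ in range(5000):
    resid = y - X @ beta_lad
    beta_lad += lr * X.T @ np.sign(resid) / len(y)   # subgradient of mean |residual|
print(beta_ols, beta_lad)
```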

Alberto