
I understand why both high dimensionality and overfitting are undesirable, but recently I came across multiple sources mentioning that

High-dimensional data often leads to overfitting ([example][1])

But as far as I understand, when more features are measured and considered, I need much more data to train a model, per the curse of dimensionality. This means that when more features are involved, it's much more likely that my model is underfitting: it learns just noise and doesn't have enough data to find a meaningful pattern in my data that is able to generalize.

Can somebody clarify? [1]: https://vtiya.medium.com/the-relationship-between-high-dimensionality-and-overfitting-5bca0967b60f

1 Answer


These are two different points, which nevertheless point in the same direction:

  1. The more features you have, the easier it is to overfit
  2. High dimensionality leads to overfitting

The "I need more data to train..." observation is exactly point (1): you need more data to train your model so that it does not overfit on the small sample you have.

I'd encourage you to consider the linear regression case: if you have N features and fewer than N samples, even linear regression overfits, since it can fit the training data exactly (thus, you need more data so that your model does not overfit).
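A minimal sketch of that linear regression case, assuming a pure-noise target so there is no real pattern to find: with more features than samples, ordinary least squares fits the training set (almost) perfectly, yet fails on fresh data from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 20, 50  # fewer samples than features
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)  # pure-noise target: nothing to learn

# With n_features > n_samples the system is underdetermined, so a
# coefficient vector exists that interpolates the training data exactly.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
train_residual = np.abs(X @ coef - y).max()

# Fresh data drawn from the same (pure-noise) process
X_test = rng.normal(size=(n_samples, n_features))
y_test = rng.normal(size=n_samples)
test_mse = np.mean((X_test @ coef - y_test) ** 2)

print(train_residual)  # essentially 0: the noise is fit perfectly
print(test_mse)        # clearly nonzero: the "pattern" does not generalize
```

The gap between the near-zero training residual and the substantial test error is exactly the overfitting the answer describes.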

Alberto