On this page, https://scikit-learn.org/stable/modules/learning_curve.html, the authors discuss bias and variance and give a simple example of how they work in a linear model.

How can I determine the bias and variance of a random forest?

nbro
jennifer ruurs

1 Answer

To gain a good understanding of this, I recommend first reading about the trade-off between bias and variance in ML and AI methods.

A great article on this topic that I recommend as a light mathematical introduction is this: https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229

In short: bias is the error that comes from the model's simplifying assumptions, so a high-bias model misses real structure in the data. Variance is how strongly the fitted model changes with the particular training sample, so a high-variance model also fits the noise. A high-bias, low-variance model will thus look more like a straight (underfitted) line, while a low-bias, high-variance model will look jagged and all over the place (overfitted).

In essence, you need to find a balance between the two to avoid both overfitting (high variance, low bias) and underfitting (high bias, low variance) for your specific application.
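
For reference, the trade-off is usually written as a decomposition of the expected squared error. This is the standard textbook form, not something specific to random forests, and it assumes a regression setting where y = f(x) + ε with zero-mean noise of variance σ², and where the expectation is taken over training sets and noise (the symbols f, f̂ and σ are notation introduced here for illustration):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

For classification with 0-1 loss the decomposition does not take this simple additive form, so in practice the two effects are usually diagnosed from training and validation scores rather than computed exactly.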

But how can I determine this for a model such as a random forest classifier?

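To determine whether your model's bias or variance is too high, compare the model's performance on the training set with its performance on the validation and test sets. The very reason we divide our data into training, validation, and test sets is so that we can check the model's performance on samples it has not seen during training. Poor performance on both the training and validation sets points to high bias (underfitting); good training performance with clearly worse validation performance points to high variance (overfitting).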
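
As a rough illustration of this check, here is a minimal sketch using scikit-learn. The synthetic dataset, the 75/25 split, and the hyperparameters are assumptions chosen for the example, not recommendations; substitute your own data and settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for your own dataset (assumption for illustration).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Hold out a validation set that the forest never sees during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Accuracy on seen vs. unseen samples.
train_acc = forest.score(X_train, y_train)
val_acc = forest.score(X_val, y_val)
gap = train_acc - val_acc

print(f"training accuracy:   {train_acc:.3f}")
print(f"validation accuracy: {val_acc:.3f}")
print(f"gap (train - val):   {gap:.3f}")
```

A low score on both sets suggests high bias, while a high training score with a large gap suggests high variance. If the gap is large, limiting tree complexity (for example with `max_depth` or `min_samples_leaf`) is one common thing to try; if both scores are low, the model or the features are probably too restrictive.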

Krrrl