Active Learning regression with Random Forest

Asked Aug 03 '22 at 17:03

Active Aug 03 '22 at 17:03

Viewed 244 times

I have a dataset of about 8k points and I am trying to apply active learning with a random forest regressor. I have split the dataset into a training set of around 20 points and a test set. The test set serves as the unlabelled pool (although I do have its labels).

My workflow is the following:

1. Select a budget c.
2. Train the RF on the training set.
3. Select the sample from the test set whose predictions have the greatest variance.
4. Train the RF on the training set plus the selected sample.

The process repeats until the budget is exhausted. After each retraining I measure accuracy on the test set with the coefficient of determination.
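For readers who want something concrete, the workflow above can be sketched roughly as follows with scikit-learn. This is only one reading of it, under several assumptions that are not stated in the question: "variance" is taken to mean the variance of the individual trees' predictions, the forest is refit from scratch each round, queried points are removed from the pool before scoring, and the data is synthetic. The names `tree_variance` and `active_learning_loop` are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def tree_variance(forest, X):
    """Variance of the individual trees' predictions for each row of X."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.var(axis=0)


def active_learning_loop(X_train, y_train, X_pool, y_pool, budget, seed=0):
    """Query the highest-variance pool point `budget` times.

    Returns the R^2 measured before each query and the mask of points
    still left in the pool.
    """
    X_train, y_train = X_train.copy(), y_train.copy()
    in_pool = np.ones(len(X_pool), dtype=bool)  # True while a point is unlabelled
    scores = []
    for _ in range(budget):
        forest = RandomForestRegressor(n_estimators=50, random_state=seed)
        forest.fit(X_train, y_train)
        # Assumption: score only on points that are still unlabelled.
        scores.append(r2_score(y_pool[in_pool], forest.predict(X_pool[in_pool])))
        variance = tree_variance(forest, X_pool)
        variance[~in_pool] = -np.inf  # never re-query an already labelled point
        query = int(np.argmax(variance))
        X_train = np.vstack([X_train, X_pool[query:query + 1]])
        y_train = np.append(y_train, y_pool[query])
        in_pool[query] = False
    return scores, in_pool


# Synthetic stand-in for the real dataset (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
scores, in_pool = active_learning_loop(X[:20], y[:20], X[20:], y[20:], budget=10)
```

Note that scoring on the same pool that is being queried changes the evaluation set every round, so the curve is not directly comparable between strategies.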

Is the above workflow valid? What I have observed is that accuracy does not improve compared to random sampling. Is there any other query strategy that works with random forests for regression?
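One way to make the comparison with random sampling concrete is a small harness like the sketch below. It departs from the setup in the question in ways that are my assumptions: it scores on a separate held-out evaluation set instead of the pool, averages the curves over a few seeds, and uses synthetic data. `run_strategy` and the strategy names `"variance"` and `"random"` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def run_strategy(X_lab, y_lab, X_pool, y_pool, X_eval, y_eval, budget, strategy, seed):
    """Return the held-out R^2 of the model fitted before each of `budget` queries."""
    rng = np.random.default_rng(seed)
    labelled = np.zeros(len(X_pool), dtype=bool)  # pool points queried so far
    scores = []
    for _ in range(budget):
        X_fit = np.vstack([X_lab, X_pool[labelled]])
        y_fit = np.concatenate([y_lab, y_pool[labelled]])
        forest = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X_fit, y_fit)
        scores.append(r2_score(y_eval, forest.predict(X_eval)))
        candidates = np.flatnonzero(~labelled)
        if strategy == "variance":
            # Disagreement between trees, used as an uncertainty proxy.
            per_tree = np.stack([t.predict(X_pool[candidates]) for t in forest.estimators_])
            query = candidates[np.argmax(per_tree.var(axis=0))]
        else:
            # Random sampling baseline.
            query = rng.choice(candidates)
        labelled[query] = True
    return scores


# Synthetic stand-in for the real dataset (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)
lab, pool, ev = slice(0, 20), slice(20, 300), slice(300, 400)

# Mean learning curve per strategy, averaged over three seeds.
curves = {
    name: np.mean(
        [run_strategy(X[lab], y[lab], X[pool], y[pool], X[ev], y[ev], 5, name, s)
         for s in range(3)],
        axis=0,
    )
    for name in ("variance", "random")
}
```

Because both strategies start from the same labelled set and the same forest seed, the first point of each curve coincides; any difference shows up only after the first query.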

I could have used Gaussian processes, but in my experience they need a lot of tuning, and training time becomes very long for large training sets. That is why I chose Random Forest.

Antonios Sarikas
