I have a dataset with 223,586 samples, of which I used 60% for training and 40% for testing. I trained five classifiers individually: SVM, logistic regression (LR), a decision tree, a random forest, and boosted decision trees. SVM and LR performed well, with both accuracy and recall close to 0.9, but the tree-based classifiers reported an accuracy of only 0.6. On closer inspection, I found that SVM and LR disagreed on the predicted labels of 20,357 samples. Can I apply voting to resolve this conflict in the prediction outcome? Could this conflict be due to an imbalanced dataset?
1 Answer
Yes, you can. There are many techniques for combining classifiers like this, usually called ensemble methods.
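As a concrete starting point, here is a minimal sketch of majority/soft voting using scikit-learn's `VotingClassifier`. The synthetic dataset, classifier choices, and hyperparameters are illustrative assumptions, not your actual setup; the 60/40 split mirrors the one described in the question.

```python
# Hedged sketch: combining SVM, LR, and a random forest by voting.
# Dataset and hyperparameters below are placeholders, not the asker's data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in dataset; replace with your own features and labels.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)  # 60/40 split, as in the question

vote = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted probabilities; "hard" takes majority label
)
vote.fit(X_train, y_train)
print(vote.score(X_test, y_test))
```

Soft voting averages each model's class probabilities, which tends to break ties more gracefully than hard majority voting when the ensemble has an even number of members or when two strong models disagree, as SVM and LR do on your 20,357 samples.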
A better approach might be to use something like AdaBoost along with a cheaper method like the decision trees you looked at. AdaBoost explicitly tries to train classifiers to correctly handle different parts of the data, rather than hoping that different methods turn out to do so by chance.
John Doucette