In the decision tree algorithm, why do we use a weighted average of the child entropies (each child weighted by its share of the parent's samples) when calculating information gain? What is wrong with using the plain arithmetic mean of the child entropies instead?
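To make the comparison concrete, here is a minimal sketch of the two calculations. The helper names (`entropy`, `gain_weighted`, `gain_unweighted`) and the class counts are made up for illustration and do not come from any particular library.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a node, given its per-class sample counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_weighted(parent, children):
    """Information gain with each child weighted by its share of the parent's samples."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

def gain_unweighted(parent, children):
    """The variant asked about: plain arithmetic mean of the child entropies."""
    mean = sum(entropy(ch) for ch in children) / len(children)
    return entropy(parent) - mean

# Hypothetical split: 100 samples (50 per class); one child isolates a single sample.
parent = [50, 50]
children = [[1, 0], [49, 50]]

print(gain_weighted(parent, children))    # small: 99 of 100 samples are still mixed
print(gain_unweighted(parent, children))  # large: the 1-sample pure child counts as half
```

In this made-up split, the arithmetic mean gives the single-sample child the same influence as the 99-sample child, so a split that barely separates anything appears to gain a lot.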