If we are incapable of overfitting when using the boosting method, then what would be the point of performing feature selection?

Created by butcheer
Hi Eric. There are several reasons. First, reducing the number of features is less computationally intensive, which will matter for most people running on a laptop. Second, we still want the most informative features: some of the learners might end up fitting the later errors during training to noisy features. Third, and this relates to the second point, keep in mind that the weights are redistributed at each iteration based on the errors made in the previous round of the model, so spreading that effort over many features may be less sensitive to any one noisy feature, but it will likely take longer to converge to the optimal fit. That last point is just my intuition and would be an interesting thing to test: boost using 10000 features, then 1000, then 100, then 10, and see which model converges quickest.
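That experiment is easy to sketch. Below is a minimal version using scikit-learn's AdaBoostClassifier; the synthetic dataset, the feature counts, and the univariate SelectKBest filter are all illustrative assumptions on my part, not something from the original discussion.

```python
# Minimal sketch: run AdaBoost on progressively smaller feature subsets and
# compare how quickly each model's accuracy levels off. The dataset and all
# parameter choices here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic data: only 10 of the 1000 features carry signal; the rest are noise.
X, y = make_classification(n_samples=1000, n_features=1000, n_informative=10,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1000, 100, 10):
    # Keep the k features that score best on a univariate F-test.
    selector = SelectKBest(f_classif, k=k).fit(X_train, y_train)
    clf = AdaBoostClassifier(n_estimators=100, random_state=0)
    clf.fit(selector.transform(X_train), y_train)

    # staged_score yields test accuracy after each boosting round, so we can
    # see how many rounds each model needs to approach its final accuracy.
    scores = list(clf.staged_score(selector.transform(X_test), y_test))
    final = scores[-1]
    rounds_to_near_final = next(i for i, s in enumerate(scores, 1)
                                if s >= 0.99 * final)
    print(f"{k:4d} features: final accuracy {final:.3f}, "
          f"~{rounds_to_near_final} rounds to get within 1% of it")
```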

Overfitting, Boosting, Ensemble Method