Boosting is another ensemble method in which we use multiple trees to predict some outcome. Unlike random forests or bagging, though, boosting grows the trees sequentially rather than independently.
In the boosting algorithm, we first fit a tree to predict some outcome, $y$. We then take the residuals and fit another tree to predict those residuals. We repeat this process, continuing to fit each new tree to the residuals of the current model, until we have fit $B$ trees, where $B$ is a user-specified hyperparameter.
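A minimal sketch of this loop, assuming scikit-learn's `DecisionTreeRegressor` as the base learner, is shown below. The function names and defaults (`boost`, `boosted_predict`, `n_trees`, `learning_rate`, `max_depth`) are illustrative only, and `learning_rate` corresponds to the shrinkage parameter discussed further down.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=100, learning_rate=0.01, max_depth=1):
    """Fit n_trees small trees sequentially, each to the current residuals."""
    trees = []
    residuals = np.array(y, dtype=float)   # start with the outcome itself
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                         # fit this tree to the residuals
        residuals -= learning_rate * tree.predict(X)   # shrink its contribution, update residuals
        trees.append(tree)
    return trees

def boosted_predict(trees, X, learning_rate=0.01):
    """The final prediction is the sum of each tree's scaled contribution."""
    return sum(learning_rate * tree.predict(X) for tree in trees)
```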
Boosting models have 3 tuning parameters:
- $B$, the number of trees to be fit. Note that unlike random forests, boosted tree models can overfit if $B$ is too large.
- $\lambda$, the learning rate (also called the shrinkage parameter). This scales down the contribution of each tree to the overall prediction. Typical values are 0.01 or 0.001. Smaller values of $\lambda$ generally mean we'll need more trees to get good predictions.
- $d$, the number of splits each tree is allowed. Because boosted trees are grown sequentially, we can use a small $d$ and still get good predictions, since each new tree builds on the information captured by the previous trees. A model composed entirely of stumps ($d = 1$) can be very strong if $B$ is large enough. A short sketch showing how these three parameters fit together follows this list.
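As one concrete illustration, these three parameters map onto the arguments of scikit-learn's `GradientBoostingRegressor`; the dataset and specific values below are placeholders, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Toy regression data, just to make the example runnable
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=2000,   # B: number of trees; too many can overfit
    learning_rate=0.01,  # lambda: shrinks each tree's contribution
    max_depth=1,         # tree size; depth 1 means every tree is a stump
)
model.fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))
```

Note the trade-off baked into these settings: with a small learning rate and stump-sized trees, the model leans on a large number of trees to accumulate predictive power.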