The following is a step-by-step breakdown of how the gradient boosting process works.
Initialization: The process begins by fitting a base learner to the training set to establish a foundation. In standard gradient boosting, this initial model is not a random guess but a simple constant prediction chosen to minimize the loss function. The weak learners added afterward are typically small decision trees containing only a handful of leaf nodes, or terminal nodes; often chosen for their interpretability, they serve as an effective starting point. This initial setup paves the way for subsequent iterations to build upon.
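For squared-error loss, for instance, the loss-minimizing constant is simply the mean of the training targets. A minimal sketch in Python, where the toy X and y values are made up purely for illustration:

```python
import numpy as np

# Toy regression data, invented for illustration only
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])

# Under squared-error loss, the loss-minimizing constant is the mean
# of the targets, so every example gets the same initial prediction.
initial_prediction = y.mean()
F0 = np.full(len(y), initial_prediction)
print(F0)  # [3.04 3.04 3.04 3.04 3.04]
```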
Calculating residuals: For each training example, calculate the residual error by subtracting the predicted value from the actual value. (For loss functions other than squared error, the analogous quantity is the negative gradient of the loss, often called a pseudo-residual.) This step identifies exactly where the model's predictions need improvement.
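Continuing the toy example from the previous step, the residuals are a single element-wise subtraction:

```python
import numpy as np

y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])  # actual values
F0 = np.full(len(y), y.mean())           # current predictions

# Residual = actual - predicted; for squared-error loss this equals
# the negative gradient of the loss with respect to the prediction.
residuals = y - F0
print(residuals)  # approx. [-1.84 -1.14  0.16  1.06  1.76]
```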
Refining with regularization: Before each new weak learner's output is folded into the ensemble, it is regularized through shrinkage: its contribution is scaled down by a factor commonly called the learning rate. Calibrating this factor governs how swiftly the boosting algorithm advances, which helps prevent overfitting and improves overall performance.
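A minimal sketch of the shrinkage idea; the name learning_rate and the value 0.1 are common conventions rather than fixed requirements:

```python
import numpy as np

learning_rate = 0.1  # the shrinkage factor; 0.1 is a common default

def add_learner(current_prediction, learner_output):
    """Fold a weak learner's shrunken predictions into the ensemble."""
    return current_prediction + learning_rate * learner_output

# A learner that predicts a correction of 1.0 moves the ensemble's
# output by only 0.1, so many small steps accumulate gradually.
print(add_learner(np.array([3.04]), np.array([1.0])))  # [3.14]
```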
Training the next model: Use the residual errors calculated in the previous step as targets and train a new model or weak learner to predict them accurately. This step's focus is on correcting the mistakes made by the previous models, refining the overall prediction.
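One common choice of weak learner is a shallow regression tree, sketched here with scikit-learn's DecisionTreeRegressor on the same made-up toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])
residuals = y - y.mean()  # residuals from the previous step

# The weak learner is fit to the residuals, not to y itself:
# it learns to predict the current ensemble's mistakes.
weak_learner = DecisionTreeRegressor(max_depth=2)  # shallow, few leaves
weak_learner.fit(X, residuals)
print(weak_learner.predict(X))
```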
Ensemble updates: In this stage, the ensemble is updated by adding the new weak learner's scaled predictions to the current model's output. The performance of the updated ensemble is typically monitored on a separate holdout (validation) set; if the holdout error stops improving, hyperparameters such as the learning rate or tree depth might need adjustment.
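A sketch of a single update round with holdout monitoring, using a synthetic sine-curve dataset invented for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

learning_rate = 0.1
F_tr = np.full(len(y_tr), y_tr.mean())    # current ensemble output (train)
F_val = np.full(len(y_val), y_tr.mean())  # current ensemble output (holdout)

# One boosting round: fit a tree to the residuals, then fold its
# shrunken predictions into both the training and holdout predictions.
tree = DecisionTreeRegressor(max_depth=2).fit(X_tr, y_tr - F_tr)
F_tr += learning_rate * tree.predict(X_tr)
F_val += learning_rate * tree.predict(X_val)
print("holdout MSE after this round:", mean_squared_error(y_val, F_val))
```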
Repetition: Repeat the previous steps as necessary. Each iteration trains a new tree that builds upon and refines the current ensemble, further improving the model's accuracy. Once the updated ensemble achieves satisfactory accuracy relative to the baseline model, move on to the final step.
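Putting the steps together, a compact, self-contained version of the loop might look like the following; the dataset, the hyperparameter values, and the predict helper are all illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate, n_rounds = 0.1, 100
F = np.full(len(y), y.mean())  # initialization: constant prediction
trees = []

for _ in range(n_rounds):
    residuals = y - F                          # calculate residuals
    tree = DecisionTreeRegressor(max_depth=2)  # next weak learner
    tree.fit(X, residuals)                     # train it on the residuals
    F += learning_rate * tree.predict(X)       # shrunken ensemble update
    trees.append(tree)

def predict(X_new):
    """Final model: the constant start plus every shrunken tree output."""
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out

print("training MSE:", np.mean((y - F) ** 2))
```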
Stopping criteria: Stop the boosting process when a predetermined stopping criterion is met, such as reaching a maximum number of iterations, hitting a target accuracy, or seeing diminishing returns in validation performance. This step helps ensure that the final model strikes the intended balance between complexity and performance.
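As one possible realization, the sketch below combines an iteration cap with a patience-based check for diminishing returns on a holdout set; all names and thresholds are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

learning_rate, max_rounds, patience = 0.1, 500, 10
F_tr = np.full(len(y_tr), y_tr.mean())
F_val = np.full(len(y_val), y_tr.mean())
best_mse, stale_rounds = np.inf, 0

for i in range(max_rounds):  # criterion 1: maximum number of iterations
    tree = DecisionTreeRegressor(max_depth=2).fit(X_tr, y_tr - F_tr)
    F_tr += learning_rate * tree.predict(X_tr)
    F_val += learning_rate * tree.predict(X_val)
    mse = np.mean((y_val - F_val) ** 2)
    if mse < best_mse - 1e-6:  # still improving on the holdout set
        best_mse, stale_rounds = mse, 0
    else:                      # criterion 2: diminishing returns
        stale_rounds += 1
        if stale_rounds >= patience:
            print(f"stopping at round {i}; best holdout MSE {best_mse:.4f}")
            break
```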