XGBoost Tree node Build Options

Use the Build Options tab to specify build options for the XGBoost Tree node, including basic options for model building and tree growth, learning task options for objectives, and advanced options for controlling overfitting and handling imbalanced datasets. For additional information about these options, see the online resources listed at the end of this topic.

Basic

Hyper-Parameter Optimization (Based on Rbfopt). Select this option to enable Hyper-Parameter Optimization based on Rbfopt, which automatically searches for the combination of parameters that achieves the expected error rate, or lower, on the samples. For details about Rbfopt, see http://rbfopt.readthedocs.io/en/latest/rbfopt_settings.html.

Tree method. Select the XGBoost tree construction algorithm to use.

Num boost round. Specify the number of boosting iterations.

Max depth. Specify the maximum depth for trees. Increasing this value will make the model more complex and more likely to overfit.

Min child weight. Specify the minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than this Min child weight, the building process stops further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the weight, the more conservative the algorithm will be.

Max delta step. Specify the maximum delta step to allow for each tree's weight estimation. If set to 0, there is no constraint. If set to a positive value, it can help the update step be more conservative. Usually this parameter is not needed, but it may help in logistic regression when a class is extremely imbalanced.
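As an illustration of how these Basic settings correspond to parameters of the Python xgboost package, a model-building sketch might look like the following. The values are illustrative placeholders, not tuning recommendations.

```python
# Sketch: Basic build options expressed as Python XGBoost parameters.
# Values are illustrative defaults, not recommendations.
params = {
    "tree_method": "auto",    # Tree method: construction algorithm to use
    "max_depth": 6,           # Max depth: deeper trees -> more complex model
    "min_child_weight": 1,    # Min child weight: minimum hessian sum per child
    "max_delta_step": 0,      # Max delta step: 0 means no constraint
}
num_boost_round = 10          # Num boost round: number of boosting iterations

# With training data wrapped in an xgboost DMatrix, training would look like:
#   import xgboost as xgb
#   booster = xgb.train(params, dtrain, num_boost_round=num_boost_round)
```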

Learning Task

Objective. Select from the following learning task objective types: reg:linear, reg:logistic, reg:gamma, reg:tweedie, count:poisson, rank:pairwise, binary:logistic, or multi.

Early stopping. Select this option if you want to use the early stopping function. For the stopping rounds, validation error must decrease at least once within every specified number of stopping rounds for training to continue. The Evaluation data ratio is the ratio of input data that is held out to compute the validation error.

Random Seed. You can click Generate to generate the seed used by the random number generator.
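The Learning Task options map onto the objective, early-stopping, and seed arguments of the Python xgboost package. A hedged sketch, with illustrative values for the ratio and round counts:

```python
# Sketch: Learning Task options as Python XGBoost arguments (illustrative values).
params = {
    "objective": "binary:logistic",  # Objective: learning task objective type
    "seed": 12345,                   # Random Seed
}
stopping_rounds = 10        # validation error must improve within this many rounds
evaluation_data_ratio = 0.3 # share of input rows held out for validation

# Splitting 100 input rows according to the evaluation data ratio:
n_rows = 100
n_eval = int(n_rows * evaluation_data_ratio)  # rows used to compute validation error
n_train = n_rows - n_eval                     # rows used for boosting

# With xgboost, early stopping would be requested as:
#   booster = xgb.train(params, dtrain, evals=[(deval, "eval")],
#                       early_stopping_rounds=stopping_rounds)
```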

Advanced

Sub sample. The ratio of training instances used to grow each tree. For example, if you set this to 0.5, XGBoost randomly collects half of the data instances to grow trees, which helps prevent overfitting.

Eta. The step size shrinkage used during the update step to prevent overfitting. After each boosting step, the weights of new features can be obtained directly. Eta also shrinks the feature weights to make the boosting process more conservative.

Gamma. The minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the gamma setting, the more conservative the algorithm will be.

Colsample by tree. Sub sample ratio of columns when constructing each tree.

Colsample by level. Sub sample ratio of columns for each split, in each level.

Lambda. L2 regularization term on weights. Increasing this value will make the model more conservative.

Alpha. L1 regularization term on weights. Increasing this value will make the model more conservative.

Scale pos weight. Control the balance of positive and negative weights. This is useful for unbalanced classes.
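The Advanced options correspond directly to XGBoost regularization and sampling parameters. The sketch below uses illustrative values; the final lines show a common heuristic for Scale pos weight on an unbalanced binary target (number of negative cases divided by number of positive cases):

```python
# Sketch: Advanced options as Python XGBoost parameters (illustrative values).
params = {
    "subsample": 0.5,          # Sub sample: grow each tree on half the rows
    "eta": 0.3,                # Eta: step size shrinkage
    "gamma": 0.0,              # Gamma: minimum loss reduction to split
    "colsample_bytree": 1.0,   # Colsample by tree
    "colsample_bylevel": 1.0,  # Colsample by level
    "lambda": 1.0,             # Lambda: L2 regularization term
    "alpha": 0.0,              # Alpha: L1 regularization term
}

# For an unbalanced binary target, a common heuristic is to set
# Scale pos weight to (negative cases) / (positive cases):
n_negative, n_positive = 900, 100
params["scale_pos_weight"] = n_negative / n_positive
```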

The following table shows the relationship between the settings in the SPSS® Modeler XGBoost Tree node dialog and the Python XGBoost library parameters.
Table 1. Node properties mapped to Python library parameters
SPSS Modeler setting    Script name (property name)   XGBoost parameter
Target                  TargetField
Predictors              InputFields
Tree method             treeMethod                    tree_method
Num boost round         numBoostRound                 num_boost_round
Max depth               maxDepth                      max_depth
Min child weight        minChildWeight                min_child_weight
Max delta step          maxDeltaStep                  max_delta_step
Objective               objectiveType                 objective
Early stopping          earlyStopping                 early_stopping_rounds
stopping rounds         stoppingRounds
Evaluation data ratio   evaluationDataRatio
Random Seed             random_seed                   seed
Sub sample              sampleSize                    subsample
Eta                     eta                           eta
Gamma                   gamma                         gamma
Colsample by tree       colsSampleRatio               colsample_bytree
Colsample by level      colsSampleLevel               colsample_bylevel
Lambda                  lambda                        lambda
Alpha                   alpha                         alpha
Scale pos weight        scalePosWeight                scale_pos_weight
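The table above can also be restated programmatically. The dictionary below maps the node's script property names to the corresponding XGBoost parameter names, taken from Table 1; properties with no library counterpart (such as TargetField and InputFields) are omitted:

```python
# Mapping from XGBoost Tree node script property names to Python XGBoost
# parameter names, restating Table 1. Properties with no library
# counterpart (TargetField, InputFields, stoppingRounds,
# evaluationDataRatio) are omitted.
property_to_xgb = {
    "treeMethod": "tree_method",
    "numBoostRound": "num_boost_round",
    "maxDepth": "max_depth",
    "minChildWeight": "min_child_weight",
    "maxDeltaStep": "max_delta_step",
    "objectiveType": "objective",
    "earlyStopping": "early_stopping_rounds",
    "random_seed": "seed",
    "sampleSize": "subsample",
    "eta": "eta",
    "gamma": "gamma",
    "colsSampleRatio": "colsample_bytree",
    "colsSampleLevel": "colsample_bylevel",
    "lambda": "lambda",
    "alpha": "alpha",
    "scalePosWeight": "scale_pos_weight",
}
```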

1 "XGBoost Parameters." Scalable and Flexible Gradient Boosting. Web. © 2015-2016 DMLC.

2 "Plotting API." Scalable and Flexible Gradient Boosting. Web. © 2015-2016 DMLC.

3 "Scalable and Flexible Gradient Boosting." Web. © 2015-2016 DMLC.