XGBoost Tree node Build Options
Use the Build Options tab to specify build options for the XGBoost Tree node, including basic options for model building and tree growth, learning task options for objectives, and advanced options for controlling overfitting and handling imbalanced datasets. For additional information about these options, see the online resources listed at the end of this topic.
Basic
Hyper-Parameter Optimization (Based on Rbfopt). Select this option to enable Hyper-Parameter Optimization based on Rbfopt, which automatically discovers the optimal combination of parameters so that the model will achieve the expected or lower error rate on the samples. For details about Rbfopt, see http://rbfopt.readthedocs.io/en/latest/rbfopt_settings.html.
Tree method. Select the XGBoost tree construction algorithm to use.
Num boost round. Specify the number of boosting iterations.
Max depth. Specify the maximum depth for trees. Increasing this value makes the model more complex and more likely to overfit.
Min child weight. Specify the minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than this value, the building process stops further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm will be.
Max delta step. Specify the maximum delta step to allow for each tree's weight estimation. If set to 0, there is no constraint. If set to a positive value, it can help the update step be more conservative. Usually this parameter is not needed, but it may help in logistic regression when a class is extremely imbalanced.
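How Max depth and Min child weight jointly limit tree growth can be sketched in plain Python. The function and parameter names below are illustrative stand-ins, not XGBoost's actual implementation:

```python
def should_partition(depth, child_hessian_sums, max_depth=6, min_child_weight=1.0):
    """Return True if a node may be split further.

    depth: current depth of the node being considered.
    child_hessian_sums: sum of instance weights (hessians) that would land
        in each child if the split were made.
    A split is rejected once the tree reaches max_depth, or if any
    resulting child would fall below min_child_weight.
    """
    if depth >= max_depth:
        return False
    return all(h >= min_child_weight for h in child_hessian_sums)

# A split whose left child would hold too little weight is rejected.
print(should_partition(3, [0.4, 5.2], min_child_weight=1.0))  # False
print(should_partition(3, [2.1, 5.2], min_child_weight=1.0))  # True
```

Raising either threshold prunes more aggressively: a larger Min child weight rejects splits that isolate lightly weighted children, which is what makes the algorithm more conservative.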
Learning Task
Objective. Select from the following learning task objective types: reg:linear, reg:logistic, reg:gamma, reg:tweedie, count:poisson, rank:pairwise, binary:logistic, or multi.
Early stopping. Select this option to use the early stopping function. Training continues only while the validation error decreases at least once within each window of the specified number of stopping rounds. The Evaluation data ratio is the proportion of the input data held out for computing the validation error.
Random Seed. You can click Generate to generate the seed used by the random number generator.
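The early stopping rule above can be sketched as a loop over per-round validation errors; the function name and error values are hypothetical:

```python
def train_with_early_stopping(val_errors, stopping_rounds=10):
    """Walk a sequence of per-round validation errors and report where
    training stops: the error must improve at least once in every window
    of `stopping_rounds` rounds, otherwise training halts there."""
    best_error = float("inf")
    best_round = -1
    for rnd, err in enumerate(val_errors):
        if err < best_error:
            best_error, best_round = err, rnd
        elif rnd - best_round >= stopping_rounds:
            break  # no improvement within the window
    return best_round, best_error

# Validation error stops improving after round 2, so training halts early.
errors = [0.40, 0.35, 0.30, 0.31, 0.32, 0.33, 0.34]
print(train_with_early_stopping(errors, stopping_rounds=3))  # (2, 0.3)
```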
Advanced
Sub sample. Sub sample is the ratio of training instances used to grow each tree. For example, if you set this to 0.5, XGBoost randomly samples half of the data instances to grow trees, which helps prevent overfitting.
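The row-sampling behavior can be illustrated with the standard library alone; the helper below is a simplified stand-in for XGBoost's internal sampling:

```python
import random

def subsample_rows(n_rows, sub_sample=0.5, seed=0):
    """Randomly pick a fraction of the row indices, mimicking how a
    Sub sample ratio of 0.5 grows each tree on half of the instances."""
    rng = random.Random(seed)
    k = int(n_rows * sub_sample)
    return sorted(rng.sample(range(n_rows), k))

rows = subsample_rows(10, sub_sample=0.5)
print(len(rows))  # 5 of the 10 instances are used for this tree
```

Each boosting round draws a fresh sample, so no single tree sees every instance and the ensemble generalizes better.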
Eta. The step-size shrinkage used during the update step to prevent overfitting. After each boosting step, the weights of new features can be obtained directly; eta shrinks these feature weights to make the boosting process more conservative.
Gamma. The minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the gamma setting, the more conservative the algorithm will be.
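Gamma's effect can be seen in the structure-score gain formula from the XGBoost paper, where it is subtracted from the loss reduction of a candidate split; a split is only worthwhile when the result stays positive. The gradient/hessian values below are made up for illustration:

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction of a candidate split, following the XGBoost
    structure-score gain: 0.5 * (GL^2/(HL+lam) + GR^2/(HR+lam)
    - (GL+GR)^2/(HL+HR+lam)) - gamma. Larger gamma values leave
    fewer splits with positive gain."""
    def score(g, h):
        return g * g / (h + lam)
    gain = 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))
    return gain - gamma

# The same candidate split survives gamma=0 but not gamma=2.5.
print(split_gain(-4.0, 5.0, 3.0, 4.0, gamma=0.0) > 0)  # True
print(split_gain(-4.0, 5.0, 3.0, 4.0, gamma=2.5) > 0)  # False
```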
Colsample by tree. The subsample ratio of columns used when constructing each tree.
Colsample by level. The subsample ratio of columns used for each split, at each level.
Lambda. L2 regularization term on weights. Increasing this value will make the model more conservative.
Alpha. L1 regularization term on weights. Increasing this value will make the model more conservative.
Scale pos weight. Control the balance of positive and negative weights. This is useful for unbalanced classes.
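A common rule of thumb for choosing Scale pos weight on an unbalanced binary target is the count of negative cases divided by the count of positives. The labels below are fabricated for illustration:

```python
def suggest_scale_pos_weight(labels):
    """Suggest a Scale pos weight value for a 0/1 target as the ratio of
    negative to positive cases, a common starting heuristic."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == 0)
    return neg / pos

# 90 negatives against 10 positives suggests a weight of 9.0.
labels = [1] * 10 + [0] * 90
print(suggest_scale_pos_weight(labels))  # 9.0
```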
| SPSS Modeler setting | Script name (property name) | XGBoost parameter |
| --- | --- | --- |
| Target | TargetField | |
| Predictors | InputFields | |
| Tree method | treeMethod | tree_method |
| Num boost round | numBoostRound | num_boost_round |
| Max depth | maxDepth | max_depth |
| Min child weight | minChildWeight | min_child_weight |
| Max delta step | maxDeltaStep | max_delta_step |
| Objective | objectiveType | objective |
| Early stopping | earlyStopping | early_stopping_rounds |
| Stopping rounds | stoppingRounds | |
| Evaluation data ratio | evaluationDataRatio | |
| Random Seed | random_seed | seed |
| Sub sample | sampleSize | subsample |
| Eta | eta | eta |
| Gamma | gamma | gamma |
| Colsample by tree | colsSampleRatio | colsample_bytree |
| Colsample by level | colsSampleLevel | colsample_bylevel |
| Lambda | lambda | lambda |
| Alpha | alpha | alpha |
| Scale pos weight | scalePosWeight | scale_pos_weight |
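The property-to-parameter mapping above can be applied mechanically when translating scripted node settings into native XGBoost parameter names. The dictionary below restates the table; the sample settings values are hypothetical:

```python
# Script property name -> native XGBoost parameter, per the mapping table.
PROPERTY_TO_XGB = {
    "treeMethod": "tree_method",
    "numBoostRound": "num_boost_round",
    "maxDepth": "max_depth",
    "minChildWeight": "min_child_weight",
    "maxDeltaStep": "max_delta_step",
    "objectiveType": "objective",
    "earlyStopping": "early_stopping_rounds",
    "random_seed": "seed",
    "sampleSize": "subsample",
    "eta": "eta",
    "gamma": "gamma",
    "colsSampleRatio": "colsample_bytree",
    "colsSampleLevel": "colsample_bylevel",
    "lambda": "lambda",
    "alpha": "alpha",
    "scalePosWeight": "scale_pos_weight",
}

def to_xgb_params(node_settings):
    """Rename scripted node properties to XGBoost parameter names,
    dropping properties (such as TargetField) that have no native
    XGBoost equivalent."""
    return {PROPERTY_TO_XGB[k]: v
            for k, v in node_settings.items() if k in PROPERTY_TO_XGB}

settings = {"maxDepth": 6, "eta": 0.3, "TargetField": "churn"}  # hypothetical
print(to_xgb_params(settings))  # {'max_depth': 6, 'eta': 0.3}
```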
1. "XGBoost Parameters." Scalable and Flexible Gradient Boosting. Web. © 2015-2016 DMLC.
2. "Plotting API." Scalable and Flexible Gradient Boosting. Web. © 2015-2016 DMLC.
3. "Scalable and Flexible Gradient Boosting." Web. © 2015-2016 DMLC.