XGBoost-AS node Build Options
Use the Build Options tab to specify build options for the XGBoost-AS node, including general options for model building and handling imbalanced datasets, learning task options for objectives and evaluation metrics, and booster parameters for specific boosters. For more information about these options, see the online resources listed at the end of this section.
General
Number of Workers. Number of workers used to train the XGBoost model.
Number of Threads. Number of threads used per worker.
Use External Memory. Whether to use external memory as cache.
Booster Type. The booster to use (gbtree, gblinear, or dart).
Booster Rounds Number. The number of rounds for boosting.
Scale pos weight. This setting controls the balance of positive and negative weights, and is useful for unbalanced classes. A typical value to consider is the total number of negative instances divided by the total number of positive instances.
Random Seed. Click Generate to generate the seed used by the random number generator.
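For illustration, the following is a minimal sketch of setting these general options through SPSS Modeler Python scripting. The property names come from the script-name column of the mapping table at the end of this section; the "xgboostas" node type string and the example values are assumptions, not defaults.

```python
# Minimal sketch (runs inside the SPSS Modeler scripting environment,
# where the modeler.script module is available). The "xgboostas" node
# type string is an assumption; property names follow the script-name
# column of the mapping table at the end of this section.
stream = modeler.script.stream()
xgb = stream.create("xgboostas", "XGBoost-AS")

# General options
xgb.setPropertyValue("nWorkers", 2)             # Number of Workers
xgb.setPropertyValue("numThreadPerTask", 1)     # Number of Threads
xgb.setPropertyValue("useExternalMemory", False)
xgb.setPropertyValue("boosterType", "gbtree")   # gbtree, gblinear, or dart
xgb.setPropertyValue("numBoostRound", 10)       # Booster Rounds Number
xgb.setPropertyValue("scalePosWeight", 1.0)     # Scale pos weight
```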
Learning Task
Objective. Select from the following learning task objective types: reg:linear, reg:logistic, reg:gamma, reg:tweedie, rank:pairwise, binary:logistic, or multi.
Evaluation Metrics. Evaluation metrics for validation data. A default metric will be assigned according to the objective (rmse for regression, error for classification, or mean average precision for ranking). Available options are rmse, mae, logloss, error, merror, mlogloss, auc, ndcg, map, or gamma-deviance (default is rmse).
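Continuing the scripting sketch above, the learning task options might be set as follows; the example objective and metric values are illustrative only.

```python
# Learning Task options (continuing the sketch above)
xgb.setPropertyValue("objectiveType", "binary:logistic")  # Objective
xgb.setPropertyValue("evalMetric", "auc")                 # Evaluation Metrics
```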
Booster Parameters
Lambda. L2 regularization term on weights. Increasing this value will make the model more conservative.
Alpha. L1 regularization term on weights. Increasing this value will make the model more conservative.
Lambda bias. L2 regularization term on bias. (There is no L1 regularization term on bias because it is not important.)
Tree method. Select the XGBoost tree construction algorithm to use.
Max depth. Specify the maximum depth for trees. Increasing this value will make the model more complex and more likely to overfit.
Min child weight. Specify the minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with a sum of instance weight less than this value, the building process stops further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm will be.
Max delta step. Specify the maximum delta step to allow for each tree's weight estimation. If set to 0, there is no constraint. If set to a positive value, it can help the update step be more conservative. Usually this parameter is not needed, but it may help in logistic regression when a class is extremely imbalanced.
Sub sample. The subsample ratio of the training instances. For example, setting this to 0.5 means XGBoost randomly collects half of the data instances to grow trees, which helps prevent overfitting.
Eta. The step size shrinkage used during the update step to prevent overfitting. After each boosting step, the weights of new features can be obtained directly; eta shrinks these feature weights to make the boosting process more conservative.
Gamma. The minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the gamma setting, the more conservative the algorithm will be.
Colsample by tree. Sub sample ratio of columns when constructing each tree.
Colsample by level. Sub sample ratio of columns for each split, in each level.
Normalization Algorithm. The normalization algorithm to use when the dart booster type is selected under General options. Available options are tree or forest (default is tree).
Sampling Algorithm. The sampling algorithm to use when the dart booster type is selected under General options. The uniform algorithm uniformly selects dropped trees. The weighted algorithm selects dropped trees in proportion to weight. The default is uniform.
Dropout Rate. The dropout rate to use when the dart booster type is selected under General options.
Probability of Skip Dropout. The skip dropout probability to use when the dart booster type is selected under General options. If a dropout is skipped, new trees are added in the same manner as gbtree.
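Continuing the same scripting sketch, the booster parameters map to the following property names; the values shown are illustrative, not recommended defaults.

```python
# Booster Parameters (continuing the sketch above)
xgb.setPropertyValue("lambda", 1.0)             # L2 regularization on weights
xgb.setPropertyValue("alpha", 0.0)              # L1 regularization on weights
xgb.setPropertyValue("lambdaBias", 0.0)         # L2 regularization on bias
xgb.setPropertyValue("treeMethod", "auto")      # Tree method
xgb.setPropertyValue("maxDepth", 6)             # Max depth
xgb.setPropertyValue("minChildWeight", 1.0)     # Min child weight
xgb.setPropertyValue("maxDeltaStep", 0.0)       # Max delta step
xgb.setPropertyValue("sampleSize", 0.8)         # Sub sample
xgb.setPropertyValue("eta", 0.3)                # Eta
xgb.setPropertyValue("gamma", 0.0)              # Gamma
xgb.setPropertyValue("colsSampleRation", 1.0)   # Colsample by tree
xgb.setPropertyValue("colsSampleLevel", 1.0)    # Colsample by level
xgb.setPropertyValue("normalizeType", "tree")   # Normalization Algorithm (dart)
xgb.setPropertyValue("sampleType", "uniform")   # Sampling Algorithm (dart)
xgb.setPropertyValue("rateDrop", 0.0)           # Dropout Rate (dart)
xgb.setPropertyValue("skipDrop", 0.0)           # Probability of Skip Dropout (dart)
```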
The following table shows how the SPSS Modeler settings map to the script names (property names) and the corresponding XGBoost Spark parameters:

| SPSS Modeler setting | Script name (property name) | XGBoost Spark parameter |
|---|---|---|
| Target | target_fields | |
| Predictors | input_fields | |
| Number of Workers | nWorkers | nWorkers |
| Number of Threads | numThreadPerTask | numThreadPerTask |
| Use External Memory | useExternalMemory | useExternalMemory |
| Booster Type | boosterType | boosterType |
| Boosting Round Number | numBoostRound | round |
| Scale Pos Weight | scalePosWeight | scalePosWeight |
| Objective | objectiveType | objective |
| Evaluation Metrics | evalMetric | evalMetric |
| Lambda | lambda | lambda |
| Alpha | alpha | alpha |
| Lambda bias | lambdaBias | lambdaBias |
| Tree Method | treeMethod | treeMethod |
| Max Depth | maxDepth | maxDepth |
| Min child weight | minChildWeight | minChildWeight |
| Max delta step | maxDeltaStep | maxDeltaStep |
| Sub sample | sampleSize | sampleSize |
| Eta | eta | eta |
| Gamma | gamma | gamma |
| Colsample by tree | colsSampleRation | colSampleByTree |
| Colsample by level | colsSampleLevel | colsSampleLevel |
| Normalization Algorithm | normalizeType | normalizeType |
| Sampling Algorithm | sampleType | sampleType |
| Dropout Rate | rateDrop | rateDrop |
| Probability of Skip Dropout | skipDrop | skipDrop |
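Putting the pieces together, a hypothetical end-to-end script might look like the following. The Variable File source node, the field names, and the "xgboostas" node type string are placeholders and assumptions, not part of this reference.

```python
# Hypothetical end-to-end sketch. The source node, the field names,
# and the "xgboostas" node type string are assumptions.
stream = modeler.script.stream()
src = stream.findByType("variablefile", None)   # an existing source node

xgb = stream.create("xgboostas", "XGBoost-AS")
xgb.setPropertyValue("target_fields", ["churn"])            # Target
xgb.setPropertyValue("input_fields", ["age", "income"])     # Predictors
xgb.setPropertyValue("objectiveType", "binary:logistic")
xgb.setPropertyValue("numBoostRound", 10)

stream.link(src, xgb)
```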
1 "Scalable and Flexible Gradient Boosting." Web. © 2015-2016 DMLC.
2 "XGBoost Parameters" Scalable and Flexible Gradient Boosting. Web. © 2015-2016 DMLC.
3 "ml.dmlc.xgboost4j.scala.spark Params." DMLC for Scalable and Reliable Machine Learning. Web. 3 Oct 2017.