Random Trees node - basics
Specify basic options for how to build the decision trees.
Number of models to build. Specify the maximum number of trees that the node can build.
Sample size. By default, the size of the bootstrap sample is equal to that of the original training data. When you are dealing with large datasets, reducing the sample size can improve performance. The sample size is a ratio between 0 and 1; for example, set it to 0.6 to reduce the bootstrap sample to 60% of the original training data size.
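The effect of the sample-size ratio can be sketched as drawing a bootstrap sample (sampling with replacement) of ratio × N records. A minimal sketch in Python; the `bootstrap_sample` helper is illustrative, not Modeler's implementation:

```python
import random

def bootstrap_sample(data, ratio=1.0, rng=None):
    """Draw a bootstrap sample (sampling with replacement) whose size
    is `ratio` times the size of the original training data.
    Illustrative helper, not Modeler's implementation."""
    rng = rng or random.Random(0)       # fixed seed for a repeatable sketch
    n = max(1, int(len(data) * ratio))  # e.g. ratio=0.6 -> 60% of N
    return [rng.choice(data) for _ in range(n)]

train = list(range(1000))
sample = bootstrap_sample(train, ratio=0.6)  # 600 records, drawn with replacement
```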
Handle imbalanced data. If the model's target is a flag outcome (for example, purchase or do not purchase) and the ratio of the desired outcome to the non-desired outcome is very small, the data is imbalanced, and the bootstrap sampling that the model conducts could affect model accuracy. To improve accuracy, select this check box; the model then captures a larger proportion of the desired outcome and generates a better model.
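One common way to capture a larger proportion of a rare desired outcome is a class-balanced bootstrap that draws equally from each outcome class. The sketch below illustrates that idea only; it is an assumption, not Modeler's documented algorithm:

```python
import random

def balanced_bootstrap(records, label, rng=None):
    """Bootstrap that draws the same number of records from each outcome
    class, so a rare 'desired' outcome is over-represented compared with
    a plain bootstrap. Illustrative sketch, not Modeler's algorithm."""
    rng = rng or random.Random(0)
    by_class = {}
    for rec in records:
        by_class.setdefault(label(rec), []).append(rec)
    per_class = len(records) // len(by_class)  # equal share per class
    sample = []
    for members in by_class.values():
        sample.extend(rng.choice(members) for _ in range(per_class))
    return sample

# 10% "purchase" vs 90% "no purchase": the balanced sample is 50/50.
data = [("purchase", i) for i in range(10)] + [("no purchase", i) for i in range(90)]
sample = balanced_bootstrap(data, label=lambda rec: rec[0])
```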
Use weighted sampling for variable selection. By default, variables for each leaf node are randomly selected with the same probability. To apply weighting to variables and improve the selection process, select this check box. The weight is calculated by the Random Trees node itself. The more important fields (with higher weight) are more likely to be selected as predictors.
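Weighted variable selection can be sketched as repeated weighted draws without replacement, so that higher-weight fields are picked as split candidates more often. The `weighted_pick` helper is hypothetical; in Modeler the weights are computed by the node itself:

```python
import random

def weighted_pick(fields, weights, k, rng=None):
    """Pick k distinct fields, where a field's chance of selection is
    proportional to its weight (successive weighted draws without
    replacement). Illustrative only; Modeler computes the weights itself."""
    rng = rng or random.Random(0)
    fields, weights = list(fields), list(weights)
    chosen = []
    for _ in range(k):
        x = rng.uniform(0, sum(weights))
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if x <= acc:
                break
        chosen.append(fields.pop(i))  # remove so a field cannot be drawn twice
        weights.pop(i)
    return chosen
```

Over many draws, a field with ten times the weight of its peers is selected far more often, which is the intended effect of weighting the selection process.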
Maximum number of nodes. Specify the maximum number of leaf nodes that are allowed in individual trees. If the number would be exceeded on the next split, tree growth is stopped before the split occurs.
Maximum tree depth. Specify the maximum number of levels of leaf nodes below the root node; that is, the number of times the sample is split recursively.
Minimum child node size. Specify the minimum number of records that must be contained in a child node after the parent node is split. If a child node would contain fewer records than you enter, the parent node will not be split.
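The three growth limits above combine into a single check that is applied before each split. A minimal sketch, where the default limit values are illustrative assumptions rather than Modeler's documented defaults:

```python
def can_split(num_leaves, node_depth, left_size, right_size,
              max_nodes=10_000, max_depth=10, min_child=5):
    """Check a proposed split against the three growth limits.
    Splitting one leaf into two children adds one leaf overall,
    and the children sit one level below the parent.
    The default limit values here are illustrative assumptions."""
    if num_leaves + 1 > max_nodes:              # Maximum number of nodes
        return False
    if node_depth + 1 > max_depth:              # Maximum tree depth
        return False
    if min(left_size, right_size) < min_child:  # Minimum child node size
        return False
    return True
```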
By default, the number of predictors considered at each split is ⌊sqrt(M)⌋ for classification and ⌊M/3⌋ for regression, where M is the total number of predictor variables. If this option is selected, the specified number of predictors is used instead.
Stop building when accuracy can no longer be improved. Random Trees uses a particular procedure to decide when to stop training. Specifically, if the improvement in the current ensemble's accuracy is smaller than a specified threshold, it stops adding new trees. This can result in a model with fewer trees than the value that you specified for the Number of models to build option.