Random Forest node Build Options

Use the Build Options tab to specify build options for the Random Forest node, including basic options and advanced options. For more information about these options, see https://scikit-learn.org/stable/modules/ensemble.html#forest

Basic

Number of trees to build. Select the number of trees in the forest.

Specify max depth. If not selected, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

Max depth. The maximum depth of the tree.

Minimum leaf node size. The minimum number of samples required to be at a leaf node.

Number of features to use for splitting. The number of features to consider when looking for the best split (see the sketch after this list):
  • If auto, then max_features=sqrt(n_features) for classification and max_features=n_features for regression.
  • If sqrt, then max_features=sqrt(n_features).
  • If log2, then max_features=log2(n_features).
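
These basic options map directly onto scikit-learn forest constructor arguments. The following minimal sketch shows one way the dialog values might translate to a RandomForestClassifier; the dataset and the specific values chosen are illustrative assumptions, not defaults of the node.

    # Illustrative sketch: mapping the Basic build options onto scikit-learn's
    # RandomForestClassifier. The dataset and parameter values are assumptions
    # chosen only for demonstration.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    clf = RandomForestClassifier(
        n_estimators=100,     # Number of trees to build
        max_depth=None,       # Specify max depth not selected: expand nodes until leaves
                              # are pure or hold fewer than min_samples_split samples
        min_samples_leaf=1,   # Minimum leaf node size
        max_features="sqrt",  # Number of features to use for splitting
    )
    clf.fit(X, y)
    print(clf.score(X, y))    # training accuracy, shown only to confirm the fit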

Advanced

Use bootstrap samples when building trees. If selected, bootstrap samples are used when building trees.

Use out-of-bag samples to estimate the generalization accuracy. If selected, out-of-bag samples are used to estimate the generalization accuracy.

Use extremely randomized trees. If selected, extremely randomized trees are used instead of standard random forests. In extremely randomized trees, randomness goes one step further in the way splits are computed. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly generated thresholds is picked as the splitting rule. This usually reduces the variance of the model a bit more, at the expense of a slightly greater increase in bias. [1]
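
For comparison, the sketch below contrasts a bagged random forest (with bootstrap and out-of-bag scoring enabled) against scikit-learn's ExtraTreesClassifier, which implements extremely randomized trees; the parameter values are assumptions for illustration only.

    # Illustrative sketch: the bootstrap/out-of-bag options versus extremely
    # randomized trees in scikit-learn. Parameter values are assumptions.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

    X, y = load_iris(return_X_y=True)

    # Standard random forest: bootstrap samples plus an out-of-bag accuracy estimate.
    rf = RandomForestClassifier(
        n_estimators=100,
        bootstrap=True,   # Use bootstrap samples when building trees
        oob_score=True,   # Use out-of-bag samples to estimate the generalization accuracy
    )
    rf.fit(X, y)
    print("Out-of-bag accuracy:", rf.oob_score_)

    # Extremely randomized trees: split thresholds are drawn at random for each
    # candidate feature, typically trading a small increase in bias for lower variance.
    et = ExtraTreesClassifier(n_estimators=100).fit(X, y)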

Replicate results. If selected, the model building process is replicated so that repeated runs produce the same scoring results.

Random seed. You can click Generate to generate the seed used by the random number generator.
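
In scikit-learn terms, replicating results amounts to fixing the random_state seed, as in the minimal sketch below; the seed value shown is arbitrary.

    # Illustrative sketch: fixing the random seed so that two model-building runs
    # produce identical results. The seed value 42 is an arbitrary example.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    m1 = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)
    m2 = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

    # With the same seed, the two forests make identical predictions.
    assert np.array_equal(m1.predict(X), m2.predict(X))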

Hyper-Parameter Optimization (Based on Rbfopt). Select this option to enable Hyper-Parameter Optimization based on Rbfopt, which automatically discovers the optimal combination of parameters so that the model achieves the expected error rate, or a lower one, on the samples. For details about Rbfopt, see http://rbfopt.readthedocs.io/en/latest/rbfopt_settings.html.

Target. The objective function value (error rate of the model on the samples) you want to reach (i.e., the value of the unknown optimum). Set to an acceptable value such as 0.01.

Max iterations. The maximum number of iterations for trying the model. Default is 1000.

Max evaluations. The maximum number of function evaluations in accurate mode when trying the model. Default is 300.
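
As a rough illustration of how these three settings feed into Rbfopt, the sketch below uses the rbfopt Python package to search two forest parameters for a target cross-validated error rate. Which parameters are tuned, their bounds, and the objective function are assumptions for demonstration only, and rbfopt itself requires an external MINLP solver such as Bonmin to be installed.

    # Illustrative sketch: using rbfopt to search for forest hyperparameters that
    # reach a target cross-validated error rate. The tuned parameters, their bounds,
    # and the objective are assumptions for demonstration; rbfopt also requires an
    # external MINLP solver (for example Bonmin) to be installed.
    import numpy as np
    import rbfopt
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)

    def error_rate(params):
        """Cross-validated error rate for a given (max_depth, min_samples_leaf)."""
        clf = RandomForestClassifier(
            n_estimators=100,
            max_depth=int(params[0]),
            min_samples_leaf=int(params[1]),
            random_state=0,
        )
        return 1.0 - cross_val_score(clf, X, y, cv=3).mean()

    # Two integer decision variables: max_depth in [1, 20], min_samples_leaf in [1, 10].
    bb = rbfopt.RbfoptUserBlackBox(
        2, np.array([1, 1]), np.array([20, 10]), np.array(['I', 'I']), error_rate)

    # These settings correspond to the node's HPO options:
    # target_objval = Target, max_iterations = Max iterations,
    # max_evaluations = Max evaluations.
    settings = rbfopt.RbfoptSettings(
        target_objval=0.01, max_iterations=1000, max_evaluations=300)

    best_error, best_params, _, _, _ = rbfopt.RbfoptAlgorithm(settings, bb).optimize()
    print("Best error rate:", best_error, "at", best_params)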

The following table shows the relationship between the settings in the SPSS® Modeler Random Forest node dialog and the Python Random Forest library parameters.
Table 1. Node properties mapped to Python library parameters
SPSS Modeler setting | Script name (property name) | Random Forest parameter
Target | target |
Predictors | inputs |
Number of trees to build | n_estimators | n_estimators
Specify max depth | specify_max_depth | specify_max_depth
Max depth | max_depth | max_depth
Minimum leaf node size | min_samples_leaf | min_samples_leaf
Number of features to use for splitting | max_features | max_features
Use bootstrap samples when building trees | bootstrap | bootstrap
Use out-of-bag samples to estimate the generalization accuracy | oob_score | oob_score
Use extremely randomized trees | extreme |
Replicate results | use_random_seed |
Random seed | random_seed | random_seed
Hyper-Parameter Optimization (based on Rbfopt) | enable_hpo |
Target (for HPO) | target_objval |
Max iterations (for HPO) | max_iterations |
Max evaluations (for HPO) | max_evaluations |
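
When scripting the node, the values in the Script name (property name) column are the names you pass to setPropertyValue. The sketch below assumes it runs inside the SPSS Modeler scripting environment, where the modeler.script module is predefined, and the node ID used for the lookup is a placeholder.

    # Illustrative sketch: setting Random Forest node properties from a Modeler
    # script using the property names in Table 1. The node ID is a placeholder;
    # replace it with the ID of the Random Forest node in your own stream.
    stream = modeler.script.stream()
    rf_node = stream.findByID("id_of_your_random_forest_node")  # placeholder ID

    rf_node.setPropertyValue("n_estimators", 200)
    rf_node.setPropertyValue("specify_max_depth", True)
    rf_node.setPropertyValue("max_depth", 10)
    rf_node.setPropertyValue("min_samples_leaf", 1)
    rf_node.setPropertyValue("use_random_seed", True)
    rf_node.setPropertyValue("random_seed", 1234)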

[1] L. Breiman, "Random Forests," Machine Learning, 45(1), 5-32, 2001.