Random Forest node Build Options

Use the Build Options tab to specify build options for the Random Forest node, including basic options and advanced options. For more information about these options, see https://scikit-learn.org/stable/modules/ensemble.html#forest

Basic

Number of trees to build. Select the number of trees in the forest.

Specify max depth. If not selected, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

Max depth. The maximum depth of the tree.

Minimum leaf node size. The minimum number of samples required to be at a leaf node.

Number of features to use for splitting. The number of features to consider when looking for the best split (see the sketch after this list):
  • If auto, then max_features=sqrt(n_features) for classification and max_features=n_features for regression.
  • If sqrt, then max_features=sqrt(n_features).
  • If log2, then max_features=log2(n_features).
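
These basic options map directly onto scikit-learn forest constructor arguments. The following minimal sketch shows one way the dialog values might translate to a RandomForestClassifier; the dataset and the specific values chosen are illustrative assumptions, not defaults of the node.

    # Illustrative sketch: mapping the Basic build options onto scikit-learn's
    # RandomForestClassifier. The dataset and parameter values are assumptions
    # chosen only for demonstration.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    clf = RandomForestClassifier(
        n_estimators=100,     # Number of trees to build
        max_depth=None,       # Specify max depth not selected: expand nodes until leaves
                              # are pure or hold fewer than min_samples_split samples
        min_samples_leaf=1,   # Minimum leaf node size
        max_features="sqrt",  # Number of features to use for splitting
    )
    clf.fit(X, y)
    print(clf.score(X, y))    # training accuracy, shown only to confirm the fit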

Advanced

Use bootstrap samples when building trees. If selected, bootstrap samples are used when building trees.

Use out-of-bag samples to estimate the generalization accuracy. If selected, out-of-bag samples are used to estimate the generalization accuracy.

Use extremely randomized trees. If selected, extremely randomized trees are used instead of standard random forests. In extremely randomized trees, randomness goes one step further in the way splits are computed. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly generated thresholds is picked as the splitting rule. This usually reduces the variance of the model a bit more, at the expense of a slightly greater increase in bias. [1]
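
For comparison, the sketch below contrasts a bagged random forest (with bootstrap and out-of-bag scoring enabled) against scikit-learn's ExtraTreesClassifier, which implements extremely randomized trees; the parameter values are assumptions for illustration only.

    # Illustrative sketch: the bootstrap/out-of-bag options versus extremely
    # randomized trees in scikit-learn. Parameter values are assumptions.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

    X, y = load_iris(return_X_y=True)

    # Standard random forest: bootstrap samples plus an out-of-bag accuracy estimate.
    rf = RandomForestClassifier(
        n_estimators=100,
        bootstrap=True,   # Use bootstrap samples when building trees
        oob_score=True,   # Use out-of-bag samples to estimate the generalization accuracy
    )
    rf.fit(X, y)
    print("Out-of-bag accuracy:", rf.oob_score_)

    # Extremely randomized trees: split thresholds are drawn at random for each
    # candidate feature, typically trading a small increase in bias for lower variance.
    et = ExtraTreesClassifier(n_estimators=100).fit(X, y)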

Replicate results. If selected, the model building process is replicated so that repeated runs produce the same scoring results.

Random seed. You can click Generate to generate the seed used by the random number generator.
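
In scikit-learn terms, replicating results amounts to fixing the random_state seed, as in the minimal sketch below; the seed value shown is arbitrary.

    # Illustrative sketch: fixing the random seed so that two model-building runs
    # produce identical results. The seed value 42 is an arbitrary example.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    m1 = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)
    m2 = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

    # With the same seed, the two forests make identical predictions.
    assert np.array_equal(m1.predict(X), m2.predict(X))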

Hyper-Parameter Optimization (Based on Rbfopt). Select this option to enable Hyper-Parameter Optimization based on Rbfopt, which automatically discovers the optimal combination of parameters so that the model achieves the expected error rate, or a lower one, on the samples. For details about Rbfopt, see http://rbfopt.readthedocs.io/en/latest/rbfopt_settings.html.

Target. The objective function value (error rate of the model on the samples) you want to reach (i.e., the value of the unknown optimum). Set to an acceptable value such as 0.01.

Max iterations. The maximum number of iterations for trying the model. Default is 1000.

Max evaluations. The maximum number of function evaluations in accurate mode when trying the model. Default is 300.
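
As a rough illustration of how these three settings feed into Rbfopt, the sketch below uses the rbfopt Python package to search two forest parameters for a target cross-validated error rate. Which parameters are tuned, their bounds, and the objective function are assumptions for demonstration only, and rbfopt itself requires an external MINLP solver such as Bonmin to be installed.

    # Illustrative sketch: using rbfopt to search for forest hyperparameters that
    # reach a target cross-validated error rate. The tuned parameters, their bounds,
    # and the objective are assumptions for demonstration; rbfopt also requires an
    # external MINLP solver (for example Bonmin) to be installed.
    import numpy as np
    import rbfopt
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)

    def error_rate(params):
        """Cross-validated error rate for a given (max_depth, min_samples_leaf)."""
        clf = RandomForestClassifier(
            n_estimators=100,
            max_depth=int(params[0]),
            min_samples_leaf=int(params[1]),
            random_state=0,
        )
        return 1.0 - cross_val_score(clf, X, y, cv=3).mean()

    # Two integer decision variables: max_depth in [1, 20], min_samples_leaf in [1, 10].
    bb = rbfopt.RbfoptUserBlackBox(
        2, np.array([1, 1]), np.array([20, 10]), np.array(['I', 'I']), error_rate)

    # These settings correspond to the node's HPO options:
    # target_objval = Target, max_iterations = Max iterations,
    # max_evaluations = Max evaluations.
    settings = rbfopt.RbfoptSettings(
        target_objval=0.01, max_iterations=1000, max_evaluations=300)

    best_error, best_params, _, _, _ = rbfopt.RbfoptAlgorithm(settings, bb).optimize()
    print("Best error rate:", best_error, "at", best_params)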

The following table shows the relationship between the settings in the SPSS® Modeler Random Forest node dialog and the Python Random Forest library parameters.
Table 1. Node properties mapped to Python library parameters
SPSS Modeler setting | Script name (property name) | Random Forest parameter
Target | target |
Predictors | inputs |
Number of trees to build | n_estimators | n_estimators
Specify max depth | specify_max_depth | specify_max_depth
Max depth | max_depth | max_depth
Minimum leaf node size | min_samples_leaf | min_samples_leaf
Number of features to use for splitting | max_features | max_features
Use bootstrap samples when building trees | bootstrap | bootstrap
Use out-of-bag samples to estimate the generalization accuracy | oob_score | oob_score
Use extremely randomized trees | extreme |
Replicate results | use_random_seed |
Random seed | random_seed | random_seed
Hyper-Parameter Optimization (based on Rbfopt) | enable_hpo |
Target (for HPO) | target_objval |
Max iterations (for HPO) | max_iterations |
Max evaluations (for HPO) | max_evaluations |
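
When scripting the node, the values in the Script name (property name) column are the names you pass to setPropertyValue. The sketch below assumes it runs inside the SPSS Modeler scripting environment, where the modeler.script module is predefined, and the node ID used for the lookup is a placeholder.

    # Illustrative sketch: setting Random Forest node properties from a Modeler
    # script using the property names in Table 1. The node ID is a placeholder;
    # replace it with the ID of the Random Forest node in your own stream.
    stream = modeler.script.stream()
    rf_node = stream.findByID("id_of_your_random_forest_node")  # placeholder ID

    rf_node.setPropertyValue("n_estimators", 200)
    rf_node.setPropertyValue("specify_max_depth", True)
    rf_node.setPropertyValue("max_depth", 10)
    rf_node.setPropertyValue("min_samples_leaf", 1)
    rf_node.setPropertyValue("use_random_seed", True)
    rf_node.setPropertyValue("random_seed", 1234)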

[1] L. Breiman, "Random Forests," Machine Learning, 45(1), 5-32, 2001.