Decision Tree Nodes - Basics

Specify the basic options about how the decision tree is to be built.

Tree growing algorithm (CHAID and Tree-AS only). Choose the type of CHAID algorithm to use. Exhaustive CHAID is a modification of CHAID that examines all possible splits for each predictor more thoroughly, but it takes longer to compute.
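The difference between the two algorithms is easiest to see in the category-merging step. The following is a simplified sketch, not the SPSS Modeler implementation. It assumes a nominal predictor, a binary target, and Pearson chi-square tests, and it omits the Bonferroni adjustment and ordinal predictors. CHAID merges the least significantly different pair of categories until every remaining pair differs significantly. Exhaustive CHAID keeps merging until only two groups remain, then keeps whichever grouping is most significant overall. The names `chi2_sf`, `pearson`, `merge_categories`, and `alpha_merge` are hypothetical.

```python
import math
from itertools import combinations


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        term = total = math.exp(-x / 2)
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    # Odd df: erfc term plus a finite series.
    series, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return math.erfc(math.sqrt(x / 2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * series


def pearson(table):
    """Pearson chi-square statistic and df for rows of (positive, negative) counts."""
    pos = sum(p for p, _ in table)
    neg = sum(q for _, q in table)
    n = pos + neg
    stat = 0.0
    for p, q in table:
        for observed, col_total in ((p, pos), (q, neg)):
            expected = (p + q) * col_total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, len(table) - 1


def merge_categories(counts, alpha_merge=0.05, exhaustive=False):
    """Return merged groups as (labels, positives, negatives) tuples."""
    groups = [((cat,), p, q) for cat, (p, q) in counts.items()]
    history = [list(groups)]
    while len(groups) > 2:
        # Find the pair with the largest pairwise p-value (least significant difference).
        best = None
        for i, j in combinations(range(len(groups)), 2):
            pval = chi2_sf(*pearson([groups[i][1:], groups[j][1:]]))
            if best is None or pval > best[0]:
                best = (pval, i, j)
        pval, i, j = best
        if not exhaustive and pval <= alpha_merge:
            break  # CHAID stops: every remaining pair differs significantly.
        a, b = groups[i], groups[j]
        merged = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        history.append(list(groups))
    if not exhaustive:
        return groups
    # Exhaustive CHAID: keep the grouping whose overall test is most significant.
    return min(history, key=lambda gs: chi2_sf(*pearson([g[1:] for g in gs])))
```

With counts such as `{"A": (30, 10), "B": (28, 12), "C": (10, 30), "D": (12, 28)}`, both modes end with the groups {A, B} and {C, D}. The exhaustive mode reaches that answer only after evaluating every intermediate grouping, which is why it costs more computation.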

Maximum tree depth. Specify the maximum number of levels below the root node (the number of times the sample will be split recursively). The default is 5; choose Custom and enter a value to specify a different number of levels.
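The depth limit acts as a stopping rule inside the recursive splitting procedure. This minimal sketch grows a binary tree on a single numeric input and chooses each threshold by counting misclassifications. That criterion is an illustrative assumption and differs from the one CHAID, C&RT, or QUEST uses. The functions `grow` and `tree_depth` are hypothetical. Depth 0 is the root, so `max_depth=5` allows at most five levels of splits below it.

```python
from collections import Counter


def majority_and_errors(rows):
    """Majority label of (x, label) rows and how many rows it misclassifies."""
    label, hits = Counter(y for _, y in rows).most_common(1)[0]
    return label, len(rows) - hits


def grow(rows, max_depth, depth=0):
    """Recursively split (x, label) rows; `depth` counts levels below the root."""
    label, errors = majority_and_errors(rows)
    if depth >= max_depth or errors == 0:
        return {"leaf": label}  # depth limit reached, or node is already pure
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        err = majority_and_errors(left)[1] + majority_and_errors(right)[1]
        if best is None or err < best[0]:
            best = (err, t, left, right)
    if best is None:  # every x is identical, so no split is possible
        return {"leaf": label}
    _, t, left, right = best
    return {"threshold": t,
            "left": grow(left, max_depth, depth + 1),
            "right": grow(right, max_depth, depth + 1)}


def tree_depth(node):
    """Number of levels below this node (0 for a leaf)."""
    if "leaf" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))
```

For example, `grow([(i, i % 2) for i in range(64)], max_depth=5)` stops at five levels even though the alternating labels are never fully separated.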

Pruning (C&RT and QUEST only)

Prune tree to avoid overfitting. Pruning removes bottom-level splits that do not contribute significantly to the accuracy of the tree. Pruning can simplify the tree, making it easier to interpret and, in some cases, improving generalization. If you want the full tree without pruning, leave this option deselected.

  • Set maximum difference in risk (in Standard Errors). Enables you to specify a more liberal pruning rule. The standard error rule enables the algorithm to select the simplest tree whose risk estimate is close to (but possibly greater than) that of the subtree with the smallest risk. The value sets the largest allowable difference between the risk estimate of the pruned tree and that of the subtree with the smallest risk, expressed as a number of standard errors. For example, if you specify 2, a tree whose risk estimate is up to 2 × standard error larger than that of the subtree with the smallest risk could be selected.
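The standard error rule can be sketched as a selection over a sequence of candidate subtrees. This sketch assumes each candidate already has a risk estimate and its standard error, and it measures simplicity by leaf count. The `Subtree` type, the `select_subtree` function, and the numbers in the usage example are hypothetical; the sketch does not show how SPSS Modeler produces the candidates or their risk estimates.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    name: str
    n_leaves: int
    risk: float  # estimated risk, e.g. a misclassification rate
    se: float    # standard error of that risk estimate


def select_subtree(candidates, max_se_diff):
    """Pick the simplest subtree whose risk is within max_se_diff SEs of the minimum."""
    best = min(candidates, key=lambda s: s.risk)
    threshold = best.risk + max_se_diff * best.se
    eligible = [s for s in candidates if s.risk <= threshold]
    return min(eligible, key=lambda s: s.n_leaves)


candidates = [
    Subtree("T1", 20, 0.180, 0.010),
    Subtree("T2", 12, 0.175, 0.010),  # smallest risk
    Subtree("T3", 6, 0.182, 0.011),
    Subtree("T4", 3, 0.193, 0.012),
    Subtree("T5", 1, 0.300, 0.015),
]
```

With these hypothetical values, a limit of 0 keeps T2 and a limit of 1 selects T3 (risk within 0.175 + 0.010). A limit of 2 selects the even smaller T4 (risk within 0.175 + 0.020). A larger value therefore produces a more liberal pruning rule.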

Maximum surrogates. Surrogates are a method for dealing with missing values. For each split in the tree, the algorithm identifies the input fields that are most similar to the selected split field; those fields are the surrogates for that split. When a record to be classified has a missing value for a split field, its value on a surrogate field can be used to make the split. Increasing this setting allows more flexibility in handling missing values, but it may also increase memory usage and training time.
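The surrogate mechanism can be sketched as follows for numeric fields and a binary split of the form `field <= threshold`. "Most similar" is assumed here to mean the highest agreement rate, that is, the fraction of records that the candidate split sends in the same direction as the primary split. This is a simplification: the actual association measure may differ, and this sketch ignores reversed-direction surrogates. The functions `agreement`, `find_surrogates`, and `goes_left`, and the example fields, are hypothetical. `max_surrogates` plays the role of the Maximum surrogates setting.

```python
def agreement(records, primary, t_p, field, t):
    """Share of records (both values present) where `field <= t` matches `primary <= t_p`."""
    pairs = [(r[primary] <= t_p, r[field] <= t) for r in records
             if r.get(primary) is not None and r.get(field) is not None]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def find_surrogates(records, primary, t_p, candidates, max_surrogates):
    """Rank candidate fields by their best agreement and keep at most max_surrogates."""
    ranked = []
    for field in candidates:
        values = sorted({r[field] for r in records if r.get(field) is not None})
        best = max(((agreement(records, primary, t_p, field, t), t) for t in values),
                   default=None)
        if best is not None:
            ranked.append((best[0], field, best[1]))
    ranked.sort(reverse=True)  # highest agreement first
    return [(field, t) for _, field, t in ranked[:max_surrogates]]


def goes_left(record, primary, t_p, surrogates):
    """Route a record, falling back to surrogates when the primary value is missing."""
    if record.get(primary) is not None:
        return record[primary] <= t_p
    for field, t in surrogates:
        if record.get(field) is not None:
            return record[field] <= t
    return None  # no usable value; a default rule would apply (not modeled here)


records = [
    {"income": 20, "age": 25, "zip": 5},
    {"income": 30, "age": 30, "zip": 1},
    {"income": 70, "age": 50, "zip": 4},
    {"income": 80, "age": 60, "zip": 2},
    {"income": None, "age": 55, "zip": 3},  # missing value on the split field
]
```

For a primary split `income <= 50`, `age <= 30` agrees on every record that has both values, so it ranks as the first surrogate, and the last record is routed by its age. Each additional surrogate kept is one more field and threshold stored per split, which is where the extra memory and training time come from.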