METHOD Subcommand (TREE command)

The METHOD subcommand specifies the growing method and optional parameters. Each keyword is followed by an equals sign (=) and the value for that keyword.

Example

TREE risk [o] BY income age creditscore
 /METHOD TYPE=CRT MAXSURROGATES=2 PRUNE=SE(0).

TYPE Keyword

TYPE specifies the growing method. For CRT and QUEST, splits are always binary. CHAID and Exhaustive CHAID allow multiway splits.

CHAID. Chi-squared Automatic Interaction Detection. At each step, CHAID chooses the independent (predictor) variable that has the strongest interaction with the dependent variable. Categories of each predictor are merged if they are not significantly different with respect to the dependent variable. This is the default method.

EXHAUSTIVECHAID. Exhaustive CHAID. A modification of CHAID that examines all possible ways of merging predictor categories.

CRT. Classification and Regression Trees. CRT splits the data into segments that are as homogeneous as possible with respect to the dependent variable.

QUEST. Quick, Unbiased, Efficient Statistical Tree. A method that is fast and avoids other methods' bias in favor of predictors with many categories. QUEST can be specified only if the dependent variable is nominal. An error occurs if the dependent variable is ordinal or scale.
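
As a minimal sketch, the following command requests QUEST for a dependent variable declared as nominal. The variable names are reused from the example above purely for illustration, and the [n] level marker is assumed to follow the same convention as the [o] marker shown there.

```
* Illustrative only: risk is declared nominal so that QUEST is allowed.
TREE risk [n] BY income age creditscore
 /METHOD TYPE=QUEST.
```

If risk were declared ordinal or scale, this specification would result in an error. As in the example above, TYPE=CRT can be used with an ordinal dependent variable.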

MAXSURROGATES Keyword

CRT and QUEST can use surrogates for independent (predictor) variables. When a case has a missing value for the predictor used in a split, other predictors that are highly associated with that predictor are used to classify the case. These alternative predictors are called surrogates. The MAXSURROGATES keyword specifies the maximum number of surrogate predictors to compute.

If the growing method is CHAID or EXHAUSTIVECHAID, this keyword is ignored and a warning is issued.

AUTO. The maximum is the number of independent variables minus one. This is the default.

value. User-specified value. The value must be a non-negative integer that is less than the number of independent variables in the model. If you don’t want to use surrogates in the model, specify MAXSURROGATES=0. If the value equals or exceeds the number of independent variables, the setting is ignored and a warning is issued.
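
The two settings can be sketched as follows, reusing the variable names from the example above for illustration. With three independent variables, AUTO allows at most two surrogates, and a user-specified value would need to be 0, 1, or 2.

```
* AUTO: at most 3 - 1 = 2 surrogates for three predictors.
TREE risk [o] BY income age creditscore
 /METHOD TYPE=CRT MAXSURROGATES=AUTO.

* No surrogates are used.
TREE risk [o] BY income age creditscore
 /METHOD TYPE=CRT MAXSURROGATES=0.
```

Specifying MAXSURROGATES=3 or higher for this model would be ignored with a warning, because the value is not less than the number of independent variables.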

PRUNE Keyword

For CRT and QUEST, the tree can be automatically pruned. Pruning can help avoid creating a tree that overfits the data. If you request pruning, the tree is grown until stopping criteria are met. Then it is trimmed automatically according to the specified criterion.

If the growing method is CHAID or EXHAUSTIVECHAID, PRUNE is ignored and a warning is issued.

NONE. The tree is not pruned. This is the default.

SE(value). Prune the tree using the standard error criterion. The procedure prunes down to the smallest subtree whose risk is within the specified number of standard errors of the risk of the minimum-risk subtree. Specify the number of standard errors in parentheses. The value must be non-negative. The default is 1. To obtain the subtree with the minimum risk, specify 0.
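
To illustrate, the commands below sketch the three pruning choices for a CRT tree. The variable names are reused from the example above, and the number of standard errors is written out explicitly in each SE specification.

```
* No pruning (the default).
TREE risk [o] BY income age creditscore
 /METHOD TYPE=CRT PRUNE=NONE.

* Smallest subtree within one standard error of the minimum risk.
TREE risk [o] BY income age creditscore
 /METHOD TYPE=CRT PRUNE=SE(1).

* Subtree with the minimum risk.
TREE risk [o] BY income age creditscore
 /METHOD TYPE=CRT PRUNE=SE(0).
```

A larger SE value widens the tolerance around the minimum risk, so the pruned tree is the same size as or smaller than the one obtained with a smaller value.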