Decision Tree Nodes

The Decision Tree nodes in IBM® SPSS® Modeler provide access to the following tree-building algorithms: C&R Tree, QUEST, CHAID, C5.0, Tree-AS, and Random Trees.

See the topic Decision Tree Models for more information.

The algorithms are similar in that they all construct a decision tree by recursively splitting the data into smaller and smaller subgroups. However, there are some important differences.
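
All six algorithms share this recursive skeleton. The following Python sketch shows the idea with a deliberately simple stand-in criterion (a Gini-based binary split on numeric fields); it illustrates recursive partitioning in general and is not the implementation behind any of these nodes.

```python
# A toy recursive partitioner. The split rule (best single-field threshold
# by weighted Gini impurity) and the stopping rules are simplified
# stand-ins, not any specific Modeler node's criterion.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(rows, labels, depth=0, max_depth=3, min_size=2):
    """Recursively split (rows, labels) into smaller and smaller subgroups."""
    if depth == max_depth or len(set(labels)) == 1 or len(rows) < min_size:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    best = None
    for i in range(len(rows[0])):                     # try each input field
        for t in sorted({r[i] for r in rows}):        # try each cut point
            left = [j for j, r in enumerate(rows) if r[i] <= t]
            right = [j for j, r in enumerate(rows) if r[i] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[j] for j in left]) +
                     len(right) * gini([labels[j] for j in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, i, t, left, right)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, i, t, left, right = best
    return {"field": i, "cut": t,
            "left": grow([rows[j] for j in left], [labels[j] for j in left],
                         depth + 1, max_depth, min_size),
            "right": grow([rows[j] for j in right], [labels[j] for j in right],
                          depth + 1, max_depth, min_size)}

rows = [(25, 0), (32, 1), (47, 1), (51, 0), (62, 1)]
labels = ["no", "no", "yes", "yes", "yes"]
print(grow(rows, labels))
# {'field': 0, 'cut': 32, 'left': 'no', 'right': 'yes'}
```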

Input fields. The input fields (predictors) can be any of the following types (measurement levels): continuous, categorical, flag, nominal or ordinal.

Target fields. Only one target field can be specified. For C&R Tree, CHAID, Tree-AS, and Random Trees, the target can be continuous, categorical, flag, nominal or ordinal. For QUEST it can be categorical, flag or nominal. For C5.0 the target can be flag, nominal or ordinal.

Type of split. C&R Tree, QUEST, and Random Trees support only binary splits (that is, each node of the tree can be split into no more than two branches). By contrast, CHAID, C5.0, and Tree-AS support splitting into more than two branches at a time.
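
The difference in split arity is easiest to see in the shape of a node. The structures below are hypothetical illustrations (field names and layout invented for this example), not Modeler's internal representation.

```python
# Hypothetical node layouts, invented for illustration. A binary split
# (C&R Tree, QUEST, Random Trees) always yields exactly two branches; a
# multiway split (CHAID, C5.0, Tree-AS) can keep one branch per category
# group, so a single node can fan out to three or more children.
binary_node = {
    "field": "income",
    "cut": 40000,                      # <= goes left, > goes right
    "children": {"left": "...", "right": "..."},
}

multiway_node = {
    "field": "region",
    "branches": {                      # CHAID merges similar categories first
        ("north", "east"): "...",
        ("south",): "...",
        ("west",): "...",
    },
}
```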

Method used for splitting. The algorithms differ in the criteria used to decide the splits. When C&R Tree predicts a categorical target, it uses a dispersion measure (by default the Gini coefficient, though you can change this); for continuous targets, it uses the least-squared-deviation method. CHAID and Tree-AS use a chi-square test; QUEST uses a chi-square test for categorical predictors and analysis of variance for continuous predictors. C5.0 uses an information-theoretic measure, the information gain ratio.
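
For concreteness, the snippet below computes textbook versions of three of these criteria (Gini, chi-square, and gain ratio) for one candidate split with a categorical target; the least-squared-deviation case for continuous targets is omitted. Each node applies its own refinements (category merging, statistical corrections, stopping rules) on top of these basic formulas.

```python
# Textbook split criteria, evaluated for one candidate split of 10 records
# into two branches of 5. Counts are invented for illustration.
import math

left, right = {"yes": 4, "no": 1}, {"yes": 1, "no": 4}
n_l, n_r = sum(left.values()), sum(right.values())
n = n_l + n_r

# Gini impurity (C&R Tree, categorical target): weighted child impurity.
def gini(counts):
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

gini_split = (n_l * gini(left) + n_r * gini(right)) / n

# Pearson chi-square (CHAID, Tree-AS): observed vs expected branch counts.
chi2 = 0.0
for cls in ("yes", "no"):
    col = left.get(cls, 0) + right.get(cls, 0)
    for obs, n_b in ((left.get(cls, 0), n_l), (right.get(cls, 0), n_r)):
        exp = col * n_b / n
        chi2 += (obs - exp) ** 2 / exp

# Information gain ratio (C5.0): entropy reduction, normalized by the
# entropy of the split itself so many-branch splits are not favored.
def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

parent = {"yes": 5, "no": 5}
gain = entropy(parent) - (n_l * entropy(left) + n_r * entropy(right)) / n
gain_ratio = gain / entropy({"l": n_l, "r": n_r})

print(f"Gini after split: {gini_split:.3f}")   # 0.320
print(f"Chi-square:       {chi2:.2f}")         # 3.60
print(f"Gain ratio:       {gain_ratio:.3f}")   # 0.278
```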

Missing value handling. All algorithms allow missing values for the predictor fields, though they use different methods to handle them. C&R Tree and QUEST use substitute prediction fields, where needed, to advance a record with missing values through the tree during training. CHAID makes the missing values a separate category and enables them to be used in tree building. C5.0 uses a fractioning method, which passes a fractional part of a record down each branch of the tree from a node where the split is based on a field with a missing value.
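
The fractioning idea can be sketched as follows. This is an assumed simplification of the C5.0 behavior, with an invented node layout: a record whose split-field value is missing is sent down every branch, weighted by the share of training records that followed each branch.

```python
# A minimal sketch of fractioning (assumed simplification, hypothetical
# node layout): missing split values divide the record's weight across
# all branches instead of blocking it at the node.
def route(record, node, weight=1.0):
    """Yield (leaf label, weight) pairs for one record."""
    if "children" not in node:                 # reached a leaf
        yield node["label"], weight
        return
    value = record.get(node["field"])
    if value is None:                          # missing: split the weight
        for branch, child in node["children"].items():
            yield from route(record, child, weight * node["fractions"][branch])
    else:
        branch = "left" if value <= node["cut"] else "right"
        yield from route(record, node["children"][branch], weight)

tree = {"field": "age", "cut": 30,
        "fractions": {"left": 0.6, "right": 0.4},   # branch shares at training
        "children": {"left": {"label": "yes"}, "right": {"label": "no"}}}

print(list(route({"age": None}, tree)))   # [('yes', 0.6), ('no', 0.4)]
```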

Pruning. C&R Tree, QUEST, and C5.0 offer the option to grow the tree fully and then prune it back by removing bottom-level splits that do not contribute significantly to the accuracy of the tree. However, all of the decision tree algorithms allow you to control the minimum subgroup size, which helps avoid branches with few data records.
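
As a rough illustration of bottom-up pruning, the sketch below collapses a split into a leaf whenever the leaf would misclassify no more records than the subtree does. This is a toy rule, not the specific procedures these nodes use (for example, cost-complexity pruning in CART-style trees or error-based pruning in C5.0).

```python
# Toy bottom-up pruning: after pruning the children, replace a split with
# a majority-class leaf if the leaf's error count is no worse.
def classify(node, row):
    while "children" in node:
        branch = "left" if row[node["field"]] <= node["cut"] else "right"
        node = node["children"][branch]
    return node["label"]

def prune(node, rows, labels):
    if "children" not in node:
        return node
    left_idx = [i for i, r in enumerate(rows) if r[node["field"]] <= node["cut"]]
    right_idx = [i for i in range(len(rows)) if i not in left_idx]
    node["children"]["left"] = prune(node["children"]["left"],
                                     [rows[i] for i in left_idx],
                                     [labels[i] for i in left_idx])
    node["children"]["right"] = prune(node["children"]["right"],
                                      [rows[i] for i in right_idx],
                                      [labels[i] for i in right_idx])
    subtree_errors = sum(1 for r, y in zip(rows, labels) if classify(node, r) != y)
    majority = max(set(labels), key=labels.count) if labels else None
    leaf_errors = sum(1 for y in labels if y != majority)
    if leaf_errors <= subtree_errors:          # split adds nothing: remove it
        return {"label": majority}
    return node

tree = {"field": 0, "cut": 30, "children": {
    "left": {"label": "no"},
    "right": {"field": 0, "cut": 50, "children": {
        "left": {"label": "yes"}, "right": {"label": "yes"}}}}}
rows, labels = [(20,), (40,), (60,)], ["no", "yes", "yes"]
print(prune(tree, rows, labels))   # the redundant bottom split collapses
```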

Interactive tree building. C&R Tree, QUEST, and CHAID provide an option to launch an interactive session. This enables you to build your tree one level at a time, edit the splits, and prune the tree before you create the model. C5.0, Tree-AS, and Random Trees do not have an interactive option.

Prior probabilities. C&R Tree and QUEST support the specification of prior probabilities for categories when predicting a categorical target field. Prior probabilities are estimates of the overall relative frequency for each target category in the population from which the training data are drawn. In other words, they are the probability estimates that you would make for each possible target value prior to knowing anything about predictor values. CHAID, C5.0, Tree-AS, and Random Trees do not support specifying prior probabilities.
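
A worked example makes the effect of priors concrete. The adjustment below (scaling each class's node counts by prior / training frequency) is the standard textbook treatment of priors, with invented numbers; the exact Modeler computation may differ.

```python
# Training data over-represents "churn" (50/50), but the assumed population
# prior says churn is rare (10%). Reweight a leaf's counts before scoring.
train_totals = {"churn": 500, "stay": 500}     # class counts in training data
priors = {"churn": 0.10, "stay": 0.90}         # assumed population frequencies

leaf_counts = {"churn": 40, "stay": 10}        # counts reaching one leaf

# weight per class = prior / training share of that class
weights = {c: priors[c] / (train_totals[c] / sum(train_totals.values()))
           for c in priors}
adjusted = {c: leaf_counts[c] * weights[c] for c in leaf_counts}
total = sum(adjusted.values())
probs = {c: v / total for c, v in adjusted.items()}
print(probs)   # churn ~0.31, stay ~0.69: the leaf no longer predicts churn
```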

Rule sets. For models with categorical target fields, the decision tree nodes provide the option to create the model in the form of a rule set, which can sometimes be easier to interpret than a complex decision tree; this option is not available for Tree-AS or Random Trees. For C&R Tree, QUEST, and CHAID you can generate a rule set from an interactive session; for C5.0 you can specify this option on the modeling node. In addition, all of these decision tree models enable you to generate a rule set from the model nugget. See the topic Generating a Rule Set from a Decision Tree for more information.
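
To see what a rule set looks like, the sketch below flattens each root-to-leaf path of a toy tree into one IF ... THEN rule. The field names and node layout are hypothetical; in Modeler the rule set is generated for you.

```python
# Each root-to-leaf path becomes one rule, so a deep tree reads as a flat
# list of conditions instead of nested branches.
def to_rules(node, conditions=()):
    if "children" not in node:
        cond = " and ".join(conditions) or "always"
        return [f"IF {cond} THEN {node['label']}"]
    field, cut = node["field"], node["cut"]
    return (to_rules(node["children"]["left"], conditions + (f"{field} <= {cut}",)) +
            to_rules(node["children"]["right"], conditions + (f"{field} > {cut}",)))

tree = {"field": "income", "cut": 40000, "children": {
    "left": {"label": "refuse"},
    "right": {"field": "age", "cut": 30, "children": {
        "left": {"label": "refuse"}, "right": {"label": "accept"}}}}}

for rule in to_rules(tree):
    print(rule)
# IF income <= 40000 THEN refuse
# IF income > 40000 and age <= 30 THEN refuse
# IF income > 40000 and age > 30 THEN accept
```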