Missing Values in Tree Models
The different growing methods deal with missing values for independent (predictor) variables in different ways:
- CHAID and Exhaustive CHAID treat all system- and user-missing values for each independent variable as a single category. For scale and ordinal independent variables, that category may or may not subsequently get merged with other categories of that independent variable, depending on the growing criteria.
- CRT and QUEST attempt to use surrogates for independent (predictor) variables. When a case is missing the value for a given variable, other independent variables that are highly associated with that variable are used to classify the case instead. These alternative predictors are called surrogates.
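The two strategies described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure's actual algorithm. The function names, the "<missing>" label, the sample columns (income, age, cars), and the agreement score used to rank surrogates are all assumptions made for this example, and it omits the merging step that CHAID may apply to the missing category.

```python
from typing import Dict, List, Optional, Sequence, Tuple

MISSING = "<missing>"  # hypothetical label for the pooled missing category


def missing_as_category(values: Sequence[Optional[str]]) -> List[str]:
    """CHAID-style idea: pool every missing value into one extra category."""
    return [MISSING if v is None else v for v in values]


def split_agreement(primary, candidate, primary_cut, candidate_cut) -> float:
    """Share of cases, nonmissing on both variables, that both splits send the same way."""
    pairs = [(p, c) for p, c in zip(primary, candidate)
             if p is not None and c is not None]
    if not pairs:
        return 0.0
    same = sum((p <= primary_cut) == (c <= candidate_cut) for p, c in pairs)
    return same / len(pairs)


def best_surrogate(primary, primary_cut,
                   candidates: Dict[str, Sequence[Optional[float]]]
                   ) -> Optional[Tuple[str, float, float]]:
    """Surrogate-style idea: find the candidate split that best mimics the primary split.

    Returns (variable name, cut point, agreement), or None if there are no candidates.
    Agreement is an assumed, simplified stand-in for an association measure.
    """
    best = None
    for name, column in candidates.items():
        for cut in sorted({v for v in column if v is not None}):
            score = split_agreement(primary, column, primary_cut, cut)
            if best is None or score > best[2]:
                best = (name, cut, score)
    return best


def goes_left(primary_value, primary_cut, surrogate_value, surrogate_cut):
    """Route a case by the primary split; fall back to the surrogate if it is missing."""
    if primary_value is not None:
        return primary_value <= primary_cut
    if surrogate_value is not None:
        return surrogate_value <= surrogate_cut
    return None  # neither value is available


# Made-up data: income is the primary predictor and has two missing values.
income = [20, 35, None, 50, 65, None, 80]
age = [25, 30, 28, 45, 50, 58, 60]
cars = [1, 2, 1, 1, 2, 2, 1]

surrogate = best_surrogate(income, 40, {"age": age, "cars": cars})
print(surrogate)
print(missing_as_category(["low", None, "high"]))
```

In this made-up data, age reproduces the income split more closely than cars does, so a case with missing income would be routed by its age value.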
For more information, see Professional Edition > Decision Trees > Creating Decision Trees.
This example shows the difference between CHAID and CRT when there are missing values for independent variables used in the model.
For this example, we'll use the data file tree_missing_data.sav. See the topic Sample Files for more information.
Note: For nominal independent variables and nominal dependent variables, you can choose to treat user-missing values as valid values, in which case those values are treated like any other nonmissing values. See the topic Missing Values for more information.