MISSING keyword (MODEL HANDLE command)
The MISSING keyword controls how missing values in the model's predictor variables are handled during scoring. In the context of scoring, a missing value is one of the following:
- A predictor contains no value. For numeric fields (variables), this means the system-missing value. For string fields, this means a null string.
- The value has been defined as user-missing, in the model, for the given predictor. Values defined as user-missing in the active dataset, but not in the model, are not treated as missing values in the scoring process.
- The predictor is categorical and the value is not one of the categories defined in the model.
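The three conditions above can be sketched as a single check. This is a minimal illustration in Python, not the scoring engine's actual code. The function name and its parameters are hypothetical, and None or NaN is assumed to stand in for the system-missing value.

```python
import math

def is_missing_for_scoring(value, *, user_missing=(), categories=None):
    """Return True if `value` counts as missing for scoring purposes.

    user_missing: values declared user-missing IN THE MODEL for this predictor
                  (hypothetical parameter; dataset-level definitions are ignored).
    categories:   categories defined in the model, or None for a
                  non-categorical predictor.
    """
    # Condition 1: the predictor contains no value.
    # None/NaN stands in for system-missing; "" stands in for a null string.
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value == "":
        return True
    # Condition 2: the value is user-missing as defined in the model.
    if value in user_missing:
        return True
    # Condition 3: categorical predictor with a category unknown to the model.
    if categories is not None and value not in categories:
        return True
    return False
```

For example, a value of 99 that is user-missing only in the active dataset is passed with an empty user_missing collection, so it is scored as a valid value.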
SYSMIS. Return the system-missing value when scoring a case with a missing value.
SUBSTITUTE. Use value substitution when scoring cases with missing values. This is the default.
The method for determining a value to substitute for a missing value depends on the type of predictive model:
- IBM® SPSS® Statistics models. For independent variables in linear regression (REGRESSION command) and discriminant (DISCRIMINANT command) models, if mean value substitution for missing values was specified when building and saving the model, then this mean value is used in place of the missing value in the scoring computation, and scoring proceeds. If the mean value is not available, then APPLYMODEL and STRAPPLYMODEL return the system-missing value.
- IBM SPSS AnswerTree models and TREE command models. For the CHAID and Exhaustive CHAID algorithms, the biggest child node is selected when the split variable is missing. The biggest child node is the child node that contains the largest number of learning sample cases. For the C&RT and QUEST algorithms, surrogate split variables (if any) are used first. (Surrogate splits are splits that attempt to match the original split as closely as possible using alternate predictors.) If no surrogate splits are specified, or if all surrogate split variables are missing, the biggest child node is used.
- IBM SPSS Modeler models. Linear regression models are handled as described under IBM SPSS Statistics models. Logistic regression models are handled as described under Logistic Regression models. C&R Tree models are handled as described for C&RT models under AnswerTree models.
- Logistic Regression models. For covariates in logistic regression models, if a mean value of the predictor was included as part of the saved model, then this mean value is used in place of the missing value in the scoring computation, and scoring proceeds. If the predictor is categorical (for example, a factor in a logistic regression model), or if the mean value is not available, then APPLYMODEL and STRAPPLYMODEL return the system-missing value.
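The substitution rules above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the actual implementation: the function names, the dictionary layout of a model and of a tree split, and the use of NaN for the system-missing value are all hypothetical.

```python
import math

SYSMIS = float("nan")  # assumed stand-in for the system-missing value

def _present(case, name):
    """True if the case holds a usable (non-missing) value for `name`."""
    v = case.get(name)
    return v is not None and not (isinstance(v, float) and math.isnan(v))

def score_linear(coefs, intercept, case, means, missing="SUBSTITUTE"):
    """Score one case with a linear model.

    coefs, means, and case are dicts keyed by predictor name; `means` holds
    only the mean values that were saved with the model.
    """
    total = intercept
    for name, beta in coefs.items():
        if _present(case, name):
            x = case[name]
        elif missing == "SUBSTITUTE" and name in means:
            x = means[name]          # mean value substitution
        else:
            return SYSMIS            # MISSING=SYSMIS, or no saved mean
        total += beta * x
    return total

def choose_child(case, split, algorithm):
    """Pick the child node for one tree split.

    split is a dict with keys "var", "test" (value -> child id),
    "surrogates" (list of (var, test) pairs), and "biggest_child".
    """
    if _present(case, split["var"]):
        return split["test"](case[split["var"]])
    # C&RT and QUEST try surrogate splits first, in order.
    if algorithm in ("CRT", "QUEST"):
        for var, test in split.get("surrogates", []):
            if _present(case, var):
                return test(case[var])
    # CHAID and Exhaustive CHAID, or no usable surrogate: biggest child node.
    return split["biggest_child"]
```

With coefs={"x": 2.0}, an intercept of 1.0, and a saved mean of 3.0 for x, a case with x missing scores 1.0 + 2.0 * 3.0 = 7.0 under SUBSTITUTE and returns the system-missing stand-in under SYSMIS.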
Example
MODEL HANDLE NAME=twostep1 FILE='twostep1.mml'
/OPTIONS MISSING=SYSMIS.
- In this example, any case with a missing predictor value encountered during scoring receives a system-missing result.