Impute Missing Data Values (Multiple Imputation)

Impute Missing Data Values generates multiple imputations of missing values. The completed datasets can be analyzed with procedures that support multiple imputation datasets. See Analyzing Multiple Imputation Data for information on analyzing these datasets and for a list of procedures that support them. This is a Multiple Imputation procedure.

Example. A telecommunications provider wants to better understand service usage patterns in its customer database. It has complete data for the services its customers use, but the demographic information it has collected contains a number of missing values. Moreover, these values are not missing completely at random, so multiple imputation will be used to complete the dataset.

From the menus choose:

Analyze > Multiple Imputation > Impute Missing Data Values...

  1. Select at least two variables in the imputation model. The procedure imputes multiple values for missing data for these variables.
  2. Specify the number of imputations to compute. By default, this value is 5.
  3. Specify a dataset or IBM® SPSS® Statistics-format data file to which imputed data should be written.

    The output dataset consists of the original cases, with their missing values left intact, plus one set of cases with imputed values for each imputation. For example, if the original dataset has 100 cases and you request five imputations, the output dataset will have 600 cases (100 original cases plus 5 × 100 imputed cases). All variables in the input dataset are included in the output dataset, and the dictionary properties (names, labels, etc.) of existing variables are copied to it. The output also contains a new numeric variable, Imputation_, that identifies the imputation: 0 for the original data, or 1 through n for cases with imputed values.

    The procedure automatically defines the Imputation_ variable as a split variable (see Split file) when the output dataset is created. If splits are in effect when the procedure executes, the output dataset includes one set of imputations for each combination of values of split variables.
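The layout of the output dataset can be sketched in Python. This is a minimal illustration under stated assumptions, not how SPSS Statistics implements the procedure: cases are plain dicts, missing values are None, and the hypothetical helpers stack_imputations and make_hot_deck fill each missing value with a random draw from the observed values of the same variable, a deliberately simple stand-in for the procedure's actual imputation models. Split files are not modeled.

```python
import random


def make_hot_deck(cases, variables):
    """Hypothetical stand-in imputer (not the SPSS method):
    draw a random observed value of the same variable."""
    pools = {v: [c[v] for c in cases if c[v] is not None] for v in variables}

    def impute_value(var, rng):
        return rng.choice(pools[var])

    return impute_value


def stack_imputations(cases, variables, n_imputations, impute_value, seed=0):
    """Return the original cases (Imputation_ = 0) followed by one
    completed copy of every case per imputation (Imputation_ = 1..n)."""
    rng = random.Random(seed)
    # Imputation_ = 0: the original cases, missing values left as None
    output = [{"Imputation_": 0, **case} for case in cases]
    for m in range(1, n_imputations + 1):
        for case in cases:
            completed = dict(case)
            for var in variables:
                if completed[var] is None:
                    completed[var] = impute_value(var, rng)
            output.append({"Imputation_": m, **completed})
    return output


# 100 hypothetical cases with some missing age and income values
data = [
    {"age": None if i % 7 == 0 else 20 + i % 50,
     "income": None if i % 5 == 0 else 1000.0 + i}
    for i in range(100)
]
variables = ["age", "income"]
out = stack_imputations(data, variables, 5, make_hot_deck(data, variables))
print(len(out))  # 600 = 100 original + 5 x 100 imputed
```

The case count follows directly from the stacking: with n imputations, the output holds (n + 1) copies of every input case, which is why 100 cases and five imputations yield 600.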

Optional Settings

Analysis Weight. This variable contains analysis (regression or sampling) weights. The procedure incorporates analysis weights in regression and classification models used to impute missing values. Analysis weights are also used in summaries of imputed values; for example, mean, standard deviation, and standard error. Cases with a negative or zero analysis weight are excluded.
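A minimal Python sketch of how an analysis weight could enter a summary of imputed values, using a hypothetical helper weighted_summary: cases with a zero or negative weight are dropped before the weighted mean and standard deviation are computed. The variance divisor (the sum of the weights) is an assumption; SPSS Statistics may use a different weighting formula, and standard errors are not shown.

```python
import math


def weighted_summary(values, weights):
    """Weighted mean and standard deviation, excluding cases whose
    analysis weight is zero or negative (hypothetical helper)."""
    pairs = [(x, w) for x, w in zip(values, weights) if w > 0]
    total_w = sum(w for _, w in pairs)
    mean = sum(w * x for x, w in pairs) / total_w
    # Assumption: divide by the sum of weights (population-style variance)
    var = sum(w * (x - mean) ** 2 for x, w in pairs) / total_w
    return mean, math.sqrt(var)


# The case with weight 0.0 (value 100.0) and the case with weight -1.0
# (value 50.0) are excluded from the summary.
mean, sd = weighted_summary([2.0, 4.0, 100.0, 6.0, 50.0], [1.0, 1.0, 0.0, 2.0, -1.0])
print(mean)  # 4.5
```

Here the remaining weights total 4, so the mean is (1 × 2 + 1 × 4 + 2 × 6) / 4 = 4.5 and the variance is 11 / 4 = 2.75.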

Fields with unknown measurement level

The Measurement Level alert is displayed when the measurement level for one or more variables (fields) in the dataset is unknown. Since measurement level affects the computation of results for this procedure, all variables must have a defined measurement level.

Scan Data. Reads the data in the active dataset and assigns a default measurement level to any fields with a currently unknown measurement level. If the dataset is large, this may take some time.

Assign Manually. Opens a dialog that lists all fields with an unknown measurement level. You can use this dialog to assign a measurement level to those fields. You can also assign measurement levels in Variable View of the Data Editor.

Since measurement level is important for this procedure, you cannot access the dialog to run this procedure until all fields have a defined measurement level.

This procedure pastes MULTIPLE IMPUTATION command syntax.