Multiple Imputation

The purpose of multiple imputation is to generate possible values for missing values, thus creating several "complete" sets of data. Analytic procedures that work with multiple imputation datasets produce output for each "complete" dataset, plus pooled output that estimates what the results would have been if the original dataset had no missing values. These pooled results are generally more accurate than those provided by single imputation methods.
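The pooling step can be illustrated with a minimal sketch. It assumes the pooled output combines per-dataset results with Rubin's rules, the standard approach for multiply imputed data; the source does not spell out the formulas. The function name pool_estimates and its arguments are hypothetical and are not part of the product.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool one parameter across m imputed datasets (Rubin's rules, assumed).

    estimates: the parameter estimate from each "complete" dataset
    variances: the squared standard error of that estimate in each dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The square root of the returned total variance is the pooled standard error. The between-imputation term is what single imputation omits: it carries the extra uncertainty that comes from not knowing the missing values.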

Multiple Imputation Data Considerations

Analysis variables. The analysis variables can be nominal, ordinal, or scale.

An icon next to each variable in the variable list identifies the measurement level and data type:

Table 1. Measurement level icons

Measurement level    Numeric        String                Date                Time
Scale (Continuous)   Scale icon     n/a                   Scale Date icon     Scale Time icon
Ordinal              Ordinal icon   Ordinal String icon   Ordinal Date icon   Ordinal Time icon
Nominal              Nominal icon   Nominal String icon   Nominal Date icon   Nominal Time icon

Frequency weights. Frequency (replication) weights are honored by this procedure. Cases with a negative or zero replication weight are ignored. Noninteger weights are rounded to the nearest integer.
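The frequency weight rules above can be sketched as follows. This is an illustration, not the procedure's implementation. The function names are hypothetical, and the tie-breaking rule (values ending in .5 round up) is an assumption, since the text says only "nearest integer".

```python
import math


def effective_frequency(weight):
    """Return how many times a case is counted; 0 means the case is ignored."""
    if weight <= 0:
        return 0  # negative or zero replication weights are ignored
    # Assumption: ties such as 2.5 round up; the source does not specify this.
    return math.floor(weight + 0.5)


def expand_cases(cases, weights):
    """Replicate each case according to its rounded frequency weight."""
    expanded = []
    for case, weight in zip(cases, weights):
        expanded.extend([case] * effective_frequency(weight))
    return expanded
```

Under these assumptions, a case with weight 2.4 counts twice, a case with weight 2.6 counts three times, and a case with weight 0 or -1 is dropped.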

Analysis Weight. Analysis (regression or sampling) weights are incorporated in summaries of missing values and in fitting imputation models. Cases with a negative or zero analysis weight are excluded.

Complex Samples. The Multiple Imputation procedure does not explicitly handle strata, clusters, or other complex sampling structures, though it can accept final sampling weights in the form of the analysis weight variable. Also note that Complex Sampling procedures currently do not automatically analyze multiply imputed datasets. For a full list of procedures that support pooling, see Analyzing Multiple Imputation Data.

Missing Values. Both user- and system-missing values are treated as invalid values; that is, both types of missing values are replaced when values are imputed and both are treated as invalid values of variables used as predictors in imputation models. User- and system-missing values are also treated as missing in analyses of missing values.
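As a rough illustration of treating both kinds of missing value the same way, the sketch below flags a value as missing if it is system-missing or matches a declared user-missing code. The representation is an assumption: system-missing is modeled as None or NaN, and the function name is_missing and the example code 99 are hypothetical.

```python
import math


def is_missing(value, user_missing=()):
    """True if value is system-missing (None or NaN here) or a user-missing code."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value in user_missing


# Hypothetical column where 99 is declared as a user-missing code.
ages = [34, None, 99, 51, float("nan")]
to_impute = [i for i, v in enumerate(ages) if is_missing(v, user_missing=(99,))]
```

Here to_impute holds the positions that would be replaced during imputation. The same positions would also be treated as invalid when the variable serves as a predictor.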

Replicating results (Impute Missing Data Values). If you want to replicate your imputation results exactly, use the same initialization value for the random number generator, the same data order, and the same variable order, in addition to using the same procedure settings.
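Why data order matters alongside the seed can be seen in a toy sketch. It is not the procedure's imputation algorithm. It fills each missing entry with a random draw from the observed values, using a seeded generator, and the function name impute_once is hypothetical.

```python
import random


def impute_once(values, seed):
    """Fill None entries with random draws from the observed values (toy example)."""
    rng = random.Random(seed)  # same seed gives the same stream of draws
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")
    # Draws are consumed in case order, so reordering the cases changes
    # which draw each missing entry receives.
    return [v if v is not None else rng.choice(observed) for v in values]


data = [3.0, None, 5.0, None, 4.0]
first = impute_once(data, seed=20240101)
second = impute_once(data, seed=20240101)
```

With the same seed and the same case order, first and second are identical. Changing the seed or sorting the data before the call can produce different imputed values.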

There are two dialogs dedicated to multiple imputation.

These dialogs paste MULTIPLE IMPUTATION command syntax.