Multiple Imputation

The purpose of multiple imputation is to generate possible values for missing values, thus creating several "complete" sets of data. Analytic procedures that work with multiple imputation datasets produce output for each "complete" dataset, plus pooled output that estimates what the results would have been if the original dataset had no missing values. These pooled results are generally more accurate than those provided by single imputation methods.
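To make the idea of pooled output concrete, the following minimal sketch combines one parameter estimate across several imputed datasets using Rubin's rules, the standard combining rules for multiple imputation. It is an illustration only and is not taken from the procedure's own code; the function name `pool_estimates` and the example numbers are hypothetical.

```python
from statistics import mean, variance

def pool_estimates(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: the parameter estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 5 imputed datasets.
estimates = [10.0, 12.0, 11.0, 9.0, 13.0]
variances = [4.0, 4.0, 4.0, 4.0, 4.0]
pooled_estimate, total_variance = pool_estimates(estimates, variances)
```

The total variance exceeds the average within-imputation variance because it also reflects how much the estimates differ from one imputed dataset to the next, which is the uncertainty that a single imputation ignores.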

Multiple Imputation Data Considerations

Analysis variables. The analysis variables can be:

  • Nominal. A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the department of the company in which an employee works). Examples of nominal variables include region, postal code, and religious affiliation.
  • Ordinal. A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied to highly satisfied). Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.
  • Scale. A variable can be treated as scale (continuous) when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.

    The procedure assumes that the appropriate measurement level has been assigned to all variables; however, you can temporarily change the measurement level for a variable by right-clicking the variable in the source variable list and selecting a measurement level from the pop-up menu. To permanently change the level of measurement for a variable, see Variable measurement level.

An icon next to each variable in the variable list identifies the measurement level and data type:

Table 1. Measurement level icons
                      Numeric        String                Date                Time
  Scale (Continuous)  Scale icon     n/a                   Scale Date icon     Scale Time icon
  Ordinal             Ordinal icon   Ordinal String icon   Ordinal Date icon   Ordinal Time icon
  Nominal             Nominal icon   Nominal String icon   Nominal Date icon   Nominal Time icon

Frequency weights. Frequency (replication) weights are honored by this procedure. Cases with a negative or zero replication weight are ignored. Noninteger weights are rounded to the nearest integer.
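The frequency weight rules can be sketched as follows. This is an illustration under stated assumptions, not the procedure's implementation: the helper `effective_frequency` is hypothetical, ties are assumed to round half up, and a positive weight that rounds to zero is assumed to contribute no replications.

```python
import math

def effective_frequency(weight):
    """Return the replication count for a case, or 0 if the case is ignored."""
    if weight is None or weight <= 0:
        return 0                        # negative or zero weights are ignored
    return math.floor(weight + 0.5)     # assumption: round to nearest, ties half up

# Hypothetical (case id, frequency weight) pairs.
cases = [("a", 2.0), ("b", 0.0), ("c", -1.0), ("d", 2.6), ("e", 1.2)]

counts = [(case_id, effective_frequency(w)) for case_id, w in cases]
kept = [(case_id, n) for case_id, n in counts if n > 0]
weighted_n = sum(n for _, n in kept)
```

Here cases "b" and "c" drop out, and the remaining cases count as 2, 3, and 1 replications respectively.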

Analysis Weight. Analysis (regression or sampling) weights are incorporated in summaries of missing values and in fitting imputation models. Cases with a negative or zero analysis weight are excluded.

Complex Samples. The Multiple Imputation procedure does not explicitly handle strata, clusters, or other complex sampling structures, though it can accept final sampling weights in the form of the analysis weight variable. Also note that Complex Sampling procedures currently do not automatically analyze multiply imputed datasets. For a full list of procedures that support pooling, see Analyzing Multiple Imputation Data.

Missing Values. Both user- and system-missing values are treated as invalid values; that is, both types of missing values are replaced when values are imputed and both are treated as invalid values of variables used as predictors in imputation models. User- and system-missing values are also treated as missing in analyses of missing values.
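As a rough illustration of this rule, the sketch below flags both kinds of missing value as candidates for imputation. The representation is an assumption made for the example: system-missing values are shown as `None`, and the user-missing codes -99 and -98 for a variable named `income` are hypothetical.

```python
# Hypothetical user-missing codes declared per variable.
USER_MISSING = {"income": {-99, -98}}

def is_missing(variable, value):
    """Treat system-missing (None) and declared user-missing codes alike."""
    if value is None:
        return True
    return value in USER_MISSING.get(variable, set())

incomes = [52, -99, None, 47, -98, 61]

# Positions that would be replaced when values are imputed.
needs_imputation = [i for i, v in enumerate(incomes) if is_missing("income", v)]

# Values that remain valid for use as predictor data.
valid_values = [v for v in incomes if not is_missing("income", v)]
```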

Replicating results (Impute Missing Data Values). If you want to replicate your imputation results exactly, use the same initialization value for the random number generator, the same data order, and the same variable order, in addition to using the same procedure settings.

  • Random number generation. The procedure uses random number generation during calculation of imputed values. To reproduce the same randomized results in the future, use the same initialization value for the random number generator before each run of the Impute Missing Data Values procedure. See the topic Random Number Generators for more information.
  • Case order. Values are imputed in case order.
  • Variable order. The fully conditional specification (FCS) imputation method imputes values in the order specified in the Analysis Variables list.
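The role of the seed and of case order can be illustrated with a toy example. The sketch below is not the FCS method; it fills each missing value with a random draw from the observed values, purely to show why identical seeds and identical case order give identical results. The function `impute_once` and the seed value are hypothetical.

```python
import random

def impute_once(values, seed):
    """Fill each None with a random draw from the observed values, in case order."""
    rng = random.Random(seed)               # same seed gives the same draw sequence
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]

data = [3.1, None, 4.7, None, 5.2, 2.8, None]

first = impute_once(data, seed=20240101)
second = impute_once(data, seed=20240101)   # identical to the first run
```

Because random draws are consumed one case at a time, sorting or otherwise reordering the cases before a run can pair each missing value with a different draw, even when the seed is unchanged.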

There are two dialogs dedicated to multiple imputation.

  • Analyze Patterns provides descriptive measures of the patterns of missing values in the data, and can be useful as an exploratory step before imputation.
  • Impute Missing Data Values is used to generate multiple imputations. The complete datasets can be analyzed with procedures that support multiple imputation datasets. See Analyzing Multiple Imputation Data for information on analyzing multiple imputation datasets and a list of procedures that support these data.

These dialogs paste MULTIPLE IMPUTATION command syntax.