Overview (MULTIPLE IMPUTATION command)
The MULTIPLE IMPUTATION procedure performs multiple imputation of missing data values. Given a dataset containing missing values, it outputs one or more datasets in which missing values are replaced with plausible estimates. The procedure also summarizes missing values in the working dataset.
Datasets produced by the MULTIPLE IMPUTATION procedure can be analyzed using supported analysis procedures to obtain final (combined) parameter estimates that take into account the inherent uncertainty among the various sets of imputed values.
Options
Variables. You can specify which variables to impute and specify constraints on the imputed values such as minimum and maximum values. You can also specify which variables are used as predictors when imputing missing values of other variables.
Methods. Three imputation methods are offered. The monotone method is an efficient method for data that have a monotone pattern of missingness. Fully conditional specification (FCS) is an iterative Markov Chain Monte Carlo (MCMC) method that is appropriate when the data have an arbitrary (monotone or nonmonotone) missing pattern. The default method (AUTO) scans the data to determine the best imputation method (monotone or FCS). For each method you can control the number of imputations.
Output. By default the procedure displays an overall summary of missingness in your data as well as an imputation summary and the imputation model for each variable whose values are imputed. You can obtain analysis of missing values by variable as well as tabulated patterns of missing values. If you request imputation you can obtain descriptive statistics for imputed values.
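The options above might be combined as in the following sketch. The variable names (age, income, region) and the dataset name imputedData are hypothetical, and the keywords MIN, ROLE, NIMPUTATIONS, PATTERNS, and DESCRIPTIVES are assumed to match the command's syntax chart, which is not reproduced in this overview; check them against the chart before use.

```spss
* Hypothetical variables and dataset name, for illustration only.
DATASET DECLARE imputedData.
MULTIPLE IMPUTATION age income region
  /IMPUTE METHOD=FCS NIMPUTATIONS=10
  /CONSTRAINTS income (MIN=0)
  /CONSTRAINTS region (ROLE=IND)
  /MISSINGSUMMARIES OVERALL VARIABLES PATTERNS
  /IMPUTATIONSUMMARIES MODELS DESCRIPTIVES
  /OUTFILE IMPUTATIONS=imputedData.
```

In this sketch, income is constrained to nonnegative imputed values and region is intended to serve only as a predictor; ten imputations are requested in place of the default five.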
Basic Specification
The basic specification is two or more variables and a file specification where the imputed data will be written.
- By default, the procedure imputes missing values using the AUTO method. Five imputations are performed.
- When imputing, the default model type depends on the measurement level of the variable: for categorical variables, logistic regression is used; for scale variables, linear regression is used.
- The procedure generates output that summarizes missingness in the data and summarizes how values were imputed.
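A minimal sketch of the basic specification follows. The variable names x, y, and z and the dataset name imputedData are placeholders, and the IMPUTATIONS keyword on OUTFILE is assumed from the command's syntax chart.

```spss
* Declare a dataset name to receive the imputed data.
DATASET DECLARE imputedData.
* Impute x, y, and z with all defaults (AUTO method, five imputations).
MULTIPLE IMPUTATION x y z
  /OUTFILE IMPUTATIONS=imputedData.
```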
Operations
- The output dataset contains the original (nonmissing) data and data for one or more imputations. Each imputation includes all of the observed data and imputed data values. The original and imputed data are stacked in the output dataset. A special variable, Imputation_, identifies whether a case represents original data (Imputation_ = 0) or imputed data (Imputation_ = 1…m).
- Multiple imputation datasets can be analyzed using supported analysis procedures to obtain final (combined) parameter estimates that take into account the inherent uncertainty in the various sets of imputed values. Variable Imputation_ must be defined as a split variable in order to obtain pooled parameter estimates.
- The procedure honors the random number generator and seed specified via the global SET command. Specify the same seed across invocations of the procedure if you want to be able to reproduce your imputed values.
- The procedure honors the WEIGHT variable. It is treated as a replication weight when summarizing missing values and estimating imputation models. Cases with negative or zero value of replication weight are ignored. Noninteger weights are rounded to the nearest integer. The procedure also accepts analysis weights (see the ANALYSISWEIGHT subcommand).
- The procedure honors SPLIT FILE. A separate missing value analysis and set of imputations is produced for each combination of values of the split variables. An error occurs if imputation is requested and the input dataset has eight split variables.
- The procedure honors the FILTER command. Cases filtered out are ignored by the procedure.
- The procedure accepts string variables and treats them as categorical. Completely blank string values are treated as valid values, i.e., they are not replaced.
- The procedure treats both user- and system-missing values as invalid values. Both types of missing values are replaced when values are imputed and both are treated as invalid values of variables used as predictors in imputation models. User- and system-missing values are also treated as missing in analyses of missingness (counts of missing values, etc.).
- Cases that have a missing value for each analysis variable are included in analyses of missingness but are excluded from imputation. Specifically, values of such cases are not imputed and are excluded when building imputation models. The determination of which cases are completely missing is made after any variables are filtered out of the imputation model by the MAXPCTMISSING keyword.
- An error occurs if imputation is requested and the input dataset contains a variable named Imputation_.
- An error occurs if imputation and iteration history are requested and the input dataset contains a variable named Iteration_ or SummaryStatistic_.
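Two of the operations above, reproducibility through SET and pooling through a split on Imputation_, can be sketched together. The variable names, the dataset name imputedData, and the seed value are hypothetical, and REGRESSION is used only as an assumed example of a supported analysis procedure.

```spss
* Fix the generator and seed so the imputed values can be reproduced.
SET RNG=MT MTINDEX=20240101.
DATASET DECLARE imputedData.
MULTIPLE IMPUTATION x y z
  /OUTFILE IMPUTATIONS=imputedData.

* Analyze the stacked output with Imputation_ defined as a split variable.
DATASET ACTIVATE imputedData.
SORT CASES BY Imputation_.
SPLIT FILE LAYERED BY Imputation_.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x z.
```

Rerunning the first block with the same SET specification should yield the same imputed values. The explicit SORT CASES and SPLIT FILE commands are included for clarity and may be redundant if the output dataset is already split on Imputation_.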
Syntax Rules
- Two or more analysis variables are required.
- The OUTFILE subcommand is required unless imputation is turned off (/IMPUTE METHOD=NONE). All other subcommands are optional.
- Only a single instance of each subcommand is allowed except for the CONSTRAINTS subcommand, which can be repeated.
- An error occurs if an attribute or keyword is specified more than once within a subcommand.
- Equals signs shown in the syntax chart are required.
- Subcommand names and keywords must be spelled in full.
- Empty subcommands are not allowed.
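As an illustration of the OUTFILE rule, the following sketch requests only an analysis of missing values, so no output file is specified. The variable names are placeholders, and the MISSINGSUMMARIES keyword VARIABLES is assumed from the command's syntax chart.

```spss
* Imputation is turned off, so OUTFILE can be omitted.
MULTIPLE IMPUTATION x y z
  /IMPUTE METHOD=NONE
  /MISSINGSUMMARIES VARIABLES.
```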