Overview (MVA command)
MVA (Missing Value Analysis) describes the missing value patterns in a data file (data matrix). It can estimate the means, the covariance matrix, and the correlation matrix by using listwise, pairwise, regression, and EM estimation methods. Missing values themselves can be estimated (imputed), and you can then save the new data file.
Options
Categorical Variables. String variables are automatically defined as categorical. For a long string variable, only the first eight characters are used to define categories. Quantitative variables can be designated as categorical by using the CATEGORICAL subcommand.
MAXCAT specifies the maximum number of categories for any categorical variable. If any categorical variable has more than the specified number of distinct values, MVA is not executed.
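The eight-character rule and the MAXCAT check can be illustrated with a small Python sketch. This is not SPSS code; the function names `categorize` and `check_maxcat` are hypothetical, `None` is assumed to stand for a missing value, and missing values are assumed not to count as a category.

```python
from typing import Dict, List, Optional


def categorize(values: List[Optional[str]], width: int = 8) -> List[Optional[str]]:
    """Truncate each nonmissing string to its first `width` characters,
    so long strings that share a prefix fall into the same category."""
    return [v[:width] if v is not None else None for v in values]


def check_maxcat(columns: Dict[str, List[Optional[str]]], maxcat: int) -> bool:
    """Return True if every categorical column has at most `maxcat` distinct
    nonmissing categories; False means the analysis would not be run."""
    for values in columns.values():
        distinct = {v for v in categorize(values) if v is not None}
        if len(distinct) > maxcat:
            return False
    return True
```

For example, "Springfield East" and "Springfield West" both become the category "Springfi", so a column containing them plus "Shelby" has two categories, not three.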
Analyzing Patterns. For each quantitative variable, the TTEST subcommand produces a series of t tests. For each of the other variables, the values of the quantitative variable are divided into two groups: cases in which the other variable is present and cases in which it is missing. Each pair of groups is then compared by using a t test.
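A minimal Python sketch of one such test follows, assuming a separate-variance (Welch) t statistic; the function name `missingness_t_test` is hypothetical, `None` marks a missing value, and this is an illustration of the idea rather than the exact computation MVA performs.

```python
import math
from statistics import mean, variance
from typing import List, Optional, Tuple


def missingness_t_test(y: List[Optional[float]],
                       x: List[Optional[float]]) -> Tuple[float, float]:
    """Split the observed values of y by whether x is present or missing in
    the same case, then compare the groups with a Welch t test.
    Returns (t, degrees of freedom)."""
    present = [yi for yi, xi in zip(y, x) if yi is not None and xi is not None]
    missing = [yi for yi, xi in zip(y, x) if yi is not None and xi is None]
    n1, n2 = len(present), len(missing)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observed values")
    # Squared standard errors of each group mean (separate variances).
    v1, v2 = variance(present) / n1, variance(missing) / n2
    t = (mean(present) - mean(missing)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

A large absolute t suggests that whether x is missing is related to the values of y, which is evidence against the data being missing completely at random.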
Crosstabulating Categorical Variables. The CROSSTAB subcommand produces a table for each categorical variable, showing, for each category, the number of nonmissing values in each of the other variables and the percentage of each type of missing value.
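The per-category counts can be sketched in Python for one categorical variable crossed with one other variable. The function `missing_crosstab` is hypothetical; `None` is assumed to represent system-missing, a caller-supplied set of codes represents user-missing values, and cases whose category is missing are simply skipped here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def missing_crosstab(category: List[Optional[str]],
                     other: List[Optional[float]],
                     user_missing: Set[float]) -> Dict[str, Dict[str, float]]:
    """For each category, count nonmissing values of `other` and the
    percentage of system-missing (None) and user-missing values."""
    groups: Dict[str, List[Optional[float]]] = defaultdict(list)
    for c, v in zip(category, other):
        if c is not None:  # assumption: cases with a missing category are ignored
            groups[c].append(v)
    table = {}
    for c, vals in groups.items():
        n = len(vals)
        sysmis = sum(1 for v in vals if v is None)
        usermis = sum(1 for v in vals if v is not None and v in user_missing)
        table[c] = {
            "count": n - sysmis - usermis,
            "pct_sysmis": 100.0 * sysmis / n,
            "pct_usermis": 100.0 * usermis / n,
        }
    return table
```

With user-missing code -9, a category whose four cases hold 1, system-missing, -9, and 2 reports two nonmissing values, 25% system-missing, and 25% user-missing.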
Displaying Patterns. DPATTERN displays a case-by-case data pattern with codes for system-missing, user-missing, and extreme values. MPATTERN displays only the cases that have missing values and sorts them by the pattern that is formed by missing values. TPATTERN tabulates the cases that have a common pattern of missing values. The pattern tables have sorting options, and descriptive variables can also be specified.
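The idea behind TPATTERN and MPATTERN can be sketched in Python by encoding each case as a pattern string. The helper names are hypothetical, `None` marks a missing value, and the plain lexicographic sort used here is only an illustration, not necessarily the ordering MVA uses.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def missing_pattern(row: Row, variables: List[str]) -> str:
    """Encode a case: 'X' where a variable is missing, '.' where it is present."""
    return "".join("X" if row[v] is None else "." for v in variables)


def tabulate_patterns(rows: List[Row], variables: List[str]) -> List[Tuple[str, int]]:
    """TPATTERN-like view: number of cases sharing each pattern, most common first."""
    return Counter(missing_pattern(r, variables) for r in rows).most_common()


def cases_with_missing_sorted(rows: List[Row], variables: List[str]) -> List[int]:
    """MPATTERN-like view: indices of cases with any missing value, sorted by pattern."""
    patterned = sorted((missing_pattern(r, variables), i) for i, r in enumerate(rows))
    return [i for p, i in patterned if "X" in p]
```

Grouping cases by pattern in this way shows at a glance whether a few patterns account for most of the missing data.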
Labeling Cases. For pattern tables, an ID variable can be specified to label cases.
Suppression of Rows. To shorten tables, the PERCENT keyword suppresses missing-value patterns that occur relatively infrequently.
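Row suppression amounts to dropping patterns whose share of cases falls below a threshold. The Python sketch below assumes the threshold is a percentage of all cases and that patterns at exactly the threshold are kept; the function `suppress_rare` is hypothetical and the exact boundary rule MVA applies may differ.

```python
from collections import Counter
from typing import List, Tuple


def suppress_rare(patterns: List[str], percent: float) -> List[Tuple[str, int]]:
    """Tabulate pattern strings and keep only those occurring in at least
    `percent` percent of the cases, most common first."""
    n = len(patterns)
    counts = Counter(patterns)
    return [(p, c) for p, c in counts.most_common() if 100.0 * c / n >= percent]
```

For ten cases with patterns occurring 6, 3, and 1 times, a 20% threshold keeps the first two patterns and drops the one seen in only 10% of cases.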
Statistics. Displays of univariate, listwise, and pairwise statistics are available.
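The difference between univariate, listwise, and pairwise statistics lies in which cases enter each calculation. This Python sketch (hypothetical function names, `None` as missing) computes univariate means from all observed values of each variable, listwise means from complete cases only, and a pairwise covariance from the cases observed on both variables.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def univariate_means(rows: List[Row], variables: List[str]) -> Dict[str, float]:
    """Each variable's mean over all cases where that variable is observed."""
    return {v: mean(r[v] for r in rows if r[v] is not None) for v in variables}


def listwise_means(rows: List[Row], variables: List[str]) -> Dict[str, float]:
    """Means computed only from cases that are complete on every variable."""
    complete = [r for r in rows if all(r[v] is not None for v in variables)]
    return {v: mean(r[v] for r in complete) for v in variables}


def pairwise_covariance(rows: List[Row], a: str, b: str) -> float:
    """Sample covariance of a and b using every case observed on both."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    ma = mean(p[0] for p in pairs)
    mb = mean(p[1] for p in pairs)
    return sum((x - ma) * (y - mb) for x, y in pairs) / (len(pairs) - 1)
```

Because each method uses a different subset of cases, the three can disagree noticeably when the data are not missing completely at random.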
Estimation. EM and REGRESSION use different algorithms to supply estimates of missing values, which are used in calculating estimates of the mean vector, the covariance matrix, and the correlation matrix of dependent variables. The estimates can be saved as replacements for missing values in a new data file.
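To show how EM arrives at a mean vector and covariance matrix, here is a textbook EM sketch for the simplest case: a bivariate normal where x is fully observed and some y values are missing. It is not the SPSS implementation, which handles arbitrary patterns and more options; the function `em_bivariate`, its defaults, and the use of maximum-likelihood (divide-by-n) covariances are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def em_bivariate(x: List[float], y: List[Optional[float]], iterations: int = 500,
                 tol: float = 1e-12) -> Tuple[float, float, float, float, float]:
    """EM estimates of (mu_x, mu_y, s_xx, s_xy, s_yy) for a bivariate normal
    with x complete and y partly missing (None). Assumes x is not constant."""
    n = len(x)
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n  # x is complete, so this is fixed
    observed = [yi for yi in y if yi is not None]
    # Starting values: complete-case mean and variance of y, zero covariance.
    mu_y = sum(observed) / len(observed)
    s_yy = sum((yi - mu_y) ** 2 for yi in observed) / len(observed)
    s_xy = 0.0
    for _ in range(iterations):
        beta = s_xy / s_xx             # regression slope of y on x
        resid_var = s_yy - beta * s_xy  # conditional variance of y given x
        # E-step: expected y and y^2 for every case given current parameters.
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + beta * (xi - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi * yi)
        # M-step: re-estimate parameters from the expected sufficient statistics.
        new_mu_y = sum(ey) / n
        new_s_xy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * new_mu_y
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        change = abs(new_mu_y - mu_y) + abs(new_s_xy - s_xy) + abs(new_s_yy - s_yy)
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:
            break
    return mu_x, mu_y, s_xx, s_xy, s_yy
```

Note that the E-step adds the residual variance to the squared imputed value; plugging in imputed values alone would understate the variance of y. The final conditional means can also serve as imputed replacements for the missing values.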
Basic Specification
The basic specification depends on whether you want to describe the missing data pattern or estimate statistics. Often, description is done first and then, in light of the results, estimation is done. Alternatively, both description and estimation can be done in the same MVA command.
Descriptive Analysis. A basic descriptive specification includes a list of variables and a statistics or pattern subcommand. For example, a list of variables and the subcommand DPATTERN would show missing value patterns for all cases with respect to the list of variables.
Estimation. A basic estimation specification includes a variable list and an estimation method. For example, if the EM method is specified, the following are estimated: the mean vector, the covariance matrix, and the correlation matrix of quantitative variables with missing values.