REGRESSION Subcommand (MVA command)

The REGRESSION subcommand estimates missing values by using multiple linear regression. It can add a random component to the regression estimate. Output includes estimates of means, a covariance matrix, and a correlation matrix of the variables that are specified as predicted.

  • By default, all of the variables that are specified as predictors (after WITH) are used in the estimation, but you can limit the number of predictors (independent variables) by using NPREDICTORS.
  • Predicted and predictor variables, if specified, must be quantitative.
  • By default, REGRESSION adds the observed residual of a randomly selected complete case to each regression estimate. Alternatively, you can have the program add a random normal variate, a random t variate, or no error term at all. The normal and t variates are properly scaled, and you can specify the degrees of freedom for the t distribution.
  • If the number of complete cases is less than half the total number of cases, the default ADDTYPE is NORMAL instead of RESIDUAL.
  • You can save a data file with the missing values filled in. Specify the filename in single or double quotation marks, and include a path if the file should be saved somewhere other than the working directory.
  • The criteria and OUTFILE specifications for the REGRESSION subcommand must be enclosed in a single pair of parentheses.

The criteria for the REGRESSION subcommand are as follows:

TOLERANCE=value. Numerical accuracy control. Helps eliminate predictor variables that are highly correlated with other predictor variables and would reduce the accuracy of the matrix inversions that are involved in the calculations. If a variable passes the tolerance criterion, it is eligible for inclusion. The smaller the tolerance, the more inaccuracy is tolerated. The default value is 0.001.

FLIMIT=n. F-to-enter limit. The minimum value of the F statistic that a variable must achieve in order to enter the regression estimation. You may want to change this limit, depending on the number of variables and the correlation structure of the data. The default value is 4.

NPREDICTORS=n. Maximum number of predictor variables. Limits the total number of predictors in the analysis. The analysis uses the n best predictors chosen by stepwise selection, entered subject to the tolerance criterion. Specifying n=0 is equivalent to replacing each missing value with the mean of its variable.
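The way the three criteria above interact can be illustrated with a minimal forward-selection sketch in Python. This is not the algorithm that MVA uses internally; it assumes that tolerance is measured as 1 minus the R-squared of a candidate predictor regressed on the predictors already entered, and that the F-to-enter statistic is the usual partial F test. The function names rss and forward_select are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of y regressed on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def forward_select(X, y, tolerance=0.001, flimit=4.0, npredictors=None):
    """Return column indices chosen by forward stepwise selection (illustrative only)."""
    n, p = X.shape
    limit = p if npredictors is None else npredictors
    chosen = []
    while len(chosen) < limit:
        current = X[:, np.array(chosen, dtype=int)]
        rss_now = rss(current, y)
        best = None
        for j in range(p):
            if j in chosen:
                continue
            xj = X[:, j]
            tss_j = float(((xj - xj.mean()) ** 2).sum())
            # Assumed tolerance measure: 1 - R^2 of the candidate on entered predictors.
            if tss_j == 0.0 or rss(current, xj) / tss_j < tolerance:
                continue
            rss_new = rss(X[:, np.array(chosen + [j], dtype=int)], y)
            df_resid = n - len(chosen) - 2  # cases minus intercept and predictors
            if df_resid <= 0:
                continue
            # Partial F statistic for adding candidate j to the current model.
            f = np.inf if rss_new <= 0.0 else (rss_now - rss_new) / (rss_new / df_resid)
            if f >= flimit and (best is None or f > best[0]):
                best = (f, j)
        if best is None:
            break  # no remaining candidate passes both criteria
        chosen.append(best[1])
    return chosen
```

With npredictors=0 the function returns an empty list, so the fitted model contains only the intercept and every estimate equals the mean of the predicted variable, which mirrors the n=0 case described above.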

ADDTYPE. Type of distribution from which the error term is randomly drawn. Random errors can be added to the regression estimates before the means, correlations, and covariances are calculated. You can specify one of the following types:

  • RESIDUAL. Error terms are chosen randomly from the observed residuals of complete cases to be added to the regression estimates.
  • NORMAL. Error terms are randomly drawn from a distribution with the expected value 0 and the standard deviation equal to the square root of the mean squared error term (sometimes called the root mean squared error, or RMSE) of the regression.
  • T(n). Error terms are randomly drawn from the t(n) distribution and scaled by the RMSE. The degrees of freedom can be specified in parentheses. If T is specified without a value, the default degrees of freedom is 5.
  • NONE. Estimates are made from the regression model with no error term added.
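The four error-term options can be sketched in Python as follows. This is an illustration under stated assumptions, not the MVA implementation: the function name add_error is hypothetical, the RMSE is approximated as the square root of the mean squared residual (a full regression would divide by the residual degrees of freedom), and the t variate is simply multiplied by the RMSE.

```python
import numpy as np

def add_error(estimates, residuals, addtype="RESIDUAL", df=5, rng=None):
    """Add a random error term to regression estimates (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    estimates = np.asarray(estimates, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    n = estimates.shape[0]
    # Assumption: RMSE taken as sqrt(mean squared residual) of the complete cases.
    rmse = np.sqrt(np.mean(residuals ** 2))
    if addtype == "RESIDUAL":
        # Draw observed residuals of complete cases at random, with replacement.
        return estimates + rng.choice(residuals, size=n, replace=True)
    if addtype == "NORMAL":
        # Normal variates with expected value 0 and standard deviation RMSE.
        return estimates + rng.normal(0.0, rmse, size=n)
    if addtype == "T":
        # t(df) variates scaled by the RMSE.
        return estimates + rmse * rng.standard_t(df, size=n)
    if addtype == "NONE":
        # Plain regression estimates with no error term.
        return estimates.copy()
    raise ValueError(f"unknown addtype: {addtype}")
```

For example, add_error(estimates, residuals, "T", df=7) corresponds to the specification ADDTYPE = T(7).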

The following keyword produces a new data file:

OUTFILE. Specify a filename or previously declared dataset name. Filenames should be enclosed in quotation marks and are stored in the working directory unless a path is included as part of the file specification. Datasets are available during the current session but are not available in subsequent sessions unless you explicitly save them as data files. Missing values for the predicted (dependent) variables in the file are imputed (filled in) by using the regression algorithm.

Examples

MVA VARIABLES=males to tuition
 /REGRESSION (OUTFILE='/colleges/regdata.sav').
  • All variables in the variables list are included in the estimations.
  • The output includes the means of the listed variables, a correlation matrix, and a covariance matrix.
  • A new data file named regdata.sav with imputed values is saved in the /colleges directory.

MVA VARIABLES=males to tuition
 /REGRESSION males verbal math WITH males verbal math faculty
  (ADDTYPE = T(7)).
  • The output includes the means of the predicted variables (males, verbal, and math), a correlation matrix, and a covariance matrix.
  • A t distribution with 7 degrees of freedom is used to produce the randomly assigned additions to the estimates.