Overview (REGRESSION command)
REGRESSION calculates multiple regression equations and associated statistics
and plots. REGRESSION also calculates
collinearity diagnostics, predicted values, residuals, measures of
fit and influence, and several statistics based on these measures.
Options
Input and Output Control Subcommands. DESCRIPTIVES requests descriptive
statistics on the variables in the analysis. SELECT estimates the model based on a subset of cases. REGWGT specifies a weight variable for estimating
weighted least-squares models. MISSING specifies the treatment of cases with missing values. MATRIX reads and writes matrix data files.
Equation-Control Subcommands. These optional subcommands control the calculation
and display of statistics for each equation. STATISTICS controls the statistics displayed for the
equation(s) and the independent variable(s), CRITERIA specifies the criteria used by the variable
selection method, and ORIGIN specifies
whether regression is through the origin.
Analysis of Residuals, Fit, and Influence. REGRESSION creates temporary
variables containing predicted values, residuals, measures of fit
and influence, and several statistics based on these measures. These
temporary variables can be analyzed within REGRESSION in Casewise Diagnostics tables (CASEWISE subcommand), scatterplots (SCATTERPLOT subcommand), histograms and
normal probability plots (RESIDUALS subcommand), and partial regression plots (PARTIALPLOT subcommand). Any of the residuals subcommands
can be specified to obtain descriptive statistics for the predicted
values, residuals, and their standardized versions. Any of the temporary
variables can be added to the active dataset with the SAVE subcommand.
Templates. You can specify a template, using the TEMPLATE subcommand, to override the default chart attribute settings on
your system.
Basic Specification
The basic specification is DEPENDENT, which initiates the equation(s) and defines at least one dependent
variable, followed by METHOD, which specifies the method for selecting independent variables.
- By default, all variables named on DEPENDENT and METHOD are used in the analysis.
- The default display for each equation includes a Model Summary table showing R², an ANOVA table, a Coefficients table displaying related statistics for variables in the equation, and an Excluded Variables table displaying related statistics for variables not yet in the equation.
- By default, all cases in the active dataset with valid values for all selected variables are used to compute the correlation matrix on which the regression equations are based. The default equations include a constant (intercept).
- All residuals analysis subcommands are optional. Most have defaults that can be requested by including the subcommand without any further specifications. These defaults are described in the section for each subcommand.
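As a minimal sketch of the basic specification, the command below estimates a single equation using only DEPENDENT and METHOD. The variable names y, x1, and x2 are hypothetical placeholders and do not come from this documentation.

```
* Minimal sketch of the basic specification.
* The variable names y, x1, and x2 are hypothetical.
REGRESSION
  /DEPENDENT=y
  /METHOD=ENTER x1 x2.
```

Because VARIABLES is omitted, the analysis should use the variables named on DEPENDENT and METHOD, and the default tables listed above are displayed.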
Subcommand Order
The standard subcommand order for REGRESSION is:
REGRESSION MATRIX=...
/VARIABLES=...
/DESCRIPTIVES=...
/SELECT=...
/MISSING=...
/REGWGT=...
--Equation Block--
/STATISTICS=...
/CRITERIA=...
/ORIGIN
/DEPENDENT=...
--Method Block(s)--
/METHOD=...
[/METHOD=...]
--Residuals Block--
/RESIDUALS=...
/SAVE=...
/CASEWISE=...
/SCATTERPLOT=...
/PARTIALPLOT=...
/OUTFILE=...
- When used, MATRIX must be specified first.
- Subcommands listed before the equation block must be specified before any subcommands within the block.
- Only one equation block is allowed per REGRESSION command.
- An equation block can contain multiple METHOD subcommands. These methods are applied, one after the other, to the estimation of the equation for that block.
- The STATISTICS, CRITERIA, and ORIGIN/NOORIGIN subcommands must precede the DEPENDENT subcommand.
- The residuals subcommands RESIDUALS, CASEWISE, SCATTERPLOT, and PARTIALPLOT follow the last METHOD subcommand of any equation for which residuals analysis is requested. Statistics are based on this final equation.
- Residuals subcommands can be specified in any order. All residuals subcommands must follow the DEPENDENT and METHOD subcommands.
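The ordering rules above can be illustrated with a sketch like the following. It is one plausible arrangement rather than a required layout, and all variable names (y, x1, x2, x3) are hypothetical.

```
* Sketch of the subcommand order; all variable names are hypothetical.
* Input/output control first, then the equation block, then residuals.
REGRESSION
  /VARIABLES=y x1 x2 x3
  /DESCRIPTIVES=MEAN STDDEV CORR
  /MISSING=LISTWISE
  /STATISTICS=COEFF R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT=y
  /METHOD=ENTER x1
  /METHOD=STEPWISE x2 x3
  /RESIDUALS=HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /SCATTERPLOT=(*ZRESID,*ZPRED).
```

Here the two METHOD subcommands are applied one after the other to the equation for y, and the residuals subcommands follow the last METHOD, so their statistics are based on the final equation.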
Syntax Rules
- VARIABLES can be specified only once. If omitted, VARIABLES defaults to COLLECT.
- The DEPENDENT subcommand can be specified only once and must be followed immediately by one or more METHOD subcommands.
- CRITERIA, STATISTICS, and ORIGIN must be specified before DEPENDENT and METHOD. If any of these subcommands are specified more than once, only the last specified is in effect for all subsequent equations.
- More than one variable can be specified on the DEPENDENT subcommand. An equation is estimated for each.
- If no variables are specified on METHOD, all variables named on VARIABLES but not on DEPENDENT are considered for selection.
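A short sketch of the last two rules, using hypothetical variable names (y1, y2, x1, x2, x3): two dependent variables are named on DEPENDENT, and METHOD is given without a variable list.

```
* Sketch only; y1, y2, x1, x2, and x3 are hypothetical variable names.
* METHOD has no variable list, so the candidates come from VARIABLES.
REGRESSION
  /VARIABLES=y1 y2 x1 x2 x3
  /DEPENDENT=y1 y2
  /METHOD=STEPWISE.
```

Under the rules above, this would estimate one equation for y1 and one for y2, with x1, x2, and x3 considered for selection in each.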
Operations
- This procedure uses the multithreaded options specified by SET THREADS and SET MCACHE.
- REGRESSION calculates a correlation matrix that includes all variables named on VARIABLES. All equations requested on the REGRESSION command are calculated from the same correlation matrix.
- The MISSING, DESCRIPTIVES, and SELECT subcommands control the calculation of the correlation matrix and associated displays.
- If multiple METHOD subcommands are specified, they operate in sequence on the equations defined by the preceding DEPENDENT subcommand.
- Only independent variables that pass the tolerance criterion are candidates for entry into the equation. See the topic CRITERIA Subcommand (REGRESSION command) for more information.
- The temporary variables PRED (unstandardized predicted value), ZPRED (standardized predicted value), RESID (unstandardized residual), and ZRESID (standardized residual) are calculated and descriptive statistics are displayed whenever any residuals subcommand is specified. If any of the other temporary variables are referred to on the command, they are also calculated.
- Predicted values and statistics based on predicted values are calculated for every observation that has valid values for all variables in the equation. Residuals and statistics based on residuals are calculated for all observations that have a valid predicted value and a valid value for the dependent variable. The missing-values option therefore affects the calculation of residuals and predicted values.
- No residuals or predictors are generated for cases deleted from the active dataset with SELECT IF, a temporary SELECT IF, or SAMPLE.
- All variables are standardized before plotting. If the unstandardized version of a variable is requested, the standardized version is plotted.
- Residuals processing is not available when the active dataset is a matrix file or is replaced by a matrix file with MATRIX OUT(*) on REGRESSION. If RESIDUALS, CASEWISE, SCATTERPLOT, PARTIALPLOT, or SAVE are used when MATRIX IN(*) or MATRIX OUT(*) is specified, the REGRESSION command is not executed.
For each analysis, REGRESSION can calculate the following types
of temporary variables:
PRED. Unstandardized predicted values.
RESID. Unstandardized residuals.
DRESID. Deleted residuals.
ADJPRED. Adjusted predicted values.
ZPRED. Standardized predicted values.
ZRESID. Standardized residuals.
SRESID. Studentized residuals.
SDRESID. Studentized deleted residuals. 1
SEPRED. Standard errors of the predicted values.
MAHAL. Mahalanobis distances.
COOK. Cook’s distances. 2
LEVER. Centered leverage values. 3
DFBETA. Change in the regression coefficient that results from the deletion of the ith case. A DFBETA value is computed for each case for each regression coefficient generated by a model. 4
SDBETA. Standardized DFBETA. An SDBETA value is computed for each case for each regression coefficient generated by a model. 5
DFFIT. Change in the predicted value when the ith case is deleted. 6
SDFIT. Standardized DFFIT. 7
COVRATIO. Ratio of the determinant of the covariance matrix with the ith case deleted to the determinant of the covariance matrix with all cases included. 8
MCIN. Lower and upper bounds for the prediction interval of the mean predicted response. A lower bound LMCIN and an upper bound UMCIN are generated. The default confidence interval is 95%. The confidence interval can be reset with the CIN keyword on the CRITERIA subcommand. 9
ICIN. Lower and upper bounds for the prediction interval for a single observation. A lower bound LICIN and an upper bound UICIN are generated. The default confidence interval is 95%. The confidence interval can be reset with the CIN keyword on the CRITERIA subcommand. 10
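As a hedged illustration of how some of these temporary variables might be requested, the sketch below lists unusual cases and saves several of the variables to the active dataset. The variable names y, x1, and x2 are hypothetical, and the example assumes that the confidence level is set with the CIN keyword on CRITERIA.

```
* Sketch only; y, x1, and x2 are hypothetical variable names.
* CIN(90) is assumed to set a 90% level for the MCIN and ICIN bounds.
REGRESSION
  /CRITERIA=CIN(90)
  /DEPENDENT=y
  /METHOD=ENTER x1 x2
  /CASEWISE=PLOT(ZRESID) OUTLIERS(3)
  /SAVE=PRED RESID COOK LEVER MCIN ICIN.
```

Because a residuals subcommand is specified, descriptive statistics for PRED, ZPRED, RESID, and ZRESID should also be displayed, as described under Operations.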