Overview (REGRESSION command)
REGRESSION calculates multiple regression equations and associated statistics and plots. REGRESSION also calculates collinearity diagnostics, predicted values, residuals, measures of fit and influence, and several statistics based on these measures.
Options
Input and Output Control Subcommands. DESCRIPTIVES requests descriptive statistics on the variables in the analysis. SELECT estimates the model based on a subset of cases. REGWGT specifies a weight variable for estimating weighted least-squares models. MISSING specifies the treatment of cases with missing values. MATRIX reads and writes matrix data files.
Equation-Control Subcommands. These optional subcommands control the calculation and display of statistics for each equation. STATISTICS controls the statistics displayed for the equation(s) and the independent variable(s), CRITERIA specifies the criteria used by the variable selection method, and ORIGIN specifies whether regression is through the origin.
Analysis of Residuals, Fit, and Influence. REGRESSION creates temporary variables containing predicted values, residuals, measures of fit and influence, and several statistics based on these measures. These temporary variables can be analyzed within REGRESSION in Casewise Diagnostics tables (CASEWISE subcommand), scatterplots (SCATTERPLOT subcommand), histograms and normal probability plots (RESIDUALS subcommand), and partial regression plots (PARTIALPLOT subcommand). Any of the residuals subcommands can be specified to obtain descriptive statistics for the predicted values, residuals, and their standardized versions. Any of the temporary variables can be added to the active dataset with the SAVE subcommand.
Templates. You can specify a template, using the TEMPLATE subcommand, to override the default chart attribute settings on your system.
Basic Specification
The basic specification is DEPENDENT, which initiates the equation(s) and defines at least one dependent variable, followed by METHOD, which specifies the method for selecting independent variables.
- By default, all variables named on DEPENDENT and METHOD are used in the analysis.
- The default display for each equation includes a Model Summary table showing R², an ANOVA table, a Coefficients table displaying related statistics for variables in the equation, and an Excluded Variables table displaying related statistics for variables not yet in the equation.
- By default, all cases in the active dataset with valid values for all selected variables are used to compute the correlation matrix on which the regression equations are based. The default equations include a constant (intercept).
- All residuals analysis subcommands are optional. Most have defaults that can be requested by including the subcommand without any further specifications. These defaults are described in the section for each subcommand.
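As a rough illustration of the quantities summarized in the Model Summary and ANOVA tables, the following Python sketch (not REGRESSION syntax) computes R², adjusted R², and the overall F statistic from observed and fitted values using the usual textbook formulas. It assumes an equation that includes a constant; the function name anova_summary and the sample data are hypothetical and are not part of the product.

```python
def anova_summary(y, pred, n_predictors):
    """Textbook R-squared, adjusted R-squared, and F for a model with a constant."""
    n = len(y)
    ybar = sum(y) / n
    ss_total = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
    ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))  # residual sum of squares
    ss_reg = ss_total - ss_resid                               # regression sum of squares
    df_reg = n_predictors
    df_resid = n - n_predictors - 1
    return {
        "R2": ss_reg / ss_total,
        "adj_R2": 1 - (ss_resid / df_resid) / (ss_total / (n - 1)),
        "F": (ss_reg / df_reg) / (ss_resid / df_resid),
    }

# Hypothetical data: y regressed on x = 1..5 by least squares gives these fitted values.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
pred = [2.8, 3.4, 4.0, 4.6, 5.2]
summary = anova_summary(y, pred, n_predictors=1)
```

The regression sum of squares is obtained by subtraction, which is valid only when the fitted values come from a least-squares fit that includes the constant.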
Subcommand Order
The standard subcommand order for REGRESSION is
REGRESSION MATRIX=...
/VARIABLES=...
/DESCRIPTIVES=...
/SELECT=...
/MISSING=...
/REGWGT=...
--Equation Block--
/STATISTICS=...
/CRITERIA=...
/ORIGIN
/DEPENDENT=...
--Method Block(s)--
/METHOD=...
[/METHOD=...]
--Residuals Block--
/RESIDUALS=...
/SAVE=...
/CASEWISE=...
/SCATTERPLOT=...
/PARTIALPLOT=...
/OUTFILE=...
- When used, MATRIX must be specified first.
- Subcommands listed before the equation block must be specified before any subcommands within the block.
- Only one equation block is allowed per REGRESSION command.
- An equation block can contain multiple METHOD subcommands. These methods are applied, one after the other, to the estimation of the equation for that block.
- The STATISTICS, CRITERIA, and ORIGIN/NOORIGIN subcommands must precede the DEPENDENT subcommand.
- The residuals subcommands RESIDUALS, CASEWISE, SCATTERPLOT, and PARTIALPLOT follow the last METHOD subcommand of any equation for which residuals analysis is requested. Statistics are based on this final equation.
- Residuals subcommands can be specified in any order. All residuals subcommands must follow the DEPENDENT and METHOD subcommands.
Syntax Rules
- VARIABLES can be specified only once. If omitted, VARIABLES defaults to COLLECT.
- The DEPENDENT subcommand can be specified only once and must be followed immediately by one or more METHOD subcommands.
- CRITERIA, STATISTICS, and ORIGIN must be specified before DEPENDENT and METHOD. If any of these subcommands are specified more than once, only the last specified is in effect for all subsequent equations.
- More than one variable can be specified on the DEPENDENT subcommand. An equation is estimated for each.
- If no variables are specified on METHOD, all variables named on VARIABLES but not on DEPENDENT are considered for selection.
Operations
- This procedure uses the multithreaded options specified by SET THREADS and SET MCACHE.
- REGRESSION calculates a correlation matrix that includes all variables named on VARIABLES. All equations requested on the REGRESSION command are calculated from the same correlation matrix.
- The MISSING, DESCRIPTIVES, and SELECT subcommands control the calculation of the correlation matrix and associated displays.
- If multiple METHOD subcommands are specified, they operate in sequence on the equations defined by the preceding DEPENDENT subcommand.
- Only independent variables that pass the tolerance criterion are candidates for entry into the equation. See the topic CRITERIA Subcommand (REGRESSION command) for more information.
- The temporary variables PRED (unstandardized predicted value), ZPRED (standardized predicted value), RESID (unstandardized residual), and ZRESID (standardized residual) are calculated and descriptive statistics are displayed whenever any residuals subcommand is specified. If any of the other temporary variables are referred to on the command, they are also calculated.
- Predicted values and statistics based on predicted values are calculated for every observation that has valid values for all variables in the equation. Residuals and statistics based on residuals are calculated for all observations that have a valid predicted value and a valid value for the dependent variable. The missing-values option therefore affects the calculation of residuals and predicted values.
- No residuals or predicted values are generated for cases deleted from the active dataset with SELECT IF, a temporary SELECT IF, or SAMPLE.
- All variables are standardized before plotting. If the unstandardized version of a variable is requested, the standardized version is plotted.
- Residuals processing is not available when the active dataset is a matrix file or is replaced by a matrix file with MATRIX OUT(*) on REGRESSION. If RESIDUALS, CASEWISE, SCATTERPLOT, PARTIALPLOT, or SAVE is used when MATRIX IN(*) or MATRIX OUT(*) is specified, the REGRESSION command is not executed.
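The rule for which cases receive predicted values and residuals can be sketched in Python (an illustration only, not REGRESSION syntax). The sketch assumes that missing values are represented by None and that the coefficients are already known; the function name score_cases and the sample cases are hypothetical.

```python
def score_cases(cases, intercept, slopes, dependent):
    """Return one (pred, resid) pair per case; None marks a value that cannot be computed."""
    results = []
    for case in cases:
        values = [case.get(name) for name in slopes]
        if any(v is None for v in values):
            pred = None  # a predictor is missing, so there is no predicted value
        else:
            pred = intercept + sum(b * v for b, v in zip(slopes.values(), values))
        observed = case.get(dependent)
        # A residual needs both a valid predicted value and a valid dependent value.
        resid = None if pred is None or observed is None else observed - pred
        results.append((pred, resid))
    return results

# Hypothetical cases for the equation y = 1 + 2*x.
cases = [{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": None}, {"x": None, "y": 5.0}]
scored = score_cases(cases, intercept=1.0, slopes={"x": 2.0}, dependent="y")
```

In this sketch the second case has a predicted value but no residual, and the third case has neither.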
For each analysis, REGRESSION can calculate the following types of temporary variables:
PRED. Unstandardized predicted values.
RESID. Unstandardized residuals.
DRESID. Deleted residuals.
ADJPRED. Adjusted predicted values.
ZPRED. Standardized predicted values.
ZRESID. Standardized residuals.
SRESID. Studentized residuals.
SDRESID. Studentized deleted residuals. 1
SEPRED. Standard errors of the predicted values.
MAHAL. Mahalanobis distances.
COOK. Cook’s distances. 2
LEVER. Centered leverage values. 3
DFBETA. Change in the regression coefficient that results from the deletion of the ith case. A DFBETA value is computed for each case for each regression coefficient generated by a model. 4
SDBETA. Standardized DFBETA. An SDBETA value is computed for each case for each regression coefficient generated by a model. 5
DFFIT. Change in the predicted value when the ith case is deleted. 6
SDFIT. Standardized DFFIT. 7
COVRATIO. Ratio of the determinant of the covariance matrix with the ith case deleted to the determinant of the covariance matrix with all cases included. 8
MCIN. Lower and upper bounds for the prediction interval of the mean predicted response. A lower bound LMCIN and an upper bound UMCIN are generated. The default confidence level is 95%. The confidence level can be reset with the CIN keyword on the CRITERIA subcommand. 9
ICIN. Lower and upper bounds for the prediction interval for a single observation. A lower bound LICIN and an upper bound UICIN are generated. The default confidence level is 95%. The confidence level can be reset with the CIN keyword on the CRITERIA subcommand. 10
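To make a few of these definitions concrete, the following Python sketch (not REGRESSION syntax) computes several of the case-level measures for a one-predictor equation with a constant, using common textbook formulas. The formulas are assumptions for illustration and may differ in detail from the procedure's own algorithms; the function names fit_simple and case_diagnostics and the sample data are hypothetical.

```python
import math

def fit_simple(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def case_diagnostics(x, y):
    """Per-case measures for a one-predictor model with a constant."""
    n, p = len(x), 2  # p counts the constant and the slope
    b0, b1 = fit_simple(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    pred = [b0 + b1 * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, pred)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))  # standard error of the estimate
    rows = []
    for xi, yi, pi, ei in zip(x, y, pred, resid):
        lever = (xi - xbar) ** 2 / sxx  # centered leverage
        h = lever + 1.0 / n             # uncentered (hat) value
        sresid = ei / (s * math.sqrt(1 - h))
        dresid = ei / (1 - h)           # residual when the case is left out of the fit
        rows.append({
            "PRED": pi, "RESID": ei, "ZRESID": ei / s,
            "LEVER": lever, "SRESID": sresid, "DRESID": dresid,
            "ADJPRED": yi - dresid,
            "COOK": sresid ** 2 * h / ((1 - h) * p),
        })
    return rows

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
rows = case_diagnostics(x, y)
```

One way to check the deleted-case measures is to refit the equation without a case and predict that case from the reduced fit; the result should match ADJPRED for that case.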