Overview (CATREG command)
CATREG (categorical regression with optimal scaling using alternating least squares) quantifies categorical variables using optimal scaling, resulting in an optimal linear regression equation for the transformed variables. The variables can be given mixed optimal scaling levels, and no distributional assumptions about the variables are made.
Options
Transformation Type. You can specify the transformation type (spline ordinal, spline nominal, ordinal, nominal, or numerical) at which you want to analyze each variable.
Discretization. You can use the DISCRETIZATION subcommand to discretize fractional-value variables or to recode categorical variables.
Initial Configuration. You can specify the kind of initial configuration through the INITIAL subcommand. Multiple systematic starts or fixed signs for the regression coefficients can also be specified through this subcommand.
Tuning the Algorithm. You can control the values of algorithm-tuning parameters with the MAXITER and CRITITER subcommands.
Regularized Regression. You can specify one of three methods for regularized regression: Ridge regression, the Lasso, or the Elastic Net.
Resampling. You can specify cross validation or the .632 bootstrap for estimation of prediction error.
Missing Data. You can specify the treatment of missing data with the MISSING subcommand.
Optional Output. You can request optional output through the PRINT subcommand.
Transformation Plot per Variable. You can request a plot per variable of its quantification against the category numbers.
Residual Plot per Variable. You can request an overlay plot per variable of the residuals and the weighted quantification against the category numbers.
Ridge, Lasso, or Elastic Net Plot. You can request a plot of the regularized coefficient paths. For the Elastic Net, you can request the plots for all values of the Ridge penalty or only for specified values of the Ridge penalty.
Writing External Data. You can write the transformed data (category numbers replaced with optimal quantifications) to an outfile for use in further analyses. You can also write the discretized data to an outfile.
Saving Variables. You can save the transformed variables, the predicted values, and/or the residuals in the working data file.
Basic Specification
The basic specification is the command CATREG with the VARIABLES and ANALYSIS subcommands.
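A minimal sketch of the basic specification follows. The variable names (satisf, age, income, region) are hypothetical, and the LEVEL keywords shown (ORDI, NUME, NOMI) are assumed to correspond to the ordinal, numerical, and nominal transformation types listed under Options.

```spss
* Hypothetical variables: satisf is the dependent variable, the rest are predictors.
CATREG VARIABLES = satisf age income region
  /ANALYSIS = satisf (LEVEL=ORDI)
     WITH age (LEVEL=NUME) income (LEVEL=ORDI) region (LEVEL=NOMI).
```

Every variable named on ANALYSIS also appears on VARIABLES, and exactly one variable precedes the keyword WITH.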
Syntax Rules
- The VARIABLES and ANALYSIS subcommands must always appear, and the VARIABLES subcommand must be the first subcommand specified. The other subcommands, if specified, can be in any order.
- Variables specified in the ANALYSIS subcommand must be found in the VARIABLES subcommand.
- In the ANALYSIS subcommand, exactly one variable must be specified as a dependent variable and at least one variable must be specified as an independent variable after the keyword WITH.
- The word WITH is reserved as a keyword in the CATREG procedure. Thus, it may not be a variable name in CATREG. Also, the word TO is a reserved word.
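The rules above can be illustrated with a sketch that adds a few optional subcommands. The variable names are hypothetical, and the specific keywords and values (RANDOM, R, COEFF, ANOVA, the iteration settings) are assumptions chosen for illustration rather than recommended settings.

```spss
* VARIABLES comes first; the optional subcommands after it may appear in any order.
CATREG VARIABLES = satisf age income region
  /PRINT = R COEFF ANOVA
  /ANALYSIS = satisf (LEVEL=ORDI)
     WITH age (LEVEL=NUME) income (LEVEL=ORDI) region (LEVEL=NOMI)
  /CRITITER = .000001
  /MAXITER = 200
  /INITIAL = RANDOM.
```

Here ANALYSIS is deliberately placed after PRINT to show that only VARIABLES has a fixed position.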
Operations
- If a subcommand is specified more than once, the last one is executed but with a syntax warning. Note that this is also true for the VARIABLES and ANALYSIS subcommands.
Limitations
- If more than one dependent variable is specified in the ANALYSIS subcommand, CATREG is not executed.
- CATREG operates on category indicator variables. The category indicators should be positive integers. You can use the DISCRETIZATION subcommand to convert fractional-value variables and string variables into positive integers. If DISCRETIZATION is not specified, fractional-value variables are automatically converted into positive integers by grouping them into seven categories with a close to normal distribution, and string variables are automatically converted into positive integers by ranking.
- In addition to system-missing values and user-defined missing values, CATREG treats category indicator values less than 1 as missing. If one of the values of a categorical variable has been coded 0 or some negative value and you want to treat it as a valid category, use the COMPUTE command to add a constant to the values of that variable such that the lowest value will be 1. You can also use the RANKING option of the DISCRETIZATION subcommand for this purpose, except for variables you want to treat as numerical, since the characteristic of equal intervals in the data will not be maintained.
- There must be at least three valid cases.
- The number of valid cases must be greater than the number of independent variables plus 1.
- The maximum number of independent variables is 200.
- Split-File has no implications for CATREG.
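The recoding described above for categories coded 0 or below can be sketched as follows. The variable region and its coding (0, 1, 2) are hypothetical. The constant to add depends on the lowest value actually present in your data.

```spss
* Hypothetical: region is coded 0, 1, 2, so adding 1 makes the lowest code 1.
COMPUTE region = region + 1.
EXECUTE.

* Alternative sketch: let CATREG rank the categories of a non-numerical variable.
CATREG VARIABLES = satisf region
  /ANALYSIS = satisf (LEVEL=ORDI) WITH region (LEVEL=NOMI)
  /DISCRETIZATION = region (RANKING).
```

Use only one of the two approaches for a given variable. The RANKING alternative is shown with a nominal level because ranking does not preserve equal intervals.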