Overview (CATREG command)

CATREG (categorical regression with optimal scaling using alternating least squares) quantifies categorical variables using optimal scaling, resulting in an optimal linear regression equation for the transformed variables. The variables can be given mixed optimal scaling levels, and no distributional assumptions about the variables are made.

Options

Transformation Type. You can specify the transformation type (spline ordinal, spline nominal, ordinal, nominal, or numerical) at which you want to analyze each variable.

Discretization. You can use the DISCRETIZATION subcommand to discretize fractional-value variables or to recode categorical variables.

Initial Configuration. You can specify the kind of initial configuration through the INITIAL subcommand. Also, multiple systematic starts or fixed signs for the regression coefficients can be specified through this subcommand.

Tuning the Algorithm. You can control the values of algorithm-tuning parameters with the MAXITER and CRITITER subcommands.

Regularized Regression. You can specify one of three methods for regularized regression: Ridge regression, the Lasso, or the Elastic Net.

Resampling. You can specify cross-validation or the .632 bootstrap for estimation of prediction error.

Missing Data. You can specify the treatment of missing data with the MISSING subcommand.

Optional Output. You can request optional output through the PRINT subcommand.

Transformation Plot per Variable. You can request a plot per variable of its quantification against the category numbers.

Residual Plot per Variable. You can request an overlay plot per variable of the residuals and the weighted quantification against the category numbers.

Ridge, Lasso, or Elastic Net Plot. You can request a plot of the regularized coefficient paths. For the Elastic Net, you can request the plots for all values of the Ridge penalty or only for specified values of the Ridge penalty.

Writing External Data. You can write the transformed data (category numbers replaced with optimal quantifications) to an outfile for use in further analyses. You can also write the discretized data to an outfile.

Saving Variables. You can save the transformed variables, the predicted values, and/or the residuals in the working data file.

Basic Specification

The basic specification is the command CATREG with the VARIABLES and ANALYSIS subcommands.

Syntax Rules

  • The VARIABLES and ANALYSIS subcommands must always appear, and the VARIABLES subcommand must be the first subcommand specified. The other subcommands, if specified, can be in any order.
  • Variables specified in the ANALYSIS subcommand must be found in the VARIABLES subcommand.
  • In the ANALYSIS subcommand, exactly one variable must be specified as a dependent variable and at least one variable must be specified as an independent variable after the keyword WITH.
  • The word WITH is reserved as a keyword in the CATREG procedure. Thus, it may not be a variable name in CATREG. Also, the word TO is a reserved word.
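The rules above can be sketched as a small checker. This is only an illustration of the rules as stated, not how CATREG parses its input: the function `check_catreg_spec` and its arguments are hypothetical, and comparing names case-insensitively is an assumption.

```python
RESERVED = {"WITH", "TO"}  # reserved words named in the syntax rules


def check_catreg_spec(variables, dependents, independents):
    """Return a list of rule violations for a hypothetical CATREG specification.

    variables    -- names listed on the VARIABLES subcommand
    dependents   -- names given before WITH on the ANALYSIS subcommand
    independents -- names given after WITH on the ANALYSIS subcommand
    """
    problems = []
    # Assumption: names are compared without regard to case.
    declared = {name.upper() for name in variables}

    for name in sorted(declared & RESERVED):
        problems.append(f"{name} is reserved and cannot be a variable name")
    if len(dependents) != 1:
        problems.append("exactly one dependent variable is required")
    if len(independents) < 1:
        problems.append("at least one independent variable is required after WITH")
    for name in list(dependents) + list(independents):
        if name.upper() not in declared:
            problems.append(f"{name} is not listed on VARIABLES")
    return problems
```

For example, `check_catreg_spec(["y", "x1"], ["y"], ["x1", "x3"])` reports that `x3` is not listed on VARIABLES, while a specification with one dependent variable and declared independent variables returns an empty list.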

Operations

  • If a subcommand is specified more than once, the last one is executed and a syntax warning is issued. This also applies to the VARIABLES and ANALYSIS subcommands.

Limitations

  • If more than one dependent variable is specified in the ANALYSIS subcommand, CATREG is not executed.
  • CATREG operates on category indicator variables. The category indicators should be positive integers. You can use the DISCRETIZATION subcommand to convert fractional-value variables and string variables into positive integers. If DISCRETIZATION is not specified, fractional-value variables are automatically converted into positive integers by grouping them into seven categories with a close-to-normal distribution, and string variables are automatically converted into positive integers by ranking.
  • In addition to system-missing values and user-defined missing values, CATREG treats category indicator values less than 1 as missing. If one of the values of a categorical variable has been coded 0 or some negative value and you want to treat it as a valid category, use the COMPUTE command to add a constant to the values of that variable so that the lowest value is 1. You can also use the RANKING option of the DISCRETIZATION subcommand for this purpose, except for variables that you want to treat as numerical, because ranking does not maintain the equal intervals in the data.
  • There must be at least three valid cases.
  • The number of valid cases must be greater than the number of independent variables plus 1.
  • The maximum number of independent variables is 200.
  • Split-File has no implications for CATREG.
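The recoding advice and the case-count limits above can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not CATREG's own code: the helper names are hypothetical, missing values are represented by `None`, and the number of valid cases is assumed to have been counted already.

```python
MAX_INDEPENDENTS = 200  # limit stated in the list above


def shift_to_positive(values):
    """Add a constant so the lowest value becomes 1 (the COMPUTE approach)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    offset = 1 - min(present)
    return [None if v is None else v + offset for v in values]


def rank_categories(values):
    """Replace each distinct value by its rank 1..k (the ranking approach)."""
    distinct = sorted({v for v in values if v is not None})
    ranks = {v: i for i, v in enumerate(distinct, start=1)}
    return [None if v is None else ranks[v] for v in values]


def check_limits(n_valid_cases, n_independents):
    """Return the limitations violated by the given counts."""
    problems = []
    if n_valid_cases < 3:
        problems.append("at least three valid cases are required")
    if n_valid_cases <= n_independents + 1:
        problems.append("valid cases must exceed the number of independent variables plus 1")
    if n_independents > MAX_INDEPENDENTS:
        problems.append("more than 200 independent variables")
    return problems
```

With the values -1, 0, and 2, `shift_to_positive` yields 1, 2, and 4, so the spacing between values is kept, whereas `rank_categories` yields 1, 2, and 3, which loses the equal-interval property that matters for variables treated as numerical.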