# Categorical Regression (CATREG)

**Categorical regression** quantifies categorical data by assigning
numerical values to the categories, resulting in an optimal linear
regression equation for the transformed variables. Categorical regression
is also known by the acronym CATREG, for *cat*egorical *reg*ression.

Standard linear regression analysis involves minimizing the sum of squared differences between a response (dependent) variable and a weighted combination of predictor (independent) variables. Variables are typically quantitative, with (nominal) categorical data recoded to binary or contrast variables. As a result, categorical variables serve to separate groups of cases, and the technique estimates separate sets of parameters for each group. The estimated coefficients reflect how changes in the predictors affect the response. Prediction of the response is possible for any combination of predictor values.
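
As a rough illustration of the recoding described above, the sketch below turns one nominal predictor into binary indicator variables against a reference category. It relies on the fact that, with a single categorical predictor, the least-squares solution reduces to group means. The function names and the toy data are hypothetical and are not part of CATREG.

```python
from statistics import mean

def dummy_code(values, reference):
    """Recode a nominal variable into 0/1 indicator columns (reference omitted)."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    rows = [[1 if v == lv else 0 for lv in levels] for v in values]
    return levels, rows

def fit_single_factor(values, y, reference):
    """Least-squares fit for one dummy-coded predictor.

    With a single categorical predictor the solution is known in closed form:
    the intercept is the reference-group mean and each coefficient is the
    difference between a group mean and the reference-group mean.
    """
    group_mean = {lv: mean(yi for v, yi in zip(values, y) if v == lv)
                  for lv in set(values)}
    intercept = group_mean[reference]
    levels, _ = dummy_code(values, reference)
    coefs = {lv: group_mean[lv] - intercept for lv in levels}
    return intercept, coefs

# Hypothetical toy data: satisfaction scores by region.
region = ["north", "south", "west", "north", "south", "west"]
satisfaction = [3.0, 5.0, 4.0, 3.5, 5.5, 4.5]

levels, indicators = dummy_code(region, reference="north")
intercept, coefs = fit_single_factor(region, satisfaction, reference="north")
```

Each non-reference category gets its own coefficient, which is the sense in which separate parameters are estimated for each group.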

An alternative approach involves regressing the response on the categorical predictor values themselves. Consequently, one coefficient is estimated for each variable. However, for categorical variables, the category values are arbitrary. Coding the categories in different ways yields different coefficients, making comparisons across analyses of the same variables difficult.
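
The dependence on arbitrary codes is easy to see in a small sketch. The example below regresses the same response on the same nominal variable under two different integer codings, using hypothetical toy data and a hand-written slope function, and gets two different coefficients.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical toy data: the same cases, coded two different ways.
region = ["north", "south", "west", "north", "south", "west"]
satisfaction = [3.0, 5.0, 4.0, 3.5, 5.5, 4.5]

coding_a = {"north": 1, "south": 2, "west": 3}
coding_b = {"north": 1, "south": 3, "west": 2}

slope_a = ols_slope([coding_a[r] for r in region], satisfaction)
slope_b = ols_slope([coding_b[r] for r in region], satisfaction)
```

Nothing about the data changed between the two fits; only the labels attached to the categories did.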

CATREG extends the standard approach by simultaneously scaling nominal, ordinal, and numerical variables. The procedure quantifies categorical variables so that the quantifications reflect characteristics of the original categories. The procedure treats quantified categorical variables in the same way as numerical variables. Using nonlinear transformations allows variables to be analyzed at a variety of levels to find the best-fitting model.
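
The idea of quantifying categories can be sketched for the simplest case, in which every predictor is treated as nominal. The code below is a minimal illustration and not the CATREG algorithm itself: it assumes a backfitting-style alternating least squares loop in which each predictor's categories are replaced by the category means of the partial residuals and then standardized. The function name `quantify_nominal`, the iteration count, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def quantify_nominal(y, predictors, n_iter=50):
    """Assign a numeric value to every category of every nominal predictor.

    Returns (quants, betas): quants[j] maps each category of predictor j to
    its standardized quantification, betas[j] is the regression weight of
    the quantified predictor on the standardized response.
    """
    n, p = len(y), len(predictors)
    z = standardize(y)
    contrib = [[0.0] * n for _ in range(p)]  # beta_j * quantified x_j, per case
    quants = [{} for _ in range(p)]
    betas = [0.0] * p
    for _ in range(n_iter):
        for j, x in enumerate(predictors):
            # Partial residual: response minus the other predictors' contributions.
            resid = [z[i] - sum(contrib[k][i] for k in range(p) if k != j)
                     for i in range(n)]
            cats = sorted(set(x))
            cat_mean = {c: mean(r for r, xi in zip(resid, x) if xi == c)
                        for c in cats}
            raw = [cat_mean[xi] for xi in x]
            m, s = mean(raw), pstdev(raw)
            if s == 0:  # predictor explains nothing at this step
                quants[j] = {c: 0.0 for c in cats}
                betas[j] = 0.0
            else:
                quants[j] = {c: (cat_mean[c] - m) / s for c in cats}
                betas[j] = s  # weight of the standardized quantification
            contrib[j] = [betas[j] * quants[j][xi] for xi in x]
    return quants, betas

# Hypothetical toy data: satisfaction by job category and amount of travel.
job = ["manager", "manager", "engineer", "engineer", "clerk", "clerk"]
travel = ["low", "high", "low", "high", "low", "high"]
satisfaction = [3.0, 2.0, 2.0, 1.0, 1.0, 0.0]

quants, betas = quantify_nominal(satisfaction, [job, travel])
job_q, travel_q = quants
```

In this toy data the quantifications come out ordered as the category effects are, even though the categories were given as unordered labels. Ordinal and numerical scaling levels would add constraints (monotone or linear transformations) that this sketch does not attempt.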

**Example.** Categorical regression could be used to describe
how job satisfaction depends on job category, geographic region, and
amount of travel. You might find that high levels of satisfaction
correspond to managers and low travel. The resulting regression equation
could be used to predict job satisfaction for any combination of the
three independent variables.

**Statistics and plots.** Frequencies, regression coefficients,
ANOVA table, iteration history, category quantifications, correlations
between untransformed predictors, correlations between transformed
predictors, residual plots, and transformation plots.

## Categorical Regression Data Considerations

**Data.** CATREG operates on category indicator variables.
The category indicators should be positive integers. You can use the
Discretization dialog box to convert fractional-value variables and
string variables into positive integers.
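
Outside the dialog box, the same kind of recoding can be sketched in a few lines. The example below is only an assumption about one reasonable scheme, numbering the distinct string values from 1 in sorted order; it is not a description of what the Discretization dialog box does, and the function name is hypothetical.

```python
def to_category_indicators(values):
    """Map distinct values to consecutive positive integers (1, 2, 3, ...).

    Assumption: categories are numbered in ascending sorted order.
    """
    codes = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes

# Hypothetical string variable.
region = ["west", "north", "south", "north"]
indicators, codebook = to_category_indicators(region)
```

Keeping the returned codebook makes it possible to map quantifications back to the original category labels later.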

**Assumptions.** Only one response variable is allowed, but
the maximum number of predictor variables is 200. The data must contain
at least three valid cases, and the number of valid cases must exceed
the number of predictor variables plus one.
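
These limits can be checked before running an analysis. The helper below is a hypothetical pre-flight check that encodes only the three conditions stated above; its name and messages are illustrative.

```python
def check_catreg_requirements(n_valid_cases, n_predictors, max_predictors=200):
    """Return a list of violated data requirements (empty if all are met)."""
    problems = []
    if n_predictors > max_predictors:
        problems.append(f"at most {max_predictors} predictor variables are allowed")
    if n_valid_cases < 3:
        problems.append("at least three valid cases are required")
    if n_valid_cases <= n_predictors + 1:
        problems.append("valid cases must exceed the number of predictors plus one")
    return problems

ok = check_catreg_requirements(n_valid_cases=10, n_predictors=3)
too_few = check_catreg_requirements(n_valid_cases=5, n_predictors=4)
```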

**Related procedures.** CATREG is equivalent to categorical
canonical correlation analysis with optimal scaling (OVERALS) with
two sets, one of which contains only one variable. Scaling all variables
at the numerical level corresponds to standard multiple regression
analysis.

## To Obtain a Categorical Regression

This feature requires the Categories option.

- From the menus choose:
- Select the dependent variable and independent variable(s).
- Click OK.

Optionally, change the scaling level for each variable.

This procedure pastes CATREG command syntax.