# Categorical Principal Components Analysis (CATPCA)

This procedure simultaneously quantifies categorical variables
while reducing the dimensionality of the data. It is also known
by the acronym CATPCA, for *cat*egorical principal components analysis.

The goal of principal components analysis is to reduce an original set of variables into a smaller set of uncorrelated components that represent most of the information found in the original variables. The technique is most useful when a large number of variables prohibits effective interpretation of the relationships between objects (subjects and units). By reducing the dimensionality, you interpret a few components rather than a large number of variables.
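As a point of reference, the idea of summarizing several variables by one component can be sketched for purely numeric data. The example below is a minimal illustration and not the procedure's own algorithm: the function names `correlation_matrix` and `first_component`, the use of power iteration, and the data values are all assumptions made for this sketch.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlations between numeric variables given as equal-length lists."""
    n = len(columns[0])
    standardized = []
    for column in columns:
        mean = sum(column) / n
        # Population standard deviation; a constant column would give sd == 0.
        sd = sqrt(sum((v - mean) ** 2 for v in column) / n)
        standardized.append([(v - mean) / sd for v in column])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
            for ci in standardized]

def first_component(matrix, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    size = len(matrix)
    # Assumed starting vector; it must not be orthogonal to the leading eigenvector.
    vector = [1.0 / sqrt(size)] * size
    for _ in range(iterations):
        product = [sum(row[j] * vector[j] for j in range(size)) for row in matrix]
        norm = sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue.
    product = [sum(row[j] * vector[j] for j in range(size)) for row in matrix]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    return eigenvalue, vector

# Made-up illustrative data: three numeric variables measured on six objects.
hours = [1, 2, 3, 4, 5, 6]
output = [2, 4, 5, 4, 5, 7]
errors = [6, 5, 5, 3, 2, 1]

eigenvalue, direction = first_component(correlation_matrix([hours, output, errors]))
share = eigenvalue / 3  # each standardized variable contributes one unit of variance
print(f"first component accounts for {share:.0%} of the variance")
```

When the variables are strongly correlated, the first eigenvalue approaches the number of variables, which is what makes interpreting a few components instead of many variables worthwhile.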

Standard principal components analysis assumes linear relationships between numeric variables. On the other hand, the optimal-scaling approach allows variables to be scaled at different levels. Categorical variables are optimally quantified in the specified dimensionality. As a result, nonlinear relationships between variables can be modeled.
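To give a feel for what "optimally quantified" can mean, the sketch below alternates between two averaging steps for a single dimension, treating every variable as nominal. It is a simplified illustration in the spirit of alternating least squares, not the CATPCA implementation; the function names, the starting values, and the data are assumptions made for this example.

```python
from math import sqrt

def standardize(scores):
    """Center scores at zero and rescale them to unit variance."""
    n = len(scores)
    mean = sum(scores) / n
    centered = [s - mean for s in scores]
    # Assumes the scores are not all equal; otherwise scale would be zero.
    scale = sqrt(sum(c * c for c in centered) / n)
    return [c / scale for c in centered]

def category_means(codes, scores):
    """Quantify each category as the mean score of the objects in it."""
    totals, counts = {}, {}
    for code, score in zip(codes, scores):
        totals[code] = totals.get(code, 0.0) + score
        counts[code] = counts.get(code, 0) + 1
    return {code: totals[code] / counts[code] for code in totals}

def quantify_one_dimension(variables, iterations=50):
    """Alternate between category quantifications and object scores."""
    n = len(variables[0])
    scores = standardize([float(i) for i in range(n)])  # arbitrary assumed start
    quantifications = []
    for _ in range(iterations):
        # Step 1: each category gets the mean score of its objects.
        quantifications = [category_means(codes, scores) for codes in variables]
        # Step 2: each object gets the mean quantification of its categories.
        scores = standardize([
            sum(q[codes[i]] for q, codes in zip(quantifications, variables))
            / len(variables)
            for i in range(n)
        ])
    return scores, quantifications

# Made-up data: amount of travel and job satisfaction, coded 1 = low, 2 = medium, 3 = high.
travel = [1, 1, 2, 2, 3, 3, 2, 1]
satisfaction = [1, 1, 3, 3, 2, 2, 3, 1]

scores, quantifications = quantify_one_dimension([travel, satisfaction])
for name, q in zip(("travel", "satisfaction"), quantifications):
    print(name, {code: round(value, 3) for code, value in sorted(q.items())})
```

Because the quantifications are estimated from the data rather than fixed by the category codes, categories need not be placed in code order or at equal distances, which is how nonlinear relationships can be represented.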

**Example.** Categorical principal components analysis could
be used to graphically display the relationship between job category,
job division, region, amount of travel (high, medium, and low), and
job satisfaction. You might find that two dimensions account for a
large amount of variance. The first dimension might separate job category
from region, whereas the second dimension might separate job division
from amount of travel. You also might find that high job satisfaction
is related to a medium amount of travel.

**Statistics and plots.** Frequencies, missing values, optimal
scaling level, mode, variance accounted for by centroid coordinates,
vector coordinates, total per variable and per dimension, component
loadings for vector-quantified variables, category quantifications
and coordinates, iteration history, correlations of the transformed
variables and eigenvalues of the correlation matrix, correlations
of the original variables and eigenvalues of the correlation matrix,
object scores, category plots, joint category plots, transformation
plots, residual plots, projected centroid plots, object plots, biplots,
triplots, and component loadings plots.

## Categorical Principal Components Analysis Data Considerations

**Data.** String variable values are always converted into
positive integers by ascending alphanumeric order. User-defined missing
values, system-missing values, and values less than 1 are considered
missing; you can recode or add a constant to variables with values
less than 1 to make them nonmissing.
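The coding rules above can be mimicked outside the procedure, for example to check how a variable will be treated before running the analysis. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, `None` stands in for user-defined and system-missing values, and Python's default string ordering is assumed to be close enough to "ascending alphanumeric order" for the example values.

```python
def code_strings(values):
    """Map distinct strings to 1..k in ascending order of the strings."""
    ordered = sorted(set(values))  # assumption: Python's default string ordering
    mapping = {value: index + 1 for index, value in enumerate(ordered)}
    return [mapping[value] for value in values], mapping

def mark_missing(values):
    """Treat None and any value less than 1 as missing (returned as None)."""
    return [v if v is not None and v >= 1 else None for v in values]

def shift_to_positive(values):
    """Add a constant so that the smallest nonmissing value becomes 1."""
    smallest = min(v for v in values if v is not None)
    constant = 1 - smallest if smallest < 1 else 0
    return [None if v is None else v + constant for v in values]

codes, mapping = code_strings(["west", "east", "north", "east"])
print(codes, mapping)

print(mark_missing([0, 2, -1, 3, None]))   # 0 and -1 are treated as missing
print(shift_to_positive([0, 2, -1, 3]))    # all four values stay in the analysis
```

Adding a constant keeps every case in the analysis, whereas leaving values below 1 in place silently drops them as missing.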

**Assumptions.** The data must contain at least three valid
cases. The analysis is based on positive integer data. The discretization
option will automatically categorize a fractional-valued variable
by grouping its values into categories with an approximately normal
distribution and will automatically convert values of string variables
into positive integers. You can specify other discretization schemes.

**Related procedures.** Scaling all variables at the numeric
level corresponds to standard principal components analysis. Alternate
plotting features are available by using the transformed variables
in a standard linear principal components analysis. If all variables
have multiple nominal scaling levels, categorical principal components
analysis is identical to multiple correspondence analysis. If sets
of variables are of interest, categorical (nonlinear) canonical correlation
analysis should be used.

## To Obtain a Categorical Principal Components Analysis

This feature requires the Categories option.

- From the menus choose:
- Select **Some variable(s) not multiple nominal**.
- Select **One set**.
- Click **Define**.
- Select at least two analysis variables and specify the number of dimensions in the solution.
- Click **OK**.

You may optionally specify supplementary variables, which are fitted into the solution found, or labeling variables for the plots.

This procedure pastes CATPCA command syntax.