# Nonlinear Canonical Correlation Analysis (OVERALS)

Nonlinear canonical correlation analysis corresponds to categorical canonical correlation analysis with optimal scaling. The purpose of this procedure is to determine how similar sets of categorical variables are to one another. Nonlinear canonical correlation analysis is also known by the acronym OVERALS.

Standard canonical correlation analysis is an extension of multiple regression in which the second set contains multiple response variables instead of a single response variable. The goal is to explain as much as possible of the variance in the relationships between two sets of numerical variables in a low-dimensional space. Initially, the variables in each set are linearly combined such that the linear combinations have a maximal correlation. Given these combinations, subsequent linear combinations are determined that are uncorrelated with the previous combinations and that have the largest correlation possible.
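The criterion described above can be written compactly. This is a sketch in standard textbook notation, not notation taken from this topic: $X$ and $Y$ are assumed to be the centered data matrices of the two sets, $a_k$ and $b_k$ the weight vectors of the $k$-th pair of linear combinations, and $\Sigma_{XX}$, $\Sigma_{YY}$, $\Sigma_{XY}$ the within-set and between-set covariance matrices.

$$
\rho_k \;=\; \max_{a_k,\, b_k} \operatorname{corr}(X a_k,\; Y b_k)
\;=\; \max_{a_k,\, b_k} \frac{a_k^{\top} \Sigma_{XY}\, b_k}{\sqrt{a_k^{\top} \Sigma_{XX}\, a_k}\;\sqrt{b_k^{\top} \Sigma_{YY}\, b_k}}
\quad \text{subject to} \quad
\operatorname{corr}(X a_k, X a_l) = \operatorname{corr}(Y b_k, Y b_l) = 0 \;\; \text{for all } l < k .
$$

The first pair ($k = 1$) has no constraint; each later pair maximizes the same correlation while remaining uncorrelated with all earlier combinations.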

The optimal scaling approach expands the standard analysis in three crucial ways. First, OVERALS allows more than two sets of variables. Second, variables can be scaled as nominal, ordinal, or numerical. As a result, nonlinear relationships between variables can be analyzed. Finally, instead of maximizing correlations between the variable sets, the sets are compared to an unknown compromise set that is defined by the object scores.
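The idea of comparing every set to a compromise defined by the object scores is often expressed as a least-squares loss in the optimal scaling literature. The following is a sketch under assumed notation, none of which appears in this topic: $X$ is the $n \times p$ matrix of object scores, $K$ the number of sets, $J_k$ the variables in set $k$, $G_j$ the indicator matrix of variable $j$, and $Y_j$ its category quantifications.

$$
\sigma(X;\, Y_1, \dots, Y_m) \;=\; \frac{1}{K} \sum_{k=1}^{K}
\Bigl\lVert\, X - \sum_{j \in J_k} G_j Y_j \Bigr\rVert^{2}
\quad \text{subject to} \quad X^{\top} X = n I, \;\; \mathbf{1}^{\top} X = 0 .
$$

Under this formulation, the scaling level of a variable restricts the form of $Y_j$. Multiple nominal variables leave $Y_j$ free, whereas single nominal, ordinal, and numerical variables constrain it to a single quantification per category that is weighted across dimensions.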

**Example.** Categorical canonical correlation analysis with
optimal scaling could be used to graphically display the relationship
between one set of variables containing job category and years of
education and another set of variables containing region of residence
and gender. You might find that years of education and region of residence
discriminate better than the remaining variables. You might also find
that years of education discriminates best on the first dimension.

**Statistics and plots.** Frequencies, centroids, iteration
history, object scores, category quantifications, weights, component
loadings, single and multiple fit, object scores plots, category coordinates
plots, component loadings plots, category centroids plots, transformation
plots.

## Nonlinear Canonical Correlation Analysis Data Considerations

**Data.** Use integers to code categorical variables (nominal
or ordinal scaling level). To minimize output, use consecutive integers
beginning with 1 to code each variable. Variables that are scaled
at the numerical level should not be recoded to consecutive integers.
To minimize output, for each variable that is scaled at the numerical
level, subtract the smallest observed value from every value and
add 1. Fractional values are truncated after the decimal.
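
The coding advice above can be illustrated with a minimal Python sketch. This is not part of the procedure itself: the helper names `recode_categorical`, `shift_numerical`, and `truncate`, and the sample values, are hypothetical and only show the kind of preparation you might do before running the analysis.

```python
import math


def recode_categorical(values):
    """Map the distinct codes of a nominal or ordinal variable to 1, 2, 3, ...

    Sorting keeps the original order of the codes, which matters for
    ordinal variables.
    """
    mapping = {code: rank for rank, code in enumerate(sorted(set(values)), start=1)}
    return [mapping[v] for v in values]


def shift_numerical(values):
    """Subtract the smallest observed value and add 1.

    Unlike recoding to consecutive integers, this keeps the distances
    between values intact, which is what the numerical level relies on.
    """
    lowest = min(values)
    return [v - lowest + 1 for v in values]


def truncate(values):
    """Mimic the stated handling of fractional values: drop the fraction."""
    return [math.trunc(v) for v in values]


# Hypothetical data: region codes (nominal) and years of education (numerical).
region = [10, 30, 30, 20, 10]
education = [8, 12, 16, 12, 20]

print(recode_categorical(region))   # [1, 3, 3, 2, 1]
print(shift_numerical(education))   # [1, 5, 9, 5, 13]
print(truncate([1.9, 2.5, 3.0]))    # [1, 2, 3]
```

Note that the shifted education values still differ by 4, 4, and 8 years, whereas recoding them to consecutive integers would have collapsed those gaps.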

**Assumptions.** Variables can be classified into two or more
sets. Variables in the analysis are scaled as multiple nominal, single
nominal, ordinal, or numerical. The maximum number of dimensions that
are used in the procedure depends on the optimal scaling level of
the variables. If all variables are specified as ordinal, single nominal,
or numerical, the maximum number of dimensions is the lesser of the
following two values: the number of observations minus 1 or the total
number of variables. However, if only two sets of variables are defined,
the maximum number of dimensions is the number of variables in the
smaller set. If some variables are multiple nominal, the maximum number
of dimensions is the total number of multiple nominal categories plus
the number of nonmultiple nominal variables minus the number of multiple
nominal variables. For example, if the analysis involves five variables,
one of which is multiple nominal with four categories, the maximum
number of dimensions is (4 + 4 – 1), or 7. If you specify a number
that is greater than the maximum, the maximum value is used.
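
The rules for the maximum number of dimensions can be checked with a short Python sketch. The function names and parameters are hypothetical, and the sketch assumes that the two-set rule applies only when no variable is multiple nominal, which is one reading of the paragraph above.

```python
def max_dimensions(n_observations, set_sizes, multiple_nominal_categories=()):
    """Return the maximum number of dimensions implied by the stated rules.

    set_sizes: number of variables in each set, for example (2, 2, 1).
    multiple_nominal_categories: category count of each multiple nominal
        variable; leave empty when no variable is multiple nominal.
    """
    if len(set_sizes) < 2:
        raise ValueError("at least two sets of variables are required")
    n_variables = sum(set_sizes)
    n_multiple = len(multiple_nominal_categories)
    if n_multiple > n_variables:
        raise ValueError("more multiple nominal variables than variables")

    if n_multiple == 0:
        if len(set_sizes) == 2:
            # Only two sets: the number of variables in the smaller set.
            return min(set_sizes)
        # Otherwise: the lesser of (observations - 1) and total variables.
        return min(n_observations - 1, n_variables)

    # Some variables are multiple nominal: total multiple nominal categories
    # plus the other variables, minus the number of multiple nominal variables.
    return sum(multiple_nominal_categories) + (n_variables - n_multiple) - n_multiple


def effective_dimensions(requested, maximum):
    """A request above the maximum is replaced by the maximum."""
    return min(requested, maximum)


# The example from the text: five variables, one of them multiple nominal
# with four categories, gives 4 + 4 - 1 = 7.
limit = max_dimensions(100, (2, 2, 1), multiple_nominal_categories=(4,))
print(limit)                             # 7
print(effective_dimensions(10, limit))   # 7
print(max_dimensions(100, (3, 2)))       # 2
```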

**Related procedures.** If each set contains one variable,
nonlinear canonical correlation analysis is equivalent to principal
components analysis with optimal scaling. If each of these variables
is multiple nominal, the analysis corresponds to multiple correspondence
analysis. If two sets of variables are involved, and one of the sets
contains only one variable, the analysis is identical to categorical
regression with optimal scaling.
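
As a quick way to read these special cases, the following Python sketch maps a set layout to the procedure it coincides with. The function name and arguments are hypothetical. The checks are made in the order the paragraph lists them, so two sets of one variable each are treated under the first rule.

```python
def equivalent_procedure(set_sizes, all_multiple_nominal=False):
    """Name the procedure that a given set layout coincides with.

    set_sizes: number of variables in each set.
    all_multiple_nominal: True when every variable is multiple nominal.
    """
    if all(size == 1 for size in set_sizes):
        # One variable per set.
        if all_multiple_nominal:
            return "multiple correspondence analysis"
        return "principal components analysis with optimal scaling"
    if len(set_sizes) == 2 and min(set_sizes) == 1:
        # Two sets, one of which holds a single variable.
        return "categorical regression with optimal scaling"
    return "nonlinear canonical correlation analysis"


print(equivalent_procedure((1, 1, 1)))
print(equivalent_procedure((1, 1, 1), all_multiple_nominal=True))
print(equivalent_procedure((3, 1)))
print(equivalent_procedure((2, 2)))
```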

## To Obtain a Nonlinear Canonical Correlation Analysis

This feature requires the Categories option.

- From the menus choose:
- Select either All variables multiple nominal or Some variable(s) not multiple nominal.
- Select Multiple sets.
- Click Define.
- Define at least two sets of variables. Select the variable(s) that you want to include in the first set. To move to the next set, click Next, and select the variables that you want to include in the second set. You can add additional sets. Click Previous to return to the previously defined variable set.
- Define the value range and measurement scale (optimal scaling level) for each selected variable.
- Click OK.

Optionally:

- Select one or more variables to provide point labels for object scores plots. Each variable produces a separate plot, with the points labeled by the values of that variable. You must define a range for each of these plot label variables. When you are using the dialog box, a single variable cannot be used both in the analysis and as a labeling variable. If you want to label the object scores plot with a variable that is used in the analysis, use the Compute facility (available from the Transform menu) to create a copy of that variable. Use the new variable to label the plot. Alternatively, command syntax can be used.
- Specify the number of dimensions that you want in the solution. In general, choose as few dimensions as needed to explain most of the variation. If the analysis involves more than two dimensions, three-dimensional plots of the first three dimensions are produced. Other dimensions can be displayed by editing the chart.

This procedure pastes OVERALS command syntax.