Nonlinear Canonical Correlation Analysis (OVERALS)

Nonlinear canonical correlation analysis corresponds to categorical canonical correlation analysis with optimal scaling. The purpose of this procedure is to determine how similar sets of categorical variables are to one another. Nonlinear canonical correlation analysis is also known by the acronym OVERALS.

Standard canonical correlation analysis is an extension of multiple regression in which the second set contains not a single response variable but multiple response variables. The goal is to explain as much as possible of the variance in the relationships between two sets of numerical variables in a low-dimensional space. Initially, the variables in each set are linearly combined such that the linear combinations have a maximal correlation. Given these combinations, subsequent linear combinations are determined that are uncorrelated with the previous combinations and that have the largest correlation possible.
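The linear starting point can be sketched numerically. The Python function below (the name `canonical_correlations` is hypothetical, and the sketch assumes NumPy and data whose columns are not collinear) computes the canonical correlations of ordinary, linear CCA. It centers each set, takes orthonormal bases of the two column spaces with a QR decomposition, and reads the correlations off as singular values of the product of those bases. It illustrates standard CCA only, not the OVERALS optimal scaling algorithm.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between column sets X (n x p) and Y (n x q).

    Sketch of linear CCA; assumes both sets have full column rank.
    Returns min(p, q) correlations in decreasing order.
    """
    # Center each variable so that inner products become covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the spaces spanned by each set's linear combinations.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx' Qy are the maximal correlations of successive,
    # mutually uncorrelated pairs of linear combinations.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)
```

With one variable in each set, the single canonical correlation reduces to the absolute Pearson correlation, which is the sense in which the technique generalizes regression with one response.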

The optimal scaling approach expands the standard analysis in three crucial ways. First, OVERALS allows more than two sets of variables. Second, variables can be scaled as nominal, ordinal, or numerical, so nonlinear relationships between variables can be analyzed. Finally, instead of maximizing correlations between the variable sets, OVERALS compares each set to an unknown compromise set that is defined by the object scores.
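The comparison with a compromise set can be written as a loss function. The form below is the criterion as it is usually stated in the Gifi-system literature on OVERALS; it is given here from that literature, not from this page, and the procedure's own algorithm notes may add details such as weighting for missing data. Here X (N × p) holds the object scores, K is the number of sets, J_k is the set of variables in set k, G_j is the indicator matrix of variable j, Y_j holds its category quantifications, u is a vector of ones, and SSQ denotes the sum of squared elements.

```latex
\sigma(X; Y) = \frac{1}{K} \sum_{k=1}^{K}
  \operatorname{SSQ}\Bigl( X - \sum_{j \in J_k} G_j Y_j \Bigr),
\qquad \text{subject to } u^{\top} X = 0, \quad X^{\top} X = N I_p
```

For a multiple nominal variable, Y_j is unrestricted. For single nominal, ordinal, and numerical variables, Y_j is assumed to be of rank one, Y_j = y_j a_j^T, with the single quantification vector y_j restricted according to the scaling level. Minimizing σ makes the weighted sum of each set's quantified variables as close as possible to the shared object scores.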

Example. Categorical canonical correlation analysis with optimal scaling could be used to graphically display the relationship between one set of variables containing job category and years of education and another set of variables containing region of residence and gender. You might find that years of education and region of residence discriminate better than the remaining variables. You might also find that years of education discriminates best on the first dimension.

Statistics and plots. Frequencies, centroids, iteration history, object scores, category quantifications, weights, component loadings, single and multiple fit, object scores plots, category coordinates plots, component loadings plots, category centroids plots, transformation plots.

Nonlinear Canonical Correlation Analysis Data Considerations

Data. Use integers to code categorical variables (nominal or ordinal scaling level). To minimize output, use consecutive integers beginning with 1 to code each variable. Variables that are scaled at the numerical level should not be recoded to consecutive integers. To minimize output, for each variable that is scaled at the numerical level, subtract the smallest observed value from every value and add 1. Fractional values are truncated after the decimal.
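The recoding advice can be sketched in a few lines of Python. All function names are hypothetical helpers for preparing data before it reaches the procedure. `recode_categorical` maps each distinct code to 1, 2, ... in sorted order, which preserves the ordering an ordinal variable needs. `shift_numerical` subtracts the smallest value and adds 1 without compressing gaps, because numerical variables must keep their spacing. `truncate` mimics the documented truncation of fractional values.

```python
import math

def recode_categorical(values):
    """Map distinct codes to consecutive integers starting at 1 (sorted order kept)."""
    ranks = {code: i for i, code in enumerate(sorted(set(values)), start=1)}
    return [ranks[v] for v in values]

def shift_numerical(values):
    """Subtract the smallest observed value and add 1; relative spacing is unchanged."""
    lo = min(values)
    return [v - lo + 1 for v in values]

def truncate(values):
    """Drop the fractional part of each value, as the procedure is documented to do."""
    return [math.trunc(v) for v in values]

# Hypothetical job-category codes 10/20/30 and years of education.
print(recode_categorical([10, 30, 20, 10]))
print(shift_numerical([12, 16, 12, 20]))
```

Recoding years of education with `recode_categorical` instead would turn 12, 16, 20 into 1, 2, 3 and lose the fact that the gaps are four years each.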

Assumptions. Variables can be classified into two or more sets. Variables in the analysis are scaled as multiple nominal, single nominal, ordinal, or numerical. The maximum number of dimensions that are used in the procedure depends on the optimal scaling level of the variables. If all variables are specified as ordinal, single nominal, or numerical, the maximum number of dimensions is the lesser of the following two values: the number of observations minus 1 or the total number of variables. However, if only two sets of variables are defined, the maximum number of dimensions is the number of variables in the smaller set. If some variables are multiple nominal, the maximum number of dimensions is the total number of multiple nominal categories plus the number of nonmultiple nominal variables minus the number of multiple nominal variables. For example, if the analysis involves five variables, one of which is multiple nominal with four categories, the maximum number of dimensions is (4 + 4 – 1), or 7. If you specify a number that is greater than the maximum, the maximum value is used.
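The dimension rules above can be expressed as a small Python function. The `Var` type, the level labels `"mnom"`, `"snom"`, `"ordi"`, and `"nume"`, and both function names are hypothetical choices for this sketch. It applies the rules literally in the order the text gives them: the two-set rule is used only when no variable is multiple nominal, and no additional bound is applied in the multiple nominal case beyond what the text states.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    level: str             # hypothetical labels: "mnom", "snom", "ordi", "nume"
    n_categories: int = 0  # only used for multiple nominal variables

def max_dimensions(n_obs, sets):
    """Maximum number of dimensions for a list of variable sets."""
    variables = [v for s in sets for v in s]
    mnom = [v for v in variables if v.level == "mnom"]
    if not mnom:
        if len(sets) == 2:
            # Two sets: limited by the smaller set.
            return min(len(s) for s in sets)
        return min(n_obs - 1, len(variables))
    n_other = len(variables) - len(mnom)
    # Multiple nominal categories + other variables - multiple nominal variables.
    return sum(v.n_categories for v in mnom) + n_other - len(mnom)

def effective_dimensions(requested, n_obs, sets):
    """A request above the maximum is replaced by the maximum."""
    return min(requested, max_dimensions(n_obs, sets))
```

For the example in the text, five variables with one multiple nominal variable of four categories, `max_dimensions` returns 4 + 4 - 1 = 7 however the five variables are split into sets.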

Related procedures. If each set contains one variable, nonlinear canonical correlation analysis is equivalent to principal components analysis with optimal scaling. If each of these variables is multiple nominal, the analysis corresponds to multiple correspondence analysis. If two sets of variables are involved, and one of the sets contains only one variable, the analysis is identical to categorical regression with optimal scaling.
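The special cases above amount to a decision rule on the set structure. The sketch below uses a hypothetical function name and the same hypothetical level labels as plain strings (`"mnom"` for multiple nominal). When more than one rule matches, for example two sets with one variable each, it reports the first rule in the order the text lists them; the text itself does not rank them.

```python
def equivalent_analysis(sets):
    """Name the analysis that OVERALS reduces to for a given set structure.

    sets: list of sets, each a list of scaling-level labels such as
    "mnom", "snom", "ordi", "nume" (hypothetical labels).
    """
    if all(len(s) == 1 for s in sets):
        if all(s[0] == "mnom" for s in sets):
            return "multiple correspondence analysis"
        return "principal components analysis with optimal scaling"
    if len(sets) == 2 and min(len(s) for s in sets) == 1:
        return "categorical regression with optimal scaling"
    return "nonlinear canonical correlation analysis"

print(equivalent_analysis([["nume"], ["ordi", "snom"]]))
```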

To Obtain a Nonlinear Canonical Correlation Analysis

This feature requires the Categories option.

  1. From the menus choose:

    Analyze > Dimension Reduction > Optimal Scaling...

  2. Select either All variables multiple nominal or Some variable(s) not multiple nominal.
  3. Select Multiple sets.
  4. Click Define.
  5. Define at least two sets of variables. Select the variable(s) that you want to include in the first set. To move to the next set, click Next, and select the variables that you want to include in the second set. You can define more than two sets. Click Previous to return to the previously defined variable set.
  6. Define the value range and measurement scale (optimal scaling level) for each selected variable.
  7. Click OK.
  8. Optionally:
  • Select one or more variables to provide point labels for object scores plots. Each variable produces a separate plot, with the points labeled by the values of that variable. You must define a range for each of these plot label variables. When you are using the dialog box, a single variable cannot be used both in the analysis and as a labeling variable. If you want to label the object scores plot with a variable that is used in the analysis, use the Compute facility (available from the Transform menu) to create a copy of that variable. Use the new variable to label the plot. Alternatively, command syntax can be used.
  • Specify the number of dimensions that you want in the solution. In general, choose as few dimensions as needed to explain most of the variation. If the analysis involves more than two dimensions, three-dimensional plots of the first three dimensions are produced. Other dimensions can be displayed by editing the chart.

This procedure pastes OVERALS command syntax.