Correspondence analysis

A correspondence table is any two-way table whose cells contain some measurement of correspondence between the rows and the columns. The measure of correspondence can be any indication of the similarity, affinity, confusion, association, or interaction between the row and column variables. A very common type of correspondence table is a crosstabulation, where the cells contain frequency counts.
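As a minimal sketch of how a crosstabulation arises from raw data, the following Python snippet counts co-occurrences of two categorical variables. The observations and the helper name `crosstab` are invented for illustration and are not part of any procedure described here.

```python
from collections import Counter

# Hypothetical raw observations: (occupation, cereal) pairs, invented for illustration.
observations = [
    ("clerk", "oats"), ("clerk", "oats"), ("clerk", "flakes"),
    ("farmer", "oats"), ("farmer", "muesli"), ("farmer", "muesli"),
    ("teacher", "flakes"), ("teacher", "flakes"), ("teacher", "muesli"),
]

def crosstab(pairs):
    """Return (row_labels, col_labels, table) where table holds frequency counts."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # A Counter returns 0 for pairs that never occur, so empty cells are filled in.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

row_labels, col_labels, table = crosstab(observations)
for label, row in zip(row_labels, table):
    print(label, row)
```

Each cell of `table` is the number of observations that fall into that combination of row and column category.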

Such tables can be obtained easily with the Crosstabs procedure. However, a crosstabulation does not always provide a clear picture of the nature of the relationship between the two variables. This is particularly true if the variables of interest are nominal (with no inherent order or rank) and contain numerous categories. In a 10 × 9 crosstabulation of occupation and breakfast cereal, for example, crosstabulation may tell you that the observed cell frequencies differ significantly from the expected values, but it may be difficult to discern which occupational groups have similar tastes or what those tastes are.
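To make the limitation concrete, the sketch below computes expected frequencies under independence and the Pearson chi-square statistic for a small table. The counts and the function name `chi_square` are assumptions made for illustration. The result is a single number for the whole table, which is why it cannot show which categories resemble each other.

```python
def chi_square(table):
    """Return (statistic, expected) for a two-way table of counts.

    Assumes every row and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return statistic, expected

# Hypothetical 2 x 2 table of counts.
counts = [[20, 10], [10, 20]]
statistic, expected = chi_square(counts)
print(statistic, expected)
```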

Correspondence Analysis allows you to examine the relationship between two nominal variables graphically in a multidimensional space. It computes row and column scores and produces plots based on the scores. Categories that are similar to each other appear close to each other in the plots. In this way, it is easy to see which categories of a variable are similar to each other or which categories of the two variables are related. The Correspondence Analysis procedure also allows you to fit supplementary points into the space defined by the active points.
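The following is a rough sketch of how row and column scores can be derived in textbook correspondence analysis: form the standardized residuals from independence, then extract their leading singular vectors. It uses plain power iteration with deflation to stay dependency-free, which is a stand-in chosen for illustration and not necessarily how the Correspondence Analysis procedure computes its solution. The table and the name `correspondence_scores` are hypothetical, and the code assumes no row or column is empty.

```python
import math

def correspondence_scores(table, n_dims=2, iters=500):
    """Return (row_scores, col_scores, inertias) in principal coordinates."""
    n = float(sum(map(sum, table)))
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    n_rows, n_cols = len(r), len(c)
    # Standardized residuals from the independence model.
    S = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]
    row_scores = [[] for _ in range(n_rows)]
    col_scores = [[] for _ in range(n_cols)]
    inertias = []
    for _dim in range(n_dims):
        # Non-constant unit start vector (a constant one could fall in the null space of S).
        v = [float(j + 1) for j in range(n_cols)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _step in range(iters):
            u = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
            w = [sum(S[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
            length = math.sqrt(sum(x * x for x in w))
            if length < 1e-15:
                break  # nothing left to extract in this dimension
            v = [x / length for x in w]
        # t equals the singular value times the left singular vector.
        t = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        sigma = math.sqrt(sum(x * x for x in t))
        inertias.append(sigma ** 2)
        for i in range(n_rows):
            row_scores[i].append(t[i] / math.sqrt(r[i]))
        for j in range(n_cols):
            col_scores[j].append(sigma * v[j] / math.sqrt(c[j]))
        # Deflate: remove the extracted dimension before finding the next one.
        S = [[S[i][j] - t[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]
    return row_scores, col_scores, inertias

# Hypothetical counts: rows 0 and 1 have similar profiles, row 2 differs.
table = [[30, 10, 5], [28, 12, 6], [5, 10, 40]]
row_scores, col_scores, inertias = correspondence_scores(table)
for scores in row_scores:
    print([round(x, 3) for x in scores])
```

In this invented table the first two rows have nearly the same distribution over the columns, so their scores on the first dimension lie close together, while the third row lies far from both. The signs of the scores on any dimension are arbitrary.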

If the ordering of the categories according to their scores is undesirable or counterintuitive, order restrictions can be imposed by constraining the scores for some categories to be equal. For example, suppose that you expect the variable smoking behavior, with categories none, light, medium, and heavy, to have scores that correspond to this ordering. However, if the analysis orders the categories none, light, heavy, and medium, constraining the scores for heavy and medium to be equal preserves the ordering of the categories in their scores.

The interpretation of correspondence analysis in terms of distances depends on the normalization method used. The Correspondence Analysis procedure can be used to analyze either the differences between categories of a variable or the differences between variables. With the default normalization, it analyzes the differences between the row and column variables.
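One common way to describe normalization is as a choice of how the singular values are spread over the row and column scores. The sketch below assumes that convention: symmetrical normalization scales both sets by the square root of the singular values, row principal scales only the rows, column principal scales only the columns, and principal scales both. The function name, the method labels, and the input values are hypothetical, and the inputs are assumed to be standard coordinates from some earlier decomposition.

```python
def apply_normalization(standard_rows, standard_cols, singular_values,
                        method="symmetrical"):
    """Scale standard coordinates by powers of the singular values."""
    # (row exponent, column exponent) for each method; assumed convention.
    exponents = {
        "symmetrical": (0.5, 0.5),
        "row_principal": (1.0, 0.0),
        "column_principal": (0.0, 1.0),
        "principal": (1.0, 1.0),
    }
    row_exp, col_exp = exponents[method]

    def scale(points, exponent):
        # Multiply each dimension of each point by singular_value ** exponent.
        return [[x * s ** exponent for x, s in zip(point, singular_values)]
                for point in points]

    return scale(standard_rows, row_exp), scale(standard_cols, col_exp)

# Hypothetical standard coordinates for one row point and one column point.
rows_std = [[1.0, -2.0]]
cols_std = [[0.5, 4.0]]
sv = [0.25, 0.04]
sym_rows, sym_cols = apply_normalization(rows_std, cols_std, sv)
rp_rows, rp_cols = apply_normalization(rows_std, cols_std, sv, "row_principal")
print(sym_rows, sym_cols)
print(rp_rows, rp_cols)
```

Under this convention, distances between row points are interpretable when the rows carry the full singular values, and likewise for columns, which is why the choice affects how the plot should be read.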

The correspondence analysis algorithm is capable of many kinds of analyses. Centering the rows and columns and using chi-square distances corresponds to standard correspondence analysis. However, using alternative centering options combined with Euclidean distances allows for an alternative representation of a matrix in a low-dimensional space.
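To illustrate the difference between the two distance measures, the sketch below compares the chi-square distance and the plain Euclidean distance between two row profiles. The chi-square distance divides each squared difference by the column mass, so differences in rare columns count for more. The table and function names are assumptions for illustration, and this is the textbook definition, not necessarily the exact computation the procedure performs.

```python
import math

def row_profiles(table):
    """Divide each row by its total so rows become relative frequencies."""
    return [[x / sum(row) for x in row] for row in table]

def column_masses(table):
    """Column totals as proportions of the grand total."""
    n = sum(map(sum, table))
    return [sum(col) / n for col in zip(*table)]

def chi_square_distance(p, q, masses):
    """Euclidean distance with each squared difference weighted by 1 / column mass."""
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)))

def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical counts: the second column is rare overall.
counts = [[45, 5], [40, 10]]
profiles = row_profiles(counts)
masses = column_masses(counts)
d_chi = chi_square_distance(profiles[0], profiles[1], masses)
d_euc = euclidean_distance(profiles[0], profiles[1])
print(d_chi, d_euc)
```

Here the two profiles differ by the same amount in each column, but the chi-square distance is larger than the Euclidean one because the difference in the rare second column is up-weighted.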

The following example employs a relatively small correspondence table and illustrates the concepts inherent in correspondence analysis.
