IBM Support

Reading correlation matrix text for factor analysis in SPSS

Troubleshooting


Problem

I have a correlation matrix that is stored in a text file. I would like to analyze this matrix with the SPSS Factor Analysis procedure (FACTOR). If I read the file into SPSS with the Text Import Wizard in the Data Editor, the Factor Analysis procedure seems to treat the matrix as if it were case-level data. How can I have the correlation matrix recognized as such by the Factor Analysis procedure?

Resolving The Problem

You need to identify the data as matrix data when you read it into the SPSS Data Editor. The first variable in the SPSS matrix file is called ROWTYPE_ and identifies the content in each row of the file (CORR, for correlations, for example). The second variable is called VARNAME_ and contains the variable name corresponding to each row of the matrix. The FACTOR procedure also expects a row of sample size (N) values to precede the correlation matrix rows.
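As an illustration of the layout just described, the following Python sketch lists the rows of such a matrix file as (ROWTYPE_, VARNAME_, values) tuples. The helper name `matrix_file_rows` is hypothetical and is not part of SPSS; it assumes a full square correlation matrix whose rows and columns follow the order of the variable names, and a single N for all variables.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, str, List[float]]  # (ROWTYPE_, VARNAME_, values)


def matrix_file_rows(names: Sequence[str], n: int,
                     corr: Sequence[Sequence[float]]) -> List[Row]:
    """Return the rows of a correlation matrix file in the order FACTOR expects.

    Hypothetical helper: `corr` is assumed to be a full square matrix
    whose rows and columns follow the order of `names`.
    """
    k = len(names)
    if len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("corr must be a square matrix matching names")
    # The N row comes first; VARNAME_ is left blank on this row.
    rows: List[Row] = [("N", "", [float(n)] * k)]
    # One CORR row per variable; VARNAME_ names the variable for that row.
    for name, values in zip(names, corr):
        rows.append(("CORR", name, [float(v) for v in values]))
    return rows


if __name__ == "__main__":
    for row in matrix_file_rows(["a", "b"], 50, [[1, 0.3], [0.3, 1]]):
        print(row)
```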

The FACTOR procedure must also be told that the data contain a correlation matrix; otherwise it treats the data as case-level data. The Factor Analysis dialog boxes have no option for matrix input, so the procedure must be run from FACTOR command syntax. The FACTOR subcommand

/MATRIX = IN(COR = *)

informs the procedure that the input data contain a correlation matrix. The '*' in the parentheses indicates that the matrix is in the active dataset. You can replace the * with the name of an existing SPSS matrix data file that is not open in the Data Editor. However, some data must be open in the Data Editor to run the FACTOR command, even if you name a different file to analyze.

When your correlation matrix is in a text file, the easiest way to have SPSS read it in a usable form is to open or copy the file into an SPSS syntax window and add the SPSS commands around it. Precede the correlation matrix with a MATRIX DATA command, and place the matrix values between BEGIN DATA and END DATA commands. In the MATRIX DATA command, you only need to specify the names of the variables and indicate that the contents are correlations. Supply the sample size either with the /N subcommand, if the number of cases is the same for all variables (see Example 1 below), or with a matrix of Ns, when the correlations were computed with pairwise deletion (see Example 2 below). When you run the commands, SPSS places the row(s) of N values in the proper location. By default, MATRIX DATA expects a lower-triangular correlation matrix, although you can specify a full or upper-triangular matrix with a /FORMAT subcommand. The diagonal values of the matrix are also expected by default; if they are missing, add the keyword NODIAG to the /FORMAT subcommand.
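The wrapping step described above is mechanical, so it can be sketched as a small Python helper that turns the text of a lower-triangular matrix (diagonal included) into MATRIX DATA syntax with a single /N value. The function name `matrix_data_syntax` is hypothetical; the sketch assumes one matrix row per line with whitespace-separated values, and it checks that row i holds exactly i values, which is what the default lower-triangular format expects.

```python
from typing import Sequence


def matrix_data_syntax(names: Sequence[str], n: int, text: str) -> str:
    """Wrap a lower-triangular correlation matrix (with diagonal) in
    MATRIX DATA ... BEGIN DATA ... END DATA commands.

    Hypothetical helper: assumes one matrix row per line, values separated
    by whitespace, and a single N for all variables.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != len(names):
        raise ValueError("expected one matrix row per variable")
    for i, values in enumerate(lines):
        # Row i (0-based) of a lower triangle with its diagonal holds i + 1 values.
        if len(values) != i + 1:
            raise ValueError(f"row {i + 1} has {len(values)} values, expected {i + 1}")
    body = "\n".join(" ".join(values) for values in lines)
    return (
        f"MATRIX DATA VARIABLES = {' '.join(names)}\n"
        f"/N = {n}\n"
        "/CONTENTS = CORR .\n"
        "BEGIN DATA.\n"
        f"{body}\n"
        "END DATA."
    )


if __name__ == "__main__":
    print(matrix_data_syntax(["a", "b", "c"], 100, "1\n.5 1\n.2 .3 1\n"))
```

Paste the returned text into a syntax window and follow it with a FACTOR command that uses /MATRIX = IN (COR = *), as in Example 1. Matrices in full or upper-triangular form, or without a diagonal, would need the /FORMAT options mentioned above, which this sketch does not handle.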
Note that the VARIABLES subcommand is not included in the FACTOR command when the /MATRIX=IN subcommand is used. If you want to analyze a subset of the variables in the matrix, specify them in an /ANALYSIS subcommand in FACTOR.
Here are some example command sets to read matrix data into SPSS and analyze it with FACTOR.

Example 1: This example includes a single value for the sample size, specified by the /N subcommand in the MATRIX DATA command. Using this subcommand implies that the Ns are the same for all variables, either because the data were complete or because listwise deletion was used when the correlation matrix was created. In listwise deletion, only cases with valid data on all of the analysis variables are included in the calculation of each correlation.

MATRIX DATA VARIABLES = y1 TO y4 x1 TO x4
/N= 200
/CONTENTS = CORR .
BEGIN DATA.
1
0.484 1
0.464 0.425 1
0.598 0.484 0.598 1
0.461 0.347 0.585 0.825 1
0.655 0.323 0.56 0.683 0.655 1
0.343 -0.006 0.253 0.543 0.47 0.438 1
0.551 0.217 0.516 0.679 0.567 0.558 0.518 1
END DATA.
EXECUTE.
FACTOR
/MATRIX = IN (COR = *)
/PRINT INITIAL EXTRACTION ROTATION
/FORMAT SORT
/CRITERIA MINEIGEN(.5) ITERATE(25)
/EXTRACTION PC
/CRITERIA ITERATE(25) DELTA(0)
/ROTATION OBLIMIN
/METHOD=CORRELATION .
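Listwise deletion, as described for Example 1, can be sketched in Python to show where the single /N value comes from. The sketch assumes each case is a sequence of values with None marking a missing value; the names `complete_cases` and `listwise_n` are hypothetical.

```python
from typing import List, Optional, Sequence

Case = Sequence[Optional[float]]  # one case; None marks a missing value (assumption)


def complete_cases(cases: Sequence[Case]) -> List[Case]:
    """Keep only cases that are valid on every variable (listwise deletion)."""
    return [case for case in cases if all(v is not None for v in case)]


def listwise_n(cases: Sequence[Case]) -> int:
    """Hypothetical helper: the single N that would go on the /N subcommand."""
    return len(complete_cases(cases))


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [None, 2.0, 3.0], [1.0, None, None], [4.0, 5.0, 6.0]]
    print(listwise_n(data))
```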


Example 2: This example reads a correlation matrix that was computed with pairwise deletion of missing values, along with the matrix of pairwise Ns. In pairwise deletion, each correlation is calculated from the cases that have valid data on that pair of variables, regardless of whether those cases have missing data on other variables in the analysis. To enter a matrix of pairwise Ns, omit the /N subcommand, add N_MATRIX to the /CONTENTS subcommand, and specify a matrix of Ns as in the example below.

* Example of correlation matrix input for factor with pairwise Ns .

MATRIX DATA VARIABLES = x1 TO x5
/CONTENTS = N_MATRIX CORR .
BEGIN DATA.
99
96 97
98 96 99
95 95 96 96
97 95 97 94 98
1
.720 1
.638 .560 1
.628 .644 .548 1
.639 .652 .641 .661 1
END DATA.
EXECUTE.

FACTOR
/MATRIX = IN (COR = *)
/MISSING PAIRWISE
/PRINT INITIAL EXTRACTION ROTATION repr kmo
/FORMAT SORT
/CRITERIA MINEIGEN(1.0) ITERATE(100)
/EXTRACTION PAF
/CRITERIA ITERATE(25) DELTA(0)
/ROTATION varimax
/METHOD=CORRELATION .

Note that a /MISSING PAIRWISE subcommand was added to the FACTOR command. The default missing-value handling for FACTOR is listwise deletion. However, with N_MATRIX in the /CONTENTS subcommand of MATRIX DATA and a pairwise N matrix included, FACTOR recognizes that the data are based on pairwise deletion. If the /MISSING PAIRWISE subcommand is omitted from the FACTOR command in these circumstances, the procedure prints a warning to alert you that pairwise deletion is assumed.
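For Example 2, each entry of the N matrix is the count of cases valid on both variables of the pair, and the diagonal holds the count for each variable alone. The Python sketch below builds those counts in the lower-triangular row order used for the N_MATRIX rows in Example 2. As before, None marks a missing value, and the function name `pairwise_n_rows` is hypothetical.

```python
from typing import List, Optional, Sequence

Case = Sequence[Optional[float]]  # one case; None marks a missing value (assumption)


def pairwise_n_rows(cases: Sequence[Case]) -> List[List[int]]:
    """Hypothetical helper: lower-triangular pairwise Ns, diagonal included."""
    k = len(cases[0]) if cases else 0
    rows: List[List[int]] = []
    for i in range(k):
        row = []
        for j in range(i + 1):
            # A case counts toward pair (i, j) only if it is valid on both variables.
            row.append(sum(1 for c in cases if c[i] is not None and c[j] is not None))
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [None, 2.0, 3.0], [1.0, None, None], [4.0, 5.0, 6.0]]
    for row in pairwise_n_rows(data):
        print(" ".join(str(v) for v in row))
```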

You can also enter the matrix data directly into the Data Editor in SPSS. The key requirement is that the first two variables must be rowtype_ and varname_ (in the absence of a split file variable, as explained below). The remaining variables are the analysis variables. Be sure to include the underscore character. (A warning will appear that variable names with underscores have special functions, but that is appropriate for rowtype_ and varname_; click OK for each.) Make rowtype_ a string variable of width 8. Make varname_ a string variable wide enough to hold your variable names.
For a matrix based on listwise Ns (that is, with all variables based on the same number of cases), the Factor procedure requires a row of Ns. (For a matrix based on pairwise Ns, a matrix of Ns is required, as shown for the MATRIX DATA commands in Example 2 above.)
Place 'N' (without quotes) in the rowtype_ column in this row, leave varname_ blank, and type the N value into each of the analysis variable columns. For the rows with the correlation matrix, type 'CORR' into the rowtype_ variable. There will be one row for each variable in the correlation matrix. The value of varname_ in each row must be identical to the name of the corresponding variable (column) in the matrix. You must enter the full symmetric matrix when entering it directly in the Data Editor.
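Because a correlation matrix is symmetric, a lower-triangular matrix such as those in the MATRIX DATA examples can be expanded into the full form required in the Data Editor by mirroring each value across the diagonal. A minimal Python sketch, with a hypothetical function name `full_symmetric`, using the first three variables of Example 2:

```python
from typing import List, Sequence


def full_symmetric(lower: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: expand a lower-triangular matrix (diagonal included)."""
    k = len(lower)
    full = [[0.0] * k for _ in range(k)]
    for i, row in enumerate(lower):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} should have {i + 1} values")
        for j, value in enumerate(row):
            # Copy each value to (i, j) and to its mirror position (j, i).
            full[i][j] = float(value)
            full[j][i] = float(value)
    return full


if __name__ == "__main__":
    names = ["x1", "x2", "x3"]
    lower = [[1], [0.720, 1], [0.638, 0.560, 1]]
    for name, row in zip(names, full_symmetric(lower)):
        # One CORR row per variable, as it would be typed into the Data Editor.
        print("CORR", name, *row)
```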
If you want to analyze the covariance matrix, you need an additional row of standard deviations, with 'STDDEV' in the rowtype_ variable. If you wish to use a split file variable, that variable must precede rowtype_, and there is a set of N and CORR rows for each split file group.
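The STDDEV row supplies information the correlations alone lack: a covariance is a correlation rescaled by the standard deviations of the two variables, cov_ij = r_ij * s_i * s_j. The Python sketch below shows only that arithmetic; the function name `covariance_from_corr` is hypothetical, and the sketch is not a description of how SPSS performs the conversion internally.

```python
from typing import List, Sequence


def covariance_from_corr(corr: Sequence[Sequence[float]],
                         sd: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: cov[i][j] = corr[i][j] * sd[i] * sd[j]."""
    k = len(sd)
    if len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("corr must be k x k, where k = len(sd)")
    # Rescale every correlation by the standard deviations of its two variables.
    return [[corr[i][j] * sd[i] * sd[j] for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    print(covariance_from_corr([[1.0, 0.5], [0.5, 1.0]], [2.0, 3.0]))
```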

Note that if you enter the matrix data directly into the Data Editor, you still need to run FACTOR from command syntax so that the /MATRIX subcommand can be used.


Historical Number

30282

Document Information

Modified date:
16 April 2020

UID

swg21479694