Format of the Matrix Data File (CLUSTER command)

  • The matrix data file can include three special variables created by the program: ROWTYPE_, ID, and VARNAME_.
  • The variable ROWTYPE_ is a string variable with the value PROX (for proximity measure). PROX is assigned value labels containing the distance measure used to create the matrix and either SIMILARITY or DISSIMILARITY as an identifier.
  • ID is included only when an identifying variable is not specified on the ID subcommand. ID is a short string variable that takes the value CASE m, where m is the actual number of each case. The values of m are not necessarily consecutive if cases have been selected.
  • If an identifying variable is specified on the ID subcommand, it takes the place of ID between ROWTYPE_ and VARNAME_. Up to 20 characters can be displayed for the identifying variable.
  • VARNAME_ is a string variable that takes the values VAR1, VAR2, ..., VARn, which correspond to the names of the distance variables in the matrix, where n is the number of cases in the largest split file. The numeric suffixes of the variable names are consecutive and are not necessarily the same as the actual case numbers.
  • The remaining variables in the matrix file are the distance variables used to form the matrix. Each distance variable is assigned a variable label of the form CASE m, which identifies the actual number of the corresponding case.
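
The layout described above can be illustrated with a short Python sketch. This is not program output and not part of the CLUSTER command. The function matrix_file_rows and its parameters are hypothetical names introduced only for illustration, and the distance values are made up. The sketch assumes a single split file and shows how the consecutive VARn names relate to case numbers that are not consecutive.

```python
def matrix_file_rows(case_numbers, distances, id_name=None, id_values=None):
    """Arrange a square distance matrix in the row layout described above.

    case_numbers: actual case numbers (need not be consecutive)
    distances:    n x n list of lists of distance values
    id_name, id_values: hypothetical stand-ins for an identifying variable
                  named on the ID subcommand; when given, that variable
                  takes the place of ID between ROWTYPE_ and VARNAME_
    """
    n = len(case_numbers)
    # Distance variable names use consecutive suffixes: VAR1 ... VARn.
    var_names = [f"VAR{i}" for i in range(1, n + 1)]
    # Variable labels tie each VARi back to the actual case number.
    var_labels = {name: f"CASE {m}" for name, m in zip(var_names, case_numbers)}

    rows = []
    for i, m in enumerate(case_numbers):
        row = {"ROWTYPE_": "PROX"}
        if id_name is None:
            row["ID"] = f"CASE {m}"          # generated identifier
        else:
            row[id_name] = id_values[i]      # user-specified identifier
        row["VARNAME_"] = var_names[i]
        row.update(zip(var_names, distances[i]))
        rows.append(row)
    return rows, var_labels


# Illustrative data: cases 2, 5, and 9 remain after case selection.
example_distances = [
    [0.0, 1.5, 4.0],
    [1.5, 0.0, 2.5],
    [4.0, 2.5, 0.0],
]
rows, labels = matrix_file_rows([2, 5, 9], example_distances)
for row in rows:
    print(row)
print(labels)
```

In this sketch the second row has ID equal to CASE 5 but VARNAME_ equal to VAR2, which is the distinction between actual case numbers and consecutive variable suffixes.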