IBM Support

Hierarchical Clustering is sensitive to order of cases in the data file

Troubleshooting


Problem

I used the SPSS CLUSTER procedure to run a hierarchical cluster analysis of cases and saved the cluster memberships for a 4-cluster solution. I then sorted the file by an unrelated variable and reran the CLUSTER procedure, saving the memberships for a 4-cluster solution to a new variable. I used the CROSSTABS procedure to check whether the cluster groupings were identical across the two CLUSTER runs, allowing for the fact that the membership number assigned to each cluster could be expected to change. Some cases were assigned to different clusters in the two analyses. I had assumed that the procedure examines the entire distance matrix before selecting which clusters to join at each step. Why does the hierarchical clustering result appear to depend on the order of the cases in the file?
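The cross-tabulation check described above can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax and not part of the CLUSTER procedure. The function names `crosstab` and `same_grouping` and the sample membership lists are hypothetical. It assumes the two membership lists are aligned by case, so that position i in both lists refers to the same case even though the file was re-sorted between runs. Two runs define the same grouping when every cluster from the first run falls into exactly one cluster from the second run, and vice versa, whatever the cluster numbers are.

```python
from collections import Counter


def crosstab(first, second):
    """Count cases for each (first-run label, second-run label) pair."""
    return Counter(zip(first, second))


def same_grouping(first, second):
    """Return True if both label lists split the cases into the same groups,
    ignoring which number each cluster happens to receive."""
    if len(first) != len(second):
        raise ValueError("both runs must label the same cases")
    cells = crosstab(first, second)
    # Each label occurs in at least one non-empty cell. If the number of
    # non-empty cells equals the number of distinct labels in both runs,
    # each label occurs in exactly one cell, so the labels map one-to-one.
    return len(cells) == len(set(first)) == len(set(second))


# Hypothetical memberships for six cases (illustrative values only).
run_1 = [1, 1, 2, 2, 3, 4]
run_2 = [3, 3, 1, 1, 4, 2]   # same groups, different cluster numbers
run_3 = [3, 3, 1, 2, 4, 2]   # fourth case has moved to another group

print(same_grouping(run_1, run_2))  # True
print(same_grouping(run_1, run_3))  # False
```

In the second comparison the cross-tabulation has five non-empty cells for four clusters, which is the pattern that signals cases changing groups between runs.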



Historical Number

20832

Document Information

Modified date:
16 April 2020

UID

swg21476880