IBM Support

K-Means Cluster (QUICK CLUSTER) results sensitive to case order

Troubleshooting


Problem

The SPSS K-Means Cluster procedure (QUICK CLUSTER command) appears to be very sensitive to case order. I ran the procedure and saved the cluster memberships to the current data file. I then sorted the data by an unrelated variable and reran the K-Means analysis to see whether the clusters were affected. The two cluster solutions were very different, and not merely because cluster numbers were reassigned to the same groups of cases: many pairs of cases that shared a cluster in the first solution were in separate clusters in the second. The final cluster centroids and cluster sizes also differed substantially between the two solutions. Does this result reflect a bug in the program? If the procedure is so sensitive to case order, how can I be sure of getting the correct cluster solution?
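The pairwise comparison described above (checking whether cases that shared a cluster in one solution still share a cluster in the other) can be quantified outside SPSS. The following is a minimal Python sketch, not part of SPSS or of the original article. It assumes the two saved membership variables have been exported as lists aligned by case, so that position i refers to the same case in both lists. The function name `pair_agreement` and the sample labels are hypothetical. The measure it computes is the Rand index, which is unaffected by renumbering of clusters.

```python
from itertools import combinations


def pair_agreement(labels_a, labels_b):
    """Return the fraction of case pairs on which two cluster solutions agree.

    A pair "agrees" when both solutions put the two cases in the same
    cluster, or both put them in different clusters (the Rand index).
    Cluster numbers themselves are never compared across solutions, so a
    pure renumbering of clusters yields 1.0.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must cover the same cases")
    if len(labels_a) < 2:
        raise ValueError("at least two cases are required")

    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agree += 1
        total += 1
    return agree / total


# Hypothetical memberships for six cases, listed in the same case order.
first_run = [1, 1, 2, 2, 3, 3]
renumbered = [3, 3, 1, 1, 2, 2]   # same groups, different cluster numbers
regrouped = [1, 2, 1, 2, 3, 3]    # genuinely different grouping

print(pair_agreement(first_run, renumbered))  # 1.0
print(pair_agreement(first_run, regrouped))   # 11 of 15 pairs agree
```

A value of 1.0 indicates that the second run only renumbered the clusters. A clearly lower value indicates that cases were actually regrouped, which is the situation described in the problem. Because the loop visits every pair of cases, this sketch is practical only for moderately sized files.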

Historical Number

20829

Document Information

Modified date:
16 April 2020

UID

swg21476878