IBM Support

How is the log likelihood distance method applied in TwoStep Cluster Analysis?

Question & Answer


Question

The Distance Measure options in the TwoStep Cluster Analysis dialog box allow either Log-likelihood (the default) or Euclidean distance as the distance measure. How is the log-likelihood distance measure applied in TwoStep Cluster Analysis?

Answer

When the log-likelihood measure is selected, the TwoStep clustering algorithm computes probabilities of cluster membership based on one or more probability distributions. The goal of the clustering algorithm is then to maximize the overall probability, or likelihood, of the data given the final clusters. However, because exact optimization of such a function is a practical impossibility, a heuristic method is used, and it does not guarantee that the likelihood will be maximized. The results can depend on the order of the cases in the file; for that reason, randomly sorting the data before running the procedure is recommended.
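The recommendation to randomize case order can be illustrated outside SPSS with a short Python sketch. The function name `shuffle_cases` and the sample data are hypothetical and serve only as an illustration; this is not an SPSS command. A fixed seed is assumed so that the shuffled order can be reproduced.

```python
import random

def shuffle_cases(rows, seed=None):
    """Return a new list holding the same cases (rows) in random order.

    The input list is left unchanged. Passing a seed makes the
    resulting order reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(rows)   # copy so the original order is preserved
    rng.shuffle(shuffled)   # in-place shuffle of the copy
    return shuffled

# Hypothetical data: ten cases, each with an ID and one measurement.
cases = [(i, i * 0.5) for i in range(10)]
shuffled = shuffle_cases(cases, seed=12345)
```

Running the clustering on several differently seeded orderings and comparing the solutions is one way to gauge how sensitive the results are to case order.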

Unlike the classical implementation of k-means clustering, the general TwoStep algorithm can be applied to both continuous and categorical variables (although the classical k-means algorithm can also be modified to accommodate categorical variables). The log-likelihood distance measure used in TwoStep Cluster assumes that continuous variables are normally distributed and that categorical variables follow multinomial distributions. All variables are assumed to be independent of one another, as are cases. Independence among cases is often a reasonable assumption, but most applications involve variables that are at least somewhat related, so here too the algorithms used in TwoStep Cluster are at best approximations to reality.
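To make these assumptions concrete, the following Python sketch computes a likelihood-based distance between two clusters: the drop in log-likelihood that results from merging them, with independent normal terms for continuous variables and multinomial (entropy) terms for categorical ones. It follows the general form commonly documented for the TwoStep log-likelihood distance, but it is a simplified illustration and not necessarily the exact computation that SPSS performs. The function names, the case layout (a tuple of continuous values paired with a tuple of categories), and the sample data are all assumptions made for this example.

```python
import math
from collections import Counter

def xi(cases, overall_var):
    """Log-likelihood term for one cluster under the independence assumptions.

    cases: list of (continuous_values, categorical_values) tuples (assumed layout).
    overall_var: variance of each continuous variable over the whole data set,
                 supplied by the caller; it keeps the log finite for
                 clusters whose own variance is zero.
    """
    n = len(cases)
    total = 0.0
    # Continuous variables: independent normal distributions.
    for k, var_all in enumerate(overall_var):
        values = [cont[k] for cont, _ in cases]
        mean = sum(values) / n
        var_cluster = sum((x - mean) ** 2 for x in values) / n
        total += 0.5 * math.log(var_all + var_cluster)
    # Categorical variables: independent multinomials, measured by entropy.
    for k in range(len(cases[0][1])):
        counts = Counter(cat[k] for _, cat in cases)
        total += -sum((c / n) * math.log(c / n) for c in counts.values())
    return -n * total

def log_likelihood_distance(a, b, overall_var):
    """Decrease in log-likelihood caused by merging clusters a and b."""
    return xi(a, overall_var) + xi(b, overall_var) - xi(a + b, overall_var)

# Hypothetical clusters: one continuous and one categorical variable.
cluster_a = [((1.0,), ("x",)), ((1.2,), ("x",))]
cluster_b = [((1.1,), ("x",)), ((0.9,), ("x",))]
cluster_c = [((5.0,), ("y",)), ((5.3,), ("y",))]
overall_var = [4.0]  # assumed variance of the continuous variable overall

dist_ab = log_likelihood_distance(cluster_a, cluster_b, overall_var)
dist_ac = log_likelihood_distance(cluster_a, cluster_c, overall_var)
```

In this sketch, similar clusters (`cluster_a` and `cluster_b`) yield a smaller distance than dissimilar ones (`cluster_a` and `cluster_c`), because merging dissimilar clusters inflates both the pooled variance and the category entropy.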


Historical Number

60130

Document Information

Modified date:
16 April 2020

UID

swg21479407