K-means clustering

The K-means algorithm is the most widely used clustering algorithm that uses an explicit distance measure to partition the data set into clusters.

The main concept of the K-means algorithm is to represent each cluster by a single vector that is computed from the training instances that are assigned to that cluster: the mean value for each numeric attribute and the modal (most frequent) value for each nominal attribute. This cluster representation is called the cluster center.
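
As an illustration of this representation, the following minimal Python sketch computes the center of one cluster. The function name cluster_center, the attribute names, and the sample values are hypothetical and chosen only for this example. The sketch also assumes that ties between equally frequent nominal values are broken by first occurrence, which the description above does not specify.

```python
from collections import Counter
from statistics import mean


def cluster_center(instances, numeric_attrs, nominal_attrs):
    """Return the center of one cluster as a dict of attribute -> value.

    instances: list of dicts, one per training instance in the cluster.
    numeric_attrs / nominal_attrs: names of the attributes of each kind.
    """
    center = {}
    # Numeric attributes: mean over all instances in the cluster.
    for attr in numeric_attrs:
        center[attr] = mean(row[attr] for row in instances)
    # Nominal attributes: modal (most frequent) value in the cluster.
    # Assumption: ties go to the value that was seen first.
    for attr in nominal_attrs:
        counts = Counter(row[attr] for row in instances)
        center[attr] = counts.most_common(1)[0][0]
    return center


# Hypothetical cluster with two numeric attributes and one nominal attribute.
cluster = [
    {"age": 30, "income": 40.0, "color": "red"},
    {"age": 40, "income": 60.0, "color": "blue"},
    {"age": 50, "income": 50.0, "color": "red"},
]
center = cluster_center(cluster, ["age", "income"], ["color"])
```

In this sketch the center is a plain dictionary, so it has the same shape as a training instance and can be compared with one attribute by attribute.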

Representing each cluster by its cluster center has the following consequences:

  • The algorithm handles both continuous and nominal attributes.
  • Cluster formation and cluster modeling can be handled in a computationally efficient way, because the distance function is applied only to match instances against cluster centers.
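
The matching of instances against cluster centers can be sketched as follows. This is a minimal Python example under stated assumptions, not the distance function of any particular product: the text above does not define the distance measure, so the sketch assumes squared differences for numeric attributes (taken to be scaled to the range 0 to 1) and a 0/1 mismatch penalty for nominal attributes. The names distance and nearest_center and the sample values are hypothetical.

```python
import math


def distance(instance, center, numeric_attrs, nominal_attrs):
    """Assumed mixed distance between an instance and a cluster center."""
    # Numeric attributes: squared difference (values assumed scaled to [0, 1]).
    numeric_part = sum((instance[a] - center[a]) ** 2 for a in numeric_attrs)
    # Nominal attributes: 1 for each mismatch, 0 for each match.
    nominal_part = sum(1 for a in nominal_attrs if instance[a] != center[a])
    return math.sqrt(numeric_part + nominal_part)


def nearest_center(instance, centers, numeric_attrs, nominal_attrs):
    """Return the index of the cluster center closest to the instance."""
    return min(
        range(len(centers)),
        key=lambda i: distance(instance, centers[i], numeric_attrs, nominal_attrs),
    )


# Two hypothetical cluster centers and one instance to assign.
centers = [
    {"age": 0.2, "color": "red"},
    {"age": 0.8, "color": "blue"},
]
instance = {"age": 0.3, "color": "blue"}
assigned = nearest_center(instance, centers, ["age"], ["color"])
```

Each instance is compared with one center per cluster rather than with every other instance, which is where the efficiency of this approach comes from.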