Background of K-means clustering

The algorithm repeats the same basic process over several iterations.

The cluster centers are initialized by randomly picking k training instances, where k is the desired number of clusters. In each iteration, every training instance is assigned to the closest cluster, as measured by the specified distance function applied to the instance and the cluster center. All cluster centers are then recalculated as the mean attribute value vectors of the instances that are assigned to the respective clusters.

The iterative process stops when the cluster assignments no longer change, or when sufficiently few of them change. In practice, however, it is sufficient to specify the number of iterations, typically a number in the range 3 - 36.
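The process described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the product's implementation: the function name kmeans and the parameters max_iter, min_changes, and seed are hypothetical, and the plain Euclidean distance is assumed as the distance function.

```python
import numpy as np

def kmeans(X, k, max_iter=10, min_changes=0, seed=0):
    """Illustrative k-means sketch; all names here are assumptions."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize the centers by randomly picking k distinct training instances.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: Euclidean distance from every instance to every center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        changes = int((new_labels != labels).sum())
        labels = new_labels
        # Stop when no (or sufficiently few) assignments changed.
        if changes <= min_changes:
            break
        # Update step: each center becomes the mean of its assigned instances.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels
```

Calling kmeans(X, k=2) returns the final cluster centers and one cluster index per instance. The max_iter argument plays the role of the fixed number of iterations, and min_changes the role of the "sufficiently few changes" threshold.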

When you specify "distance=euclidean", the distance is measured by the Euclidean distance. When you specify "distance=norm_euclidean", the distance is measured by the Normalized Euclidean distance. The Normalized Euclidean distance is scale-invariant, that is, the result does not depend on the unit of measure that is used. Examples of differing units of measure are miles versus km, $ versus €, and °F versus °C.
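The following Python sketch illustrates the scale-invariance property. The text above does not give the normalization formula, so the sketch assumes one common definition: each attribute difference is divided by the standard deviation of that attribute. The function names and the sample data are hypothetical.

```python
import numpy as np

def euclidean(a, b):
    # Plain Euclidean distance between two instances.
    return float(np.sqrt(np.sum((a - b) ** 2)))

def norm_euclidean(a, b, std):
    # Assumed normalization: divide each attribute difference by that
    # attribute's standard deviation before combining.
    return float(np.sqrt(np.sum(((a - b) / std) ** 2)))

# Hypothetical data: column 0 is a distance in miles, column 1 a price in $.
X_miles = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0], [7.0, 60.0]])
# The same data with column 0 converted to km.
X_km = X_miles * np.array([1.609344, 1.0])

std_miles = X_miles.std(axis=0)
std_km = X_km.std(axis=0)

# Distance between the first two instances under each unit of measure.
d_plain_miles = euclidean(X_miles[0], X_miles[1])
d_plain_km = euclidean(X_km[0], X_km[1])
d_norm_miles = norm_euclidean(X_miles[0], X_miles[1], std_miles)
d_norm_km = norm_euclidean(X_km[0], X_km[1], std_km)

print(d_plain_miles, d_plain_km)  # differ: plain distance depends on the unit
print(d_norm_miles, d_norm_km)    # agree up to rounding: unit cancels out
```

Under this assumed definition, converting a column to another unit rescales both the attribute differences and the standard deviation by the same factor, so the factor cancels. The same holds for °F versus °C, because the constant offset drops out of the difference between two instances.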