Functions for K-means clustering

The K-means algorithm is implemented in the KMEANS stored procedure and the PREDICT_KMEANS stored procedure. To print a K-means model, use the PRINT_MODEL procedure.

The KMEANS stored procedure and the PREDICT_KMEANS stored procedure provide the following features:

  • Support for both continuous attributes and discrete attributes, through the following calculation and representation:
    Difference-based calculation
      The difference between two discrete values is 0 if the values are equal and 1 if they are not equal.
    Cluster center representation
      For discrete attributes, modes are used instead of means. The mode is the most frequent value.
  • Two distance functions: Euclidean ("distance=euclidean"), which is the default, and Normalized Euclidean ("distance=norm_euclidean")

  • Stop criterion: the algorithm stops when it converges or after a specified maximum number of iterations
  • Cluster membership prediction for new data
Note: Rows in the input table that contain NULL values are ignored.
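
The handling of mixed attributes described above can be illustrated with a small Python sketch. This is not the code of the stored procedures; it only mirrors the stated rules (0/1 difference for discrete values, modes as centers for discrete attributes, rows with NULL values skipped). All function names are hypothetical, Python's None stands in for NULL, and only the plain Euclidean distance is shown.

```python
import math
from collections import Counter


def attribute_difference(a, b):
    """Continuous values: numeric difference. Discrete values: 0 if equal, else 1."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return float(a) - float(b)
    return 0.0 if a == b else 1.0


def euclidean_distance(row, center):
    """Euclidean distance built from the per-attribute differences."""
    return math.sqrt(sum(attribute_difference(a, b) ** 2 for a, b in zip(row, center)))


def cluster_center(rows):
    """Mean for continuous columns, mode (most frequent value) for discrete columns."""
    center = []
    for column in zip(*rows):
        if all(isinstance(v, (int, float)) for v in column):
            center.append(sum(column) / len(column))
        else:
            center.append(Counter(column).most_common(1)[0][0])
    return center


def drop_rows_with_nulls(rows):
    """Skip every row that has at least one missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]


# Illustrative data: one continuous attribute and one discrete attribute.
rows = [
    (1.0, "red"),
    (3.0, "red"),
    (5.0, "blue"),
    (None, "blue"),  # ignored because it contains a missing value
]

usable = drop_rows_with_nulls(rows)
center = cluster_center(usable)
distance = euclidean_distance((5.0, "blue"), center)
```

With this data, the center is the mean 3.0 for the continuous attribute and the mode "red" for the discrete attribute. The row (5.0, "blue") then differs from the center by 2 on the first attribute and by 1 on the second, giving a distance of the square root of 5.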

Each stored procedure takes a single mandatory string parameter that contains comma-separated <parameter>=<value> entries. The data type of this parameter is VARCHAR(any).

Valid <parameter>=<value> entries are listed in the parameter descriptions for each stored procedure.
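
As a rough illustration of the parameter-string convention, the following Python sketch assembles <parameter>=<value> entries into one comma-separated string on the client side. The helper name build_parameter_string is hypothetical, and the entry k=3 is an assumed example; only the distance entry is taken from the text above. Check the parameter descriptions of each stored procedure for the entries that are actually valid.

```python
def build_parameter_string(params):
    """Join a mapping into comma-separated <parameter>=<value> entries."""
    entries = []
    for name, value in params.items():
        text = str(value)
        # A comma or equals sign inside a name or value would be ambiguous in this format.
        if "," in name or "=" in name or "," in text:
            raise ValueError(f"cannot encode entry {name!r}={text!r}")
        entries.append(f"{name}={text}")
    return ",".join(entries)


# "distance" comes from the documentation above; "k" is an assumed example entry.
parameter_string = build_parameter_string({"distance": "norm_euclidean", "k": 3})
```

The resulting string, distance=norm_euclidean,k=3, is the kind of value that would be passed as the single VARCHAR argument of a stored procedure call.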