Functions for K-means clustering
The K-means algorithm is implemented by the KMEANS and PREDICT_KMEANS stored procedures. To print a K-means model, use the PRINT_MODEL procedure.
The KMEANS and PREDICT_KMEANS stored procedures provide the following features:
- Support for both continuous and discrete attributes in the following calculation and representation:
  - Difference-based calculation
    - The difference between two discrete values is 0 if the values are equal and 1 otherwise.
  - Cluster center representation
    - For discrete attributes, modes (the most frequent values) are used instead of means.
- Distance functions: Normalized Euclidean ("distance=norm_euclidean") and Euclidean ("distance=euclidean"). Euclidean is the default.
- Stopping criterion: convergence, or a specified maximum number of iterations
- Cluster membership prediction for new data
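Written out, the difference-based calculation and the center representation above amount to the following, for two rows x and y with p attributes and a cluster C. Setting s_j = 1 gives plain Euclidean distance. For "distance=norm_euclidean" this sketch assumes that s_j is the standard deviation of continuous attribute j; the exact normalization is not specified in this section.

```latex
\delta_j(x_j, y_j) =
\begin{cases}
  |x_j - y_j| / s_j & \text{if attribute } j \text{ is continuous} \\
  0 & \text{if } j \text{ is discrete and } x_j = y_j \\
  1 & \text{if } j \text{ is discrete and } x_j \neq y_j
\end{cases}
\qquad
d(x, y) = \sqrt{\sum_{j=1}^{p} \delta_j(x_j, y_j)^2}

c_j =
\begin{cases}
  \frac{1}{|C|} \sum_{x \in C} x_j & \text{continuous } j \text{ (mean)} \\
  \arg\max_{v} \, \bigl|\{ x \in C : x_j = v \}\bigr| & \text{discrete } j \text{ (mode)}
\end{cases}
```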
Note: Input table rows that contain NULL values are ignored.
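The following Python sketch combines these features: mixed-attribute distance, mean/mode cluster centers, both stopping criteria, skipping of rows with missing values, and prediction of cluster membership for new rows. It is an illustration only, not the stored procedures' implementation. Assumptions: discrete values are modelled as Python strings, None stands in for SQL NULL, "norm_euclidean" divides continuous differences by the population standard deviation, "convergence" means that no row changes cluster, and the initial centers come from a deterministic farthest-first traversal (the procedures' own initialization is not described here). All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def attr_diff(a, b, scale=1.0):
    """Per-attribute difference: 0/1 for discrete (str) values, scaled |a - b| otherwise."""
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    return abs(a - b) / scale


def distance(x, y, scales):
    """Euclidean distance over mixed attributes; all scales are 1.0 for plain Euclidean."""
    return math.sqrt(sum(attr_diff(a, b, s) ** 2 for a, b, s in zip(x, y, scales)))


def attribute_scales(rows, distance_name):
    """1.0 per attribute for 'euclidean'; std deviation of continuous ones for 'norm_euclidean' (assumed)."""
    scales = []
    for column in zip(*rows):
        if distance_name == "norm_euclidean" and not isinstance(column[0], str):
            sd = pstdev(column)
            scales.append(sd if sd > 0 else 1.0)
        else:
            scales.append(1.0)
    return scales


def center(rows):
    """Cluster center: mean for continuous attributes, mode for discrete attributes."""
    result = []
    for column in zip(*rows):
        if isinstance(column[0], str):
            result.append(Counter(column).most_common(1)[0][0])
        else:
            result.append(mean(column))
    return result


def predict(row, centers, scales):
    """Cluster membership for a new row: index of the nearest center."""
    return min(range(len(centers)), key=lambda c: distance(row, centers[c], scales))


def kmeans(rows, k, distance_name="euclidean", max_iter=20):
    # Skip rows that contain a missing value (None stands in for SQL NULL).
    rows = [r for r in rows if all(v is not None for v in r)]
    scales = attribute_scales(rows, distance_name)
    # Hypothetical deterministic start: farthest-first traversal from the first row.
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(distance(r, c, scales) for c in centers)))
    assignment = None
    for _ in range(max_iter):  # stopping criterion 1: maximum number of iterations
        new_assignment = [predict(r, centers, scales) for r in rows]
        if new_assignment == assignment:  # stopping criterion 2: no row changed cluster
            break
        assignment = new_assignment
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:  # keep the previous center if a cluster ends up empty
                centers[c] = center(members)
    return centers, scales


data = [
    [1.0, 1.2, "red"], [0.8, 1.0, "red"], [1.1, 0.9, "blue"],
    [8.0, 8.2, "green"], [7.9, 8.1, "green"], [8.3, 7.8, "green"],
    [5.0, None, "red"],  # contains a missing value, so it is ignored
]
centers, scales = kmeans(data, k=2, distance_name="norm_euclidean")
print(centers)
print(predict([7.5, 8.0, "green"], centers, scales))
```

Here the discrete column of each center holds a mode ("red" and "green"), while the continuous columns hold means. Scaling by the standard deviation keeps an attribute with a large numeric range from dominating the 0/1 contributions of discrete attributes, which is one plausible reason to choose the normalized variant.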
All stored procedures take a single mandatory string parameter that contains <parameter>=<value> pairs separated by commas. The data type of the parameter is VARCHAR(any).
Valid <parameter>=<value> entries are listed in the parameter descriptions for each stored procedure.
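To illustrate the single-string convention, the Python sketch below builds such a string from keyword arguments and splits it back into name/value pairs. Only "distance" and its values come from this section; "model", "intable" and "k" are hypothetical placeholders, so check each procedure's parameter description for the real names. The sketch does not handle values that themselves contain commas or quoting, and it makes no claim about how the stored procedures parse the string internally.

```python
def build_params(**params) -> str:
    """Join keyword arguments into one 'name=value,name=value' string."""
    return ",".join(f"{name}={value}" for name, value in params.items())


def parse_params(spec: str) -> dict:
    """Split a 'name=value,name=value' string into a dict of string values."""
    params = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:  # tolerate a trailing comma
            continue
        name, sep, value = entry.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"malformed entry: {entry!r}")
        params[name.strip()] = value.strip()
    return params


# Hypothetical parameter names, except for "distance".
spec = build_params(model="customer_km", intable="customers", k=3, distance="norm_euclidean")
print(spec)  # model=customer_km,intable=customers,k=3,distance=norm_euclidean
print(parse_params(spec)["distance"])  # norm_euclidean
```

Note that every parsed value is a string, including "3"; since the argument is a single VARCHAR, any numeric interpretation happens on the receiving side.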