K-Means-AS node

K-Means is one of the most commonly used clustering algorithms. It partitions data points into a predefined number of clusters.¹ The K-Means-AS node in SPSS® Modeler is implemented in Spark.

For details about the K-Means algorithm, see https://spark.apache.org/docs/2.2.0/ml-clustering.html.
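
To illustrate what clustering into a predefined number of clusters means, the following is a minimal sketch of the basic K-Means iteration (Lloyd's algorithm) in plain Python. It is not the Spark implementation that the node uses. The function name `kmeans`, the sample points, and the choice to seed the centers with the first `k` points are assumptions made for this example only.

```python
import math

def kmeans(points, k, max_iter=20):
    """Basic Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption for this sketch: seed the centers with the first k points
    # so that the result is deterministic. This is not how Spark initializes.
    centers = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the algorithm has converged.
        assignments = new_assignments
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignments

# Hypothetical sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centers, assignments = kmeans(points, k=2)
print(assignments)
print(centers)
```

The number of clusters `k` is fixed before the algorithm runs. The algorithm then alternates between assigning points to the nearest center and recomputing each center until the assignments stop changing.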

Note that the K-Means-AS node performs one-hot encoding automatically for categorical variables.
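
As a rough illustration of one-hot encoding, the following plain Python sketch turns one categorical field into a set of 0/1 indicator values, one per category. The node performs this step internally, so this code is not required when you use the node. The function name `one_hot_encode`, the sample values, and the alphabetical ordering of categories are assumptions for this example and might not match the node's internal behavior.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 indicator row per value."""
    # Assumption for this sketch: categories are ordered alphabetically.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # Mark the position of this value's category.
        encoded.append(row)
    return categories, encoded

# Hypothetical categorical field with three distinct categories.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
print(encoded)
```

After encoding, each record is represented by numeric values only, which allows distances between records to be computed for clustering.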

¹ "Clustering." Apache Spark. MLlib: Main Guide. Web. 3 Oct 2017.