K-Means-AS node Build Options
Use the Build Options tab to specify build options for the K-Means-AS node, including regular options for model building, initialization options for the cluster centers, and advanced options for the iteration limit, convergence tolerance, and random seed. For more information, see the JavaDoc for K-Means on SparkML.¹
Regular
Model Name. The name of the field that is generated when records are scored to a specific cluster. Select Auto (default) or select Custom and type a name.
Number of Clusters. Specify the number of clusters to generate. The default is 5 and the minimum is 2.
Initialization
Initialization Mode. Specify the method for initializing the cluster centers: K-Means|| (the default) or random. For details about these two methods, see Scalable K-Means++.²
Initialization Steps. If the K-Means|| initialization mode is selected, specify the number of initialization steps. The default is 2.
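As a rough illustration of how the two initialization modes differ, the following Python sketch contrasts random initialization with a simplified version of the K-Means|| scheme described in Scalable K-Means++. It is not the SparkML implementation: the function names, the oversampling factor of 2k, and the way the candidates are reduced to k centers are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_sq_dist(point, centers):
    """Squared distance from a point to its closest center."""
    return min(sq_dist(point, c) for c in centers)


def init_random(points, k, seed=None):
    """Random mode: pick k distinct input points as the starting centers."""
    return random.Random(seed).sample(points, k)


def init_kmeans_parallel(points, k, init_steps=2, seed=None):
    """Simplified K-Means|| sketch (hypothetical helper, not Spark's code)."""
    rng = random.Random(seed)
    oversample = 2 * k  # assumption: expected number of candidates per step
    candidates = [rng.choice(points)]

    # Each initialization step samples every point independently, with a
    # probability proportional to its squared distance from the candidates.
    for _ in range(init_steps):
        costs = [nearest_sq_dist(p, candidates) for p in points]
        total = sum(costs)
        if total == 0:
            break
        for p, cost in zip(points, costs):
            if rng.random() < oversample * cost / total:
                candidates.append(p)

    # Weight each candidate by the number of points closest to it.
    weights = [0] * len(candidates)
    for p in points:
        j = min(range(len(candidates)), key=lambda i: sq_dist(p, candidates[i]))
        weights[j] += 1

    # Reduce the candidates to at most k centers with a weighted,
    # k-means++ style selection.
    centers = [rng.choices(candidates, weights=weights)[0]]
    while len(centers) < k:
        scores = [w * nearest_sq_dist(c, centers)
                  for c, w in zip(candidates, weights)]
        if sum(scores) == 0:
            break  # fewer than k distinct candidates were sampled
        centers.append(rng.choices(candidates, weights=scores)[0])
    return centers
```

In this sketch, a larger `init_steps` value gives the sampler more rounds to collect candidates far from the ones already chosen, at the cost of extra passes over the data.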
Advanced
Advanced Settings. Select this option if you want to set the following advanced options.
Max Iteration. Specify the maximum number of iterations to perform when searching for cluster centers. The default is 20.
Tolerance. Specify the convergence tolerance for iterative algorithms. 1.0E-4 is the default.
Set Random Seed. Select this option and click Generate to generate the seed used by the random number generator.
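The interplay of the three advanced options can be sketched with a small, pure-Python k-means loop. This is an illustration only, not the SparkML implementation: the function name is hypothetical, random initialization is used for brevity, and the convergence rule (stop when no center moves farther than the tolerance) is an assumption.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(points, k, max_iter=20, tol=1.0e-4, seed=None):
    """Return (centers, iterations_used) for a list of equal-length tuples."""
    rng = random.Random(seed)          # the seed makes the run repeatable
    centers = rng.sample(points, k)    # random initialization, for brevity
    iterations = 0

    for iterations in range(1, max_iter + 1):   # Max Iteration caps the loop
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)

        # Update step: move each center to the mean of its group.
        new_centers = []
        for old, group in zip(centers, groups):
            if group:
                new_centers.append(tuple(
                    sum(p[d] for p in group) / len(group)
                    for d in range(len(old))))
            else:
                new_centers.append(old)  # keep a center that lost all points

        # Tolerance: stop early once no center moved farther than tol.
        converged = all(sq_dist(o, n) <= tol * tol
                        for o, n in zip(centers, new_centers))
        centers = new_centers
        if converged:
            break

    return centers, iterations
```

With the same seed, repeated calls return the same centers. A larger tolerance or a smaller iteration limit ends the search sooner, possibly before the centers have settled.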
Display
Display Graph. Select this option if you want a graph to be included in the output.
SPSS Modeler setting | Script name (property name) | K-Means SparkML parameter |
---|---|---|
Input Fields | features | |
Number of Clusters | clustersNum | k |
Initialization Mode | initMode | initMode |
Initialization Steps | initSteps | initSteps |
Max Iteration | maxIter | maxIter |
Tolerance | toleration | tol |
Random Seed | randomSeed | seed |
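The mapping in the table can also be expressed as a small lookup, which may be convenient when translating scripted node settings into SparkML parameter names. The following Python sketch is illustrative only: the dictionary and the helper function are hypothetical names, and the `features` property is left out because the table lists no SparkML parameter for it.

```python
# Script property name -> K-Means SparkML parameter, copied from the table.
SPSS_TO_SPARKML = {
    "clustersNum": "k",
    "initMode": "initMode",
    "initSteps": "initSteps",
    "maxIter": "maxIter",
    "toleration": "tol",
    "randomSeed": "seed",
}


def to_sparkml_params(settings):
    """Rename script properties to SparkML parameter names (hypothetical helper)."""
    unknown = sorted(set(settings) - set(SPSS_TO_SPARKML))
    if unknown:
        raise KeyError("No SparkML parameter listed for: " + ", ".join(unknown))
    return {SPSS_TO_SPARKML[name]: value for name, value in settings.items()}


# Example using the defaults stated on this page.
defaults = {"clustersNum": 5, "initSteps": 2, "maxIter": 20, "toleration": 1.0e-4}
spark_params = to_sparkml_params(defaults)
```

Here `spark_params` holds the same values keyed by `k`, `initSteps`, `maxIter`, and `tol`.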
¹ "Class KMeans." Apache Spark. JavaDoc. Web. 3 Oct 2017.
² Bahmani, Moseley, et al. "Scalable K-Means++." Feb 28, 2012. http://theory.stanford.edu/%7Esergei/papers/vldb12-kmpar.pdf.