K-Means Node Model Options

Model name. You can generate the model name automatically based on the target or ID field (or on the model type when no such field is specified), or specify a custom name.

Use partitioned data. If a partition field is defined, this option ensures that only data from the training partition is used to build the model.

Specified number of clusters. Specify the number of clusters to generate. The default is 5.

Generate distance field. If this option is selected, the model nugget will include a field containing the distance of each record from the center of its assigned cluster.
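To illustrate what the distance field contains, the following minimal sketch assigns each record to its nearest cluster center and reports the distance to that center. It assumes plain Euclidean distance on already-scaled numeric fields. The function name assign_with_distance and the sample centers and records are hypothetical, and the sketch is not the product's own implementation.

```python
import math

def assign_with_distance(record, centers):
    """Return (cluster_index, distance) for the nearest center.

    Assumption: Euclidean distance; the metric and field scaling used
    by the actual model are not specified here.
    """
    best_index, best_distance = None, math.inf
    for index, center in enumerate(centers):
        distance = math.dist(record, center)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance

# Hypothetical cluster centers and input records (two numeric fields each).
centers = [(0.0, 0.0), (10.0, 10.0)]
records = [(3.0, 4.0), (10.0, 7.0)]

# Each entry pairs the assigned cluster index with the distance value
# that a generated distance field would hold for that record.
scored = [assign_with_distance(record, centers) for record in records]
for record, (cluster, distance) in zip(records, scored):
    print(record, "cluster index:", cluster, "distance:", distance)
```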

Cluster label. Specify the format for the values in the generated cluster membership field. Cluster membership can be indicated as a String with the specified Label prefix (for example "Cluster 1", "Cluster 2", and so on), or as a Number.
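The two label formats can be sketched as follows. The function cluster_label, its parameter names, and the "Cluster " prefix are hypothetical choices made for illustration; they are not the product's scripting properties.

```python
def cluster_label(cluster_number, label_type="String", prefix="Cluster "):
    """Format a cluster membership value as a prefixed string or a number.

    Assumption: label_type takes the values "String" or "Number",
    mirroring the two choices described for the Cluster label option.
    """
    if label_type == "String":
        return f"{prefix}{cluster_number}"
    if label_type == "Number":
        return cluster_number
    raise ValueError(f"Unsupported label type: {label_type!r}")

# String labels with the prefix, then plain numeric labels.
labels = [cluster_label(n) for n in (1, 2, 3)]
numbers = [cluster_label(n, label_type="Number") for n in (1, 2, 3)]
print(labels)
print(numbers)
```

String labels are easier to read in reports, while numeric labels may be more convenient if the membership field is processed further downstream.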

Note: If you want to include nominal (set) fields in your model but run into memory problems or long build times, consider recoding large set fields to reduce the number of values, or using a different field with fewer values as a proxy for the large set. For example, if a product_id field containing values for individual products is causing problems, you might remove it from the model and add a less detailed product_category field instead.
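Both approaches in the note can be sketched outside the product as a data preparation step. The helper collapse_rare_values, the "Other" label, and the product-to-category lookup table below are hypothetical and use made-up sample values; the sketch only shows the idea of reducing the number of distinct values before modeling.

```python
from collections import Counter

def collapse_rare_values(values, keep, other_label="Other"):
    """Keep the `keep` most frequent values and map the rest to other_label."""
    most_frequent = {value for value, _ in Counter(values).most_common(keep)}
    return [value if value in most_frequent else other_label for value in values]

# Approach 1: recode a large set field so that it has fewer distinct values.
product_ids = ["A", "A", "A", "B", "B", "C", "D"]
recoded = collapse_rare_values(product_ids, keep=2)
print(recoded)

# Approach 2: use a less detailed proxy field instead of the large set.
# The lookup table is a hypothetical product_id -> product_category mapping.
category_of = {"A": "Books", "B": "Books", "C": "Music", "D": "Music"}
categories = [category_of[product_id] for product_id in product_ids]
print(categories)
```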

Optimize. Select options designed to increase performance during model building based on your specific needs.

  • Select Speed to instruct the algorithm never to use disk spilling, which improves performance.
  • Select Memory to instruct the algorithm to use disk spilling when appropriate, at some cost to speed. This option is selected by default.
    Note: When running in distributed mode, this setting can be overridden by administrator options specified in options.cfg.