autoclusternode properties

The Auto Cluster node estimates and compares clustering models, which identify groups of records that have similar characteristics. The node works in the same manner as other automated modeling nodes, allowing you to experiment with multiple combinations of options in a single modeling pass. Models can be compared using basic measures that help filter and rank the cluster models by usefulness, and that provide a measure based on the importance of particular fields.

Example

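stream = modeler.script.stream()  # the stream that the script runs against
node = stream.create("autocluster", "My node")
# Rank models by silhouette, computed on the training partition
node.setPropertyValue("ranking_measure", "Silhouette")
node.setPropertyValue("ranking_dataset", "Training")
# Enable the silhouette limit and set it to 5
node.setPropertyValue("enable_silhouette_limit", True)
node.setPropertyValue("silhouette_limit", 5)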
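
When a script sets many of the properties in Table 1, it can be convenient to keep the settings in one list and apply them in a loop. The following is a minimal sketch of that idea, not part of the product documentation. The names RecordingNode and apply_properties are hypothetical: RecordingNode is only a stand-in that lets the sketch run outside Modeler, and in a real script you would pass the node returned by stream.create instead. The values shown are illustrative.

```python
class RecordingNode(object):
    """Hypothetical stand-in for a Modeler node (assumption, not a Modeler class).

    It only records the values passed to setPropertyValue.
    """

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


def apply_properties(node, settings):
    """Call node.setPropertyValue for each (name, value) pair, in order."""
    for name, value in settings:
        node.setPropertyValue(name, value)
    return node


# Property names are taken from Table 1; the values are examples only.
settings = [
    ("ranking_measure", "Silhouette"),
    ("ranking_dataset", "Training"),
    ("enable_smallest_cluster_limit", True),
    ("smallest_cluster_units", "Percentage"),
    ("smallest_cluster_limit_percentage", 5.0),
]

demo_node = apply_properties(RecordingNode(), settings)
```

Keeping the settings as an ordered list rather than a dictionary preserves the order in which the properties are set, which matters if an enabling flag should be set before the limit it controls.
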
Table 1. autoclusternode properties
| autoclusternode properties | Values | Property description |
| --- | --- | --- |
| evaluation | field | Note: Auto Cluster node only. Identifies the field for which an importance value will be calculated. Alternatively, can be used to identify how well the cluster differentiates the value of this field and, therefore, how well the model will predict this field. |
| ranking_measure | Silhouette Num_clusters Size_smallest_cluster Size_largest_cluster Smallest_to_largest Importance | |
| ranking_dataset | Training Test | |
| summary_limit | integer | Number of models to list in the report. Specify an integer between 1 and 100. |
| enable_silhouette_limit | flag | |
| silhouette_limit | integer | Integer between 0 and 100. |
| enable_number_less_limit | flag | |
| number_less_limit | number | Real number between 0.0 and 1.0. |
| enable_number_greater_limit | flag | |
| number_greater_limit | number | Integer greater than 0. |
| enable_smallest_cluster_limit | flag | |
| smallest_cluster_units | Percentage Counts | |
| smallest_cluster_limit_percentage | number | |
| smallest_cluster_limit_count | integer | Integer greater than 0. |
| enable_largest_cluster_limit | flag | |
| largest_cluster_units | Percentage Counts | |
| largest_cluster_limit_percentage | number | |
| largest_cluster_limit_count | integer | |
| enable_smallest_largest_limit | flag | |
| smallest_largest_limit | number | |
| enable_importance_limit | flag | |
| importance_limit_condition | Greater_than Less_than | |
| importance_limit_greater_than | number | Integer between 0 and 100. |
| importance_limit_less_than | number | Integer between 0 and 100. |
| <algorithm> | flag | Enables or disables the use of a specific algorithm. |
| <algorithm>.<property> | string | Sets a property value for a specific algorithm. See Setting algorithm properties for more information. |
| number_of_models | integer | |
| enable_model_build_time_limit | boolean | (K-Means, Kohonen, TwoStep, SVM, KNN, Bayes Net and Decision List models only.) Sets a maximum time limit for any one model. For example, if a particular model requires an unexpectedly long time to train because of some complex interaction, you probably don't want it to hold up your entire modeling run. |
| model_build_time_limit | integer | Time spent on model build. |
| enable_stop_after_time_limit | boolean | (Neural Network, K-Means, Kohonen, TwoStep, SVM, KNN, Bayes Net and C&R Tree models only.) Stops a run after a specified number of hours. All models generated up to that point will be included in the model nugget, but no further models will be produced. |
| stop_after_time_limit | double | Run time limit (hours). |
| stop_if_valid_model | boolean | Stops a run when a model passes all criteria specified under the Discard settings. |
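
Several properties in the table accept only a fixed set of values or an integer range. A script can check a value against those constraints before calling setPropertyValue. The following is a minimal sketch of such a check. The function check_setting and the two lookup tables are hypothetical helpers, not part of the Modeler scripting API, and they cover only those properties whose allowed values or ranges are stated explicitly in the table.

```python
# Allowed values copied from the table above (a subset of properties only).
ENUM_VALUES = {
    "ranking_measure": ["Silhouette", "Num_clusters", "Size_smallest_cluster",
                        "Size_largest_cluster", "Smallest_to_largest",
                        "Importance"],
    "ranking_dataset": ["Training", "Test"],
    "smallest_cluster_units": ["Percentage", "Counts"],
    "largest_cluster_units": ["Percentage", "Counts"],
    "importance_limit_condition": ["Greater_than", "Less_than"],
}

# Inclusive integer ranges stated in the table above.
INT_RANGES = {
    "summary_limit": (1, 100),
    "silhouette_limit": (0, 100),
}


def check_setting(name, value):
    """Return None if the value passes the check, else an error message.

    Properties that are not listed in either lookup table are not checked.
    """
    if name in ENUM_VALUES:
        if value not in ENUM_VALUES[name]:
            return "%s must be one of: %s" % (name, ", ".join(ENUM_VALUES[name]))
    elif name in INT_RANGES:
        low, high = INT_RANGES[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not (low <= value <= high):
            return "%s must be an integer between %d and %d" % (name, low, high)
    return None


ok_message = check_setting("silhouette_limit", 5)
bad_message = check_setting("summary_limit", 0)
```

In a Modeler script, a check like this could run before each node.setPropertyValue call so that an out-of-range value is reported with a readable message.
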