twostepnode properties

The TwoStep node uses a two-step clustering method. The first step makes a single pass through the data to compress the raw input data into a manageable set of subclusters. The second step uses a hierarchical clustering method to progressively merge the subclusters into larger and larger clusters. TwoStep has the advantage of automatically estimating the optimal number of clusters for the training data. It can handle mixed field types and large data sets efficiently.
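The two steps can be illustrated with a toy sketch in plain Python. It is only an illustration of the idea on one-dimensional numeric data and does not reproduce the node's actual implementation: the function names, the distance threshold, and the use of absolute distance between subcluster means are all assumptions made for this example, and the number of clusters is passed in rather than estimated.

```python
# Toy illustration of two-step clustering on 1-D numeric data.
# NOT the TwoStep node's implementation; all names here are hypothetical.

def precluster(values, threshold):
    """Step 1: one pass over the data. Each value joins the nearest
    subcluster if that subcluster's mean is within `threshold`,
    otherwise it starts a new subcluster."""
    subclusters = []  # each entry is [count, total]
    for v in values:
        best = None
        best_dist = None
        for sc in subclusters:
            dist = abs(v - sc[1] / sc[0])
            if best_dist is None or dist < best_dist:
                best, best_dist = sc, dist
        if best is not None and best_dist <= threshold:
            best[0] += 1
            best[1] += v
        else:
            subclusters.append([1, float(v)])
    return subclusters


def merge_subclusters(subclusters, num_clusters):
    """Step 2: hierarchical (agglomerative) merging. Repeatedly merge the
    two clusters whose means are closest until `num_clusters` remain."""
    clusters = [list(sc) for sc in subclusters]
    while len(clusters) > num_clusters:
        best_pair = None
        best_dist = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mean_i = clusters[i][1] / clusters[i][0]
                mean_j = clusters[j][1] / clusters[j][0]
                dist = abs(mean_i - mean_j)
                if best_dist is None or dist < best_dist:
                    best_pair, best_dist = (i, j), dist
        i, j = best_pair
        clusters[i][0] += clusters[j][0]
        clusters[i][1] += clusters[j][1]
        del clusters[j]
    return clusters


values = [1.0, 1.2, 0.9, 5.0, 5.1, 9.8, 10.0, 10.3]
subclusters = precluster(values, threshold=0.5)      # step 1
clusters = merge_subclusters(subclusters, 2)         # step 2
```

Only the compact subcluster summaries (a count and a running total) reach the second step, which is why the hierarchical merge stays cheap even when the raw data is large.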

Example

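# Get the current stream, then create the TwoStep modeling node in it
stream = modeler.script.stream()
node = stream.create("twostep", "My node")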
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("inputs", ["Age", "K", "Na", "BP"])
node.setPropertyValue("partition", "Test")
node.setPropertyValue("use_model_name", False)
node.setPropertyValue("model_name", "TwoStep_Drug")
node.setPropertyValue("use_partitioned_data", True)
node.setPropertyValue("exclude_outliers", True)
node.setPropertyValue("cluster_label", "String")
node.setPropertyValue("label_prefix", "TwoStep_")
node.setPropertyValue("cluster_num_auto", False)
node.setPropertyValue("max_num_clusters", 9)
node.setPropertyValue("min_num_clusters", 3)
node.setPropertyValue("num_clusters", 7)
Table 1. twostepnode properties
twostepnode Properties   Values                     Property description
inputs                   [field1 ... fieldN]        TwoStep models use a list of input fields, but no target. Weight and frequency fields are not recognized. See Common modeling node properties for more information.
standardize              flag
exclude_outliers         flag
percentage               number
cluster_num_auto         flag
min_num_clusters         number
max_num_clusters         number
num_clusters             number
cluster_label            String, Number
label_prefix             string
distance_measure         Euclidean, Loglikelihood
clustering_criterion     AIC, BIC
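
The value types in Table 1 can be checked before a script sets them on a node. The following is a minimal sketch, assuming the value lists are exactly as shown in the table. The helper names check_twostep_properties and apply_properties, and the FakeNode stand-in used to try the code outside the product, are hypothetical and not part of the scripting API; only setPropertyValue comes from the example above.

```python
# Hypothetical helpers: validate settings against Table 1 before applying them.
ENUM_VALUES = {
    "cluster_label": ("String", "Number"),
    "distance_measure": ("Euclidean", "Loglikelihood"),
    "clustering_criterion": ("AIC", "BIC"),
}
FLAG_PROPERTIES = ("standardize", "exclude_outliers", "cluster_num_auto")
NUMBER_PROPERTIES = ("percentage", "min_num_clusters",
                     "max_num_clusters", "num_clusters")
UNCHECKED_PROPERTIES = ("inputs", "label_prefix")


def check_twostep_properties(props):
    """Return a list of problem descriptions; an empty list means OK."""
    problems = []
    for name, value in props.items():
        if name in ENUM_VALUES:
            if value not in ENUM_VALUES[name]:
                problems.append("%s: %r not in %r"
                                % (name, value, ENUM_VALUES[name]))
        elif name in FLAG_PROPERTIES:
            if not isinstance(value, bool):
                problems.append("%s: expected a flag (True/False)" % name)
        elif name in NUMBER_PROPERTIES:
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append("%s: expected a number" % name)
        elif name not in UNCHECKED_PROPERTIES:
            problems.append("%s: not listed in Table 1" % name)
    return problems


def apply_properties(node, props):
    """Set each property on the node, refusing if any value is invalid."""
    problems = check_twostep_properties(props)
    if problems:
        raise ValueError("; ".join(problems))
    for name, value in props.items():
        node.setPropertyValue(name, value)


class FakeNode(object):
    """Stand-in for a real node so the sketch runs outside the product."""
    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


node = FakeNode()
apply_properties(node, {
    "cluster_num_auto": False,
    "num_clusters": 7,
    "distance_measure": "Loglikelihood",
    "clustering_criterion": "BIC",
})
```

Inside a stream script, the node returned by stream.create("twostep", ...) would take the place of FakeNode.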