The TwoStep node uses a two-step clustering method. The first step makes a single pass
through the data to compress the raw input data into a manageable set of subclusters. The second
step uses a hierarchical clustering method to progressively merge the subclusters into larger and
larger clusters. TwoStep has the advantage of automatically estimating the optimal number of
clusters for the training data. It can handle mixed field types and large data sets efficiently.
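The two steps described above can be illustrated with a small, self-contained sketch in plain Python 3 (run outside Modeler). It is not the node's actual implementation: the distance-threshold pre-clustering, the centroid-based merging, and the names `precluster`, `merge_subclusters`, and `threshold` are all assumptions made for illustration. The target cluster count `k` is also supplied by hand here, whereas the node can estimate it automatically.

```python
import math

def precluster(points, threshold):
    """Step 1 (assumed scheme): one pass over the data. Each point joins the
    nearest subcluster whose centroid is within `threshold`, otherwise it
    starts a new subcluster. Only counts and sums are kept, not raw points."""
    subclusters = []  # each entry: [count, per-dimension sums]
    for p in points:
        best, best_dist = None, threshold
        for sc in subclusters:
            centroid = [s / sc[0] for s in sc[1]]
            d = math.dist(p, centroid)
            if d <= best_dist:
                best, best_dist = sc, d
        if best is None:
            subclusters.append([1, list(p)])
        else:
            best[0] += 1
            best[1] = [s + x for s, x in zip(best[1], p)]
    return subclusters

def merge_subclusters(subclusters, k):
    """Step 2 (assumed scheme): hierarchical merging. Repeatedly merge the two
    subclusters with the closest centroids until only k clusters remain."""
    clusters = [[n, list(sums)] for n, sums in subclusters]
    while len(clusters) > k:
        best_pair, best_dist = None, math.inf
        for i in range(len(clusters)):
            ci = [s / clusters[i][0] for s in clusters[i][1]]
            for j in range(i + 1, len(clusters)):
                cj = [s / clusters[j][0] for s in clusters[j][1]]
                d = math.dist(ci, cj)
                if d < best_dist:
                    best_pair, best_dist = (i, j), d
        i, j = best_pair
        n_j, sums_j = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i][0] += n_j
        clusters[i][1] = [a + b for a, b in zip(clusters[i][1], sums_j)]
    # Return (size, centroid) for each final cluster
    return [(n, [s / n for s in sums]) for n, sums in clusters]

# Hypothetical toy data: three tight groups of 2-D points
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 0), (10, 0.2)]
subs = precluster(points, threshold=1.0)
final = merge_subclusters(subs, k=2)
for size, centroid in final:
    print(size, centroid)
```

Because the second step works on subcluster summaries rather than on individual records, its cost depends on the number of subclusters and not on the number of input rows, which is the reason such a scheme suits large data sets.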
Example
# Create a TwoStep modeling node in the current stream
node = stream.create("twostep", "My node")
# Fields: use settings from this node instead of an upstream Type node
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("inputs", ["Age", "K", "Na", "BP"])
node.setPropertyValue("partition", "Test")
# Model options
node.setPropertyValue("use_model_name", False)
node.setPropertyValue("model_name", "TwoStep_Drug")
node.setPropertyValue("use_partitioned_data", True)
node.setPropertyValue("exclude_outliers", True)
# Cluster labels: string labels built from the given prefix
node.setPropertyValue("cluster_label", "String")
node.setPropertyValue("label_prefix", "TwoStep_")
# Cluster count: automatic selection is off, so a fixed number is requested
node.setPropertyValue("cluster_num_auto", False)
node.setPropertyValue("max_num_clusters", 9)
node.setPropertyValue("min_num_clusters", 3)
node.setPropertyValue("num_clusters", 7)
Table 1. twostepnode properties
twostepnode Properties | Values | Property description |
inputs | [field1 ... fieldN] | TwoStep models use a list of input fields, but no target. Weight and frequency fields are not recognized. See the topic Common modeling node properties for more information. |
standardize | flag | |
exclude_outliers | flag | |
percentage | number | |
cluster_num_auto | flag | |
min_num_clusters | number | |
max_num_clusters | number | |
num_clusters | number | |
cluster_label | String, Number | |
label_prefix | string | |
distance_measure | Euclidean, Loglikelihood | |
clustering_criterion | AIC, BIC | |
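The properties in the table that the example does not set can be combined to let the node choose the number of clusters itself. The following is a minimal sketch, not a recommended configuration: the helper name `configure_auto_twostep` and its default range of 2 to 15 clusters are hypothetical, and it assumes that `min_num_clusters` and `max_num_clusters` bound the automatic search when `cluster_num_auto` is true. The property names and the values `Loglikelihood` and `BIC` are taken from the table.

```python
def configure_auto_twostep(node, inputs, min_clusters=2, max_clusters=15):
    """Set twostepnode properties so that the cluster count is selected
    automatically (hypothetical helper; defaults are illustrative only)."""
    node.setPropertyValue("custom_fields", True)
    node.setPropertyValue("inputs", inputs)
    node.setPropertyValue("standardize", True)
    # Automatic selection, assumed to search between the min and max values
    node.setPropertyValue("cluster_num_auto", True)
    node.setPropertyValue("min_num_clusters", min_clusters)
    node.setPropertyValue("max_num_clusters", max_clusters)
    # Values listed in the properties table
    node.setPropertyValue("distance_measure", "Loglikelihood")
    node.setPropertyValue("clustering_criterion", "BIC")
    return node

# Inside a Modeler script, where `stream` is the current stream:
# node = configure_auto_twostep(
#     stream.create("twostep", "Auto TwoStep"), ["Age", "K", "Na", "BP"])
```

Passing the node in as an argument keeps the helper independent of how the node was created, so it can also be applied to an existing TwoStep node in a stream.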