The Classification and Regression (C&R) Tree node generates a decision tree that allows
you to predict or classify future observations. The method uses recursive partitioning to split the
training records into segments, minimizing the impurity at each step. A node in the tree is
considered "pure" if 100% of the cases in the node fall into a single category of the target field.
Target and input fields can be numeric ranges or categorical (nominal, ordinal, or flags); all
splits are binary (only two subgroups).
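Recursive partitioning with binary splits can be sketched in plain Python. This is a minimal illustration of the general technique, not the algorithm that IBM SPSS Modeler actually runs. It assumes Gini impurity (one of the impurity_measure options), considers only a single numeric input, and stops growing a branch when the node is pure or a maximum depth is reached. The functions gini, best_binary_split and grow, and the toy data, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2); 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Try every threshold on one numeric input and return
    (threshold, weighted child impurity) for the best binary split.
    threshold is None when no split lowers the node's impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: leave the node unsplit
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


def grow(values, labels, depth=0, max_depth=3):
    """Recursively partition the records; each leaf holds its majority class."""
    threshold, _ = best_binary_split(values, labels)
    if gini(labels) == 0.0 or threshold is None or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    left = [(v, lab) for v, lab in zip(values, labels) if v <= threshold]
    right = [(v, lab) for v, lab in zip(values, labels) if v > threshold]
    return (
        threshold,
        grow([v for v, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        grow([v for v, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    )


# Toy data (invented): Age as the single input, Drug as the target.
ages = [23, 47, 47, 61, 30, 58]
drugs = ["drugY", "drugX", "drugX", "drugX", "drugY", "drugX"]
print(best_binary_split(ages, drugs))  # (38.5, 0.0)
print(grow(ages, drugs))               # (38.5, 'drugY', 'drugX')
```

Each split is chosen greedily to minimize the weighted impurity of the two children, which is why both children of the root are pure in this toy case.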
Example
# Obtain the current stream from the scripting context
stream = modeler.script.stream()
node = stream.createAt("cart", "My node", 200, 100)
# "Fields" tab
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Drug")
node.setPropertyValue("inputs", ["Age", "BP", "Cholesterol"])
# "Build Options" tab, "Objective" panel
node.setPropertyValue("model_output_type", "InteractiveBuilder")
node.setPropertyValue("use_tree_directives", True)
node.setPropertyValue("tree_directives", """Grow Node Index 0 Children 1 2
Grow Node Index 2 Children 3 4""")
# "Build Options" tab, "Basics" panel
node.setPropertyValue("prune_tree", False)
node.setPropertyValue("use_std_err_rule", True)
node.setPropertyValue("std_err_multiplier", 3.0)
node.setPropertyValue("max_surrogates", 7)
# "Build Options" tab, "Stopping Rules" panel
node.setPropertyValue("use_percentage", True)
node.setPropertyValue("min_parent_records_pc", 5)
node.setPropertyValue("min_child_records_pc", 3)
# "Build Options" tab, "Advanced" panel
node.setPropertyValue("min_impurity", 0.0003)
node.setPropertyValue("impurity_measure", "Twoing")
# "Model Options" tab
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "Cart_Drug")
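With use_percentage set to True, as in the example above, the stopping rules are expressed as percentages rather than record counts. The sketch below shows one way to read them: a node is split only if it holds at least min_parent_records_pc percent of the training records, and only if every child would hold at least min_child_records_pc percent. The helper names, the rounding up to whole records, and the assumption that percentages refer to the overall training sample are illustrative assumptions, not documented Modeler behavior.

```python
import math


def min_records(total_records, pct):
    """Convert a percentage of the training sample into a record count.
    Rounding up is an assumption of this sketch."""
    return math.ceil(total_records * pct / 100.0)


def split_allowed(total_records, parent_n, child_sizes,
                  parent_pc=5, child_pc=3):
    """Hypothetical stopping-rule check mirroring min_parent_records_pc
    and min_child_records_pc (defaults match the example settings)."""
    if parent_n < min_records(total_records, parent_pc):
        return False  # parent node too small to split
    need = min_records(total_records, child_pc)
    return all(size >= need for size in child_sizes)


# With 1,000 training records: the parent needs >= 50, each child >= 30.
print(split_allowed(1000, 200, [170, 30]))  # True
print(split_allowed(1000, 200, [175, 25]))  # False: one child too small
print(split_allowed(1000, 40, [20, 20]))    # False: parent too small
```

When use_percentage is False, the min_parent_records_abs and min_child_records_abs properties presumably play the same role, with record counts in place of percentages.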
Table 1. cartnode properties

| cartnode Properties | Values | Property description |
| --- | --- | --- |
| target | field | C&R Tree models require a single target and one or more input fields. A frequency field can also be specified. See the topic Common modeling node properties for more information. |
| continue_training_existing_model | flag | |
| objective | Standard, Boosting, Bagging, psm | psm is used for very large datasets and requires a Server connection. |
| model_output_type | Single, InteractiveBuilder | |
| use_tree_directives | flag | |
| tree_directives | string | Specify directives for growing the tree. Directives can be wrapped in triple quotes to avoid escaping newlines or quotes. Note that directives may be highly sensitive to minor changes in data or modeling options and may not generalize to other datasets. See the example for usage. |
| use_max_depth | Default, Custom | |
| max_depth | integer | Maximum tree depth, from 0 to 1000. Used only if use_max_depth = Custom. |
| prune_tree | flag | Prune tree to avoid overfitting. |
| use_std_err | flag | Use a maximum difference in risk (in standard errors) when pruning. |
| std_err_multiplier | number | Maximum difference in risk, in standard errors. |
| max_surrogates | number | Maximum number of surrogates. |
| use_percentage | flag | |
| min_parent_records_pc | number | |
| min_child_records_pc | number | |
| min_parent_records_abs | number | |
| min_child_records_abs | number | |
| use_costs | flag | |
| costs | structured | Structured property. |
| priors | Data, Equal, Custom | |
| custom_priors | structured | Structured property. |
| adjust_priors | flag | |
| trails | number | Number of component models for boosting or bagging. |
| set_ensemble_method | Voting, HighestProbability, HighestMeanProbability | Default combining rule for categorical targets. |
| range_ensemble_method | Mean, Median | Default combining rule for continuous targets. |
| large_boost | flag | Apply boosting to very large datasets. |
| min_impurity | number | |
| impurity_measure | Gini, Twoing, Ordered | |
| train_pct | number | Overfit prevention set. |
| set_random_seed | flag | Replicate results option. |
| seed | number | |
| calculate_variable_importance | flag | |
| calculate_raw_propensities | flag | |
| calculate_adjusted_propensities | flag | |
| adjusted_propensity_partition | Test, Validation | |
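The use_std_err and std_err_multiplier properties describe a standard error rule for pruning. In the classic CART formulation, instead of keeping the pruned subtree with the lowest estimated risk, you keep the simplest subtree whose risk is within multiplier times the standard error of that minimum. The sketch below illustrates only that selection step; how Modeler estimates risk and its standard error is not shown, and the function name select_subtree and the candidate subtree figures are hypothetical.

```python
def select_subtree(subtrees, multiplier=1.0):
    """Standard error rule for pruning (illustrative sketch).

    subtrees: list of (n_leaves, risk, risk_se) tuples, one per candidate
    pruned tree. Returns the candidate with the fewest leaves whose risk is
    at most min_risk + multiplier * SE of the minimum-risk tree.
    """
    # Tree with the lowest risk; ties broken in favor of fewer leaves.
    _, min_risk, min_se = min(subtrees, key=lambda t: (t[1], t[0]))
    limit = min_risk + multiplier * min_se
    eligible = [t for t in subtrees if t[1] <= limit]
    return min(eligible, key=lambda t: t[0])  # simplest eligible tree


# Hypothetical pruning sequence: (leaves, risk estimate, standard error)
candidates = [(12, 0.20, 0.02), (8, 0.21, 0.02), (5, 0.25, 0.02), (2, 0.40, 0.03)]
print(select_subtree(candidates, multiplier=0.0))  # (12, 0.2, 0.02)
print(select_subtree(candidates, multiplier=1.0))  # (8, 0.21, 0.02)
print(select_subtree(candidates, multiplier=3.0))  # (5, 0.25, 0.02)
```

A larger multiplier, such as the 3.0 used in the example script, widens the tolerance and therefore favors smaller, more heavily pruned trees.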