The CHAID node generates decision trees using chi-square statistics to identify optimal
splits. Unlike the C&R Tree and QUEST nodes, CHAID can generate nonbinary trees, meaning that
some splits have more than two branches. Target and input fields can be numeric range (continuous)
or categorical. Exhaustive CHAID is a modification of CHAID that does a more thorough job of
examining all possible splits but takes longer to compute.
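To make the role of the chi-square statistic concrete, the following minimal sketch computes both the Pearson and the likelihood-ratio statistic for one candidate split, given a contingency table of predictor categories against target classes. It illustrates the statistics only and is not Modeler's internal implementation. The function name chi_square_stats and the counts are hypothetical, and the sketch assumes every row and column total is nonzero.

```python
import math

def chi_square_stats(table):
    """Return (pearson, likelihood_ratio) for a contingency table.

    table is a list of rows; each row holds the observed counts of the
    target classes for one category of the candidate predictor.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = float(sum(row_totals))

    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor and target
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to the LR sum
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)
    return pearson, likelihood_ratio

# Hypothetical counts: two predictor categories (rows), two target classes (columns)
counts = [[30, 10], [10, 30]]
pearson, likelihood_ratio = chi_square_stats(counts)
```

A larger statistic for a candidate predictor indicates a stronger association with the target, which is what makes that predictor a more attractive split.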
Example
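# Get the current stream; required when the example runs as a standalone script
stream = modeler.script.stream()
filenode = stream.createAt("variablefile", "My node", 100, 100)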
filenode.setPropertyValue("full_filename", "$CLEO_DEMOS/DRUG1n")
node = stream.createAt("chaid", "My node", 200, 100)
stream.link(filenode, node)
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Drug")
node.setPropertyValue("inputs", ["Age", "Na", "K", "Cholesterol", "BP"])
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "CHAID")
node.setPropertyValue("method", "Chaid")
node.setPropertyValue("model_output_type", "InteractiveBuilder")
node.setPropertyValue("use_tree_directives", True)
node.setPropertyValue("tree_directives", "Test")
node.setPropertyValue("split_alpha", 0.03)
node.setPropertyValue("merge_alpha", 0.04)
node.setPropertyValue("chi_square", "Pearson")
node.setPropertyValue("use_percentage", False)
node.setPropertyValue("min_parent_records_abs", 40)
node.setPropertyValue("min_child_records_abs", 30)
node.setPropertyValue("epsilon", 0.003)
node.setPropertyValue("max_iterations", 75)
node.setPropertyValue("split_merged_categories", True)
node.setPropertyValue("bonferroni_adjustment", True)
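When many properties are set at once, it can be convenient to keep them in a list and apply them in a loop. The sketch below shows one possible way to do this for an Exhaustive CHAID build with percentage-based stopping rules. The helper apply_properties and the values chosen are hypothetical; the property names are taken from Table 1, and setPropertyValue is the same call used in the example above.

```python
def apply_properties(node, properties):
    """Call setPropertyValue for each (name, value) pair, in the order given."""
    for name, value in properties:
        node.setPropertyValue(name, value)
    return node

# Illustrative settings only: an Exhaustive CHAID tree limited to four levels,
# with stopping rules expressed as percentages of the training data.
exhaustive_settings = [
    ("method", "ExhaustiveChaid"),
    ("use_max_depth", "Custom"),
    ("max_depth", 4),
    ("use_percentage", True),
    ("min_parent_records_pc", 2),
    ("min_child_records_pc", 1),
    ("chi_square", "LR"),
]

# Inside a Modeler script, where node is a chaid node such as the one
# created in the example above:
# apply_properties(node, exhaustive_settings)
```

A list of pairs is used rather than a dictionary so that the order is preserved, for example so that use_max_depth is set before max_depth.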
Table 1. chaidnode properties
| chaidnode Properties | Values | Property description |
|---|---|---|
| target | field | CHAID models require a single target and one or more input fields. A frequency field can also be specified. See the topic Common modeling node properties for more information. |
| continue_training_existing_model | flag | |
| objective | Standard, Boosting, Bagging, psm | psm is used for very large datasets and requires a Server connection. |
| model_output_type | Single, InteractiveBuilder | |
| use_tree_directives | flag | |
| tree_directives | string | |
| method | Chaid, ExhaustiveChaid | |
| use_max_depth | Default, Custom | |
| max_depth | integer | Maximum tree depth, from 0 to 1000. Used only if use_max_depth = Custom. |
| use_percentage | flag | |
| min_parent_records_pc | number | |
| min_child_records_pc | number | |
| min_parent_records_abs | number | |
| min_child_records_abs | number | |
| use_costs | flag | |
| costs | structured | Structured property. |
| trails | number | Number of component models for boosting or bagging. |
| set_ensemble_method | Voting, HighestProbability, HighestMeanProbability | Default combining rule for categorical targets. |
| range_ensemble_method | Mean, Median | Default combining rule for continuous targets. |
| large_boost | flag | Apply boosting to very large data sets. |
| split_alpha | number | Significance level for splitting. |
| merge_alpha | number | Significance level for merging. |
| bonferroni_adjustment | flag | Adjust significance values using the Bonferroni method. |
| split_merged_categories | flag | Allow resplitting of merged categories. |
| chi_square | Pearson, LR | Method used to calculate the chi-square statistic: Pearson or Likelihood Ratio. |
| epsilon | number | Minimum change in expected cell frequencies. |
| max_iterations | number | Maximum iterations for convergence. |
| set_random_seed | integer | |
| seed | number | |
| calculate_variable_importance | flag | |
| calculate_raw_propensities | flag | |
| calculate_adjusted_propensities | flag | |
| adjusted_propensity_partition | Test, Validation | |
| maximum_number_of_models | integer | |
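The interaction between split_alpha and bonferroni_adjustment can be illustrated with the generic Bonferroni correction, which multiplies a p-value by the number of comparisons and caps the result at 1. The sketch below is an assumption-laden illustration, not Modeler's implementation: the function names are hypothetical, the number of comparisons is passed in directly (in CHAID the multiplier depends on the predictor type and on how its categories were merged), and the less-than-or-equal comparison is assumed.

```python
def bonferroni_adjust(p_value, n_comparisons):
    """Generic Bonferroni correction: scale the p-value, capped at 1.0."""
    return min(1.0, p_value * n_comparisons)

def passes_split_test(p_value, n_comparisons, split_alpha, adjust=True):
    """Return True if the (optionally adjusted) p-value meets split_alpha.

    adjust mirrors the bonferroni_adjustment flag; split_alpha mirrors the
    split_alpha property. The <= comparison is an assumption of this sketch.
    """
    if adjust:
        p_value = bonferroni_adjust(p_value, n_comparisons)
    return p_value <= split_alpha

# Hypothetical numbers: a raw p-value of 0.004 over 10 comparisons
with_adjustment = passes_split_test(0.004, 10, split_alpha=0.03)
without_adjustment = passes_split_test(0.004, 10, split_alpha=0.03, adjust=False)
```

With these numbers the raw p-value is below the threshold, while the adjusted value (about 0.04) is above it, so enabling the adjustment makes the split test more conservative.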