C&R Tree Node - Advanced

The advanced options enable you to fine-tune the tree-building process.

Minimum change in impurity. Specify the minimum change in impurity required to create a new split in the tree. Impurity refers to the extent to which subgroups defined by the tree have a wide range of output field values within each group. For categorical targets, a node is considered “pure” if 100% of cases in the node fall into a single category of the target field. The goal of tree building is to create subgroups with similar output values; in other words, to minimize the impurity within each node. If the best split for a branch reduces the impurity by less than the specified amount, the split is not made.
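The stopping rule described above can be illustrated with a minimal Python sketch. It is not the node's internal implementation. It assumes a continuous target, uses the mean squared deviation from the node mean as the impurity measure, and weights each child node by its share of the parent's records. The function names and the threshold value are hypothetical.

```python
def lsd_impurity(values):
    """Mean squared deviation from the node mean (0.0 for an empty node)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def impurity_change(parent, left, right, impurity=lsd_impurity):
    """Parent impurity minus the size-weighted impurity of the two children.

    Assumes `parent` is non-empty and that `left` and `right` partition it.
    """
    n = len(parent)
    weighted_children = (len(left) / n) * impurity(left) \
        + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted_children


def split_allowed(parent, left, right, min_change):
    """A split is kept only if it reduces impurity by at least min_change."""
    return impurity_change(parent, left, right) >= min_change


# Illustrative data: a split that separates low values from high values.
parent = [1.0, 1.0, 5.0, 5.0]
good_split = ([1.0, 1.0], [5.0, 5.0])   # both children become pure
poor_split = ([1.0, 5.0], [1.0, 5.0])   # children look just like the parent

MIN_CHANGE = 0.5  # hypothetical threshold, chosen only for this example
print(split_allowed(parent, *good_split, MIN_CHANGE))
print(split_allowed(parent, *poor_split, MIN_CHANGE))
```

In this example the first split removes all within-node variation and passes the threshold, while the second leaves the impurity unchanged and is rejected.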

Impurity measure for categorical targets. For categorical target fields, specify the method used to measure the impurity of the tree. (For continuous targets, this option is ignored, and the least squared deviation impurity measure is always used.)

  • Gini is a general impurity measure based on probabilities of category membership for the branch.
  • Twoing is an impurity measure that emphasizes the binary split and is more likely to lead to approximately equal-sized branches from a split.
  • Ordered adds the constraint that only contiguous target classes can be grouped together, so it applies only to ordinal targets. If this option is selected for a nominal target, the standard twoing measure is used instead.
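As a rough illustration of the three measures, the following Python sketch uses the usual textbook CART definitions. Whether the node computes them in exactly this form is an assumption. The function names and the sample data are hypothetical, and every node is assumed to be non-empty.

```python
from collections import Counter


def class_probs(labels, classes):
    """Proportion of each class among the labels of one (non-empty) node."""
    n = len(labels)
    counts = Counter(labels)
    return [counts[c] / n for c in classes]


def gini(labels, classes):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_probs(labels, classes))


def twoing(left, right, classes):
    """Twoing value of a binary split (larger means a better split)."""
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    diff = sum(abs(l - r) for l, r in zip(class_probs(left, classes),
                                          class_probs(right, classes)))
    return p_left * p_right / 4.0 * diff ** 2


def ordered_twoing(left, right, ordered_classes):
    """Twoing restricted to groupings of contiguous classes.

    Classes must be listed in their ordinal order. Each cut point k groups
    the first k classes against the rest, and the best grouping is kept.
    """
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    pl = class_probs(left, ordered_classes)
    pr = class_probs(right, ordered_classes)
    best, running = 0.0, 0.0
    for l, r in zip(pl[:-1], pr[:-1]):
        running += l - r
        best = max(best, p_left * p_right * running ** 2)
    return best


classes = ["low", "mid", "high"]          # listed in ordinal order
left = ["low", "low", "mid"]
right = ["mid", "high", "high"]

print(gini(left + right, classes))        # impurity of the parent node
print(twoing(left, right, classes))
print(ordered_twoing(left, right, classes))
```

Because the ordered variant searches a subset of the groupings that twoing considers, its value can never exceed the twoing value for the same split.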

Overfit prevention set. The algorithm internally separates records into a model building set and an overfit prevention set. The overfit prevention set is an independent set of data records used to track errors during training, which prevents the method from modeling chance variation in the data. Specify the percentage of records to assign to the overfit prevention set. The default is 30.
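The idea of holding out a percentage of records can be sketched in Python as follows. How the node selects records internally is not documented here, so the simple random partition below is an assumption, and the function name is hypothetical.

```python
import random


def split_overfit_prevention(records, percent=30.0, seed=None):
    """Randomly partition records into (model_building, overfit_prevention).

    `percent` is the share of records held out, mirroring the default of 30.
    Random selection is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_holdout = round(len(records) * percent / 100.0)
    holdout_idx = set(indices[:n_holdout])
    building = [r for i, r in enumerate(records) if i not in holdout_idx]
    holdout = [r for i, r in enumerate(records) if i in holdout_idx]
    return building, holdout


records = list(range(100))
building, holdout = split_overfit_prevention(records, percent=30, seed=1)
print(len(building), len(holdout))
```

The tree would then be grown on the model building set, with errors on the held-out set tracked to decide when additional structure stops improving the model.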

Replicate results. Setting a random seed enables you to replicate analyses. Specify an integer or click Generate, which will create a pseudo-random integer between 1 and 2147483647, inclusive.
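The effect of a fixed seed can be shown with a short Python sketch. It uses Python's own random number generator rather than the one in the product, so it illustrates the principle only. The function names are hypothetical.

```python
import random

MAX_SEED = 2147483647  # upper bound stated above, equal to 2**31 - 1


def generate_seed():
    """Pick a pseudo-random integer between 1 and MAX_SEED, inclusive."""
    return random.randint(1, MAX_SEED)


def draw_sample(seed, n=5):
    """Draw n pseudo-random numbers from a generator initialized with seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


seed = generate_seed()
first_run = draw_sample(seed)
second_run = draw_sample(seed)
print(first_run == second_run)  # same seed, same sequence
```

Any step that depends on random selection, such as assigning records to the overfit prevention set, produces the same result on each run when the same seed is reused.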