Classification tree

A classification tree is a type of decision tree. It uses the Gini impurity measure to classify records into the categories of the target field. The predictions are based on combinations of values in the input fields.

A classification tree calculates the predicted target category for each node in the tree. This type of tree is generated when the target field is categorical.

The algorithmic details are too complicated to describe in full here, but the main ideas are as follows. You can see the frequency statistics in the tooltips for the nodes in the decision tree visualization. Each node is split into two or more child nodes to reduce the Gini impurity value for the node. Gini impurity is a function that penalizes more even distributions of target values. It is based on the target frequency statistics and the number of data rows that correspond to the node. Child nodes that correspond to given predictor categories are merged when the resulting increase in Gini impurity stays within a specified limit. For each node, the predictor that reduces the Gini impurity value the most is selected for splitting the node.
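The split selection described above can be illustrated with a minimal sketch. It assumes the standard Gini impurity definition (one minus the sum of squared category proportions) and a row-weighted average over child nodes. The function names, the sample rows, and the field names plan, region, and churn are hypothetical, and the merging of child nodes is left out. The product's actual implementation may differ.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity: 1 minus the sum of squared category proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_child_impurity(rows, target, predictor):
    """Row-weighted average impurity of the children after splitting on one predictor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[predictor], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


def best_predictor(rows, target, predictors):
    """Return the predictor with the largest impurity reduction, plus all reductions."""
    parent = gini_impurity([r[target] for r in rows])
    gains = {
        p: parent - weighted_child_impurity(rows, target, p) for p in predictors
    }
    return max(gains, key=gains.get), gains


# Hypothetical sample data: "plan" separates the target perfectly, "region" does not.
rows = [
    {"plan": "basic", "region": "north", "churn": "yes"},
    {"plan": "basic", "region": "south", "churn": "yes"},
    {"plan": "premium", "region": "north", "churn": "no"},
    {"plan": "premium", "region": "south", "churn": "no"},
]

chosen, gains = best_predictor(rows, "churn", ["plan", "region"])
print(chosen, gains)
```

In this sample the root node has an even mix of target values, so its impurity is at the maximum for two categories. Splitting on plan produces pure child nodes, while splitting on region leaves the impurity unchanged, so plan is selected.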

The process of building a decision tree starts with the root node, which corresponds to all the rows in the data. A node is split into child nodes until no further improvement in Gini impurity is possible, or until the number of data rows that correspond to the node becomes too small. The process also stops if the number of nodes in the decision tree becomes too large.
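The three stopping rules can be sketched as a simple breadth-first tree builder. This is an illustration under stated assumptions, not the product's algorithm: the parameter names min_rows and max_nodes, their default values, the dictionary-based node layout, and the sample rows are all hypothetical. Each node's predicted category is taken to be the most frequent target value among its rows.

```python
from collections import Counter


def gini(labels):
    """Standard Gini impurity of a list of target values."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, target, predictor):
    """Return (impurity reduction, rows grouped by predictor value)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[predictor], []).append(r)
    n = len(rows)
    after = sum(len(g) / n * gini([r[target] for r in g]) for g in groups.values())
    return gini([r[target] for r in rows]) - after, groups


def build_tree(rows, target, predictors, min_rows=2, max_nodes=50):
    """Grow a tree from the root; min_rows and max_nodes are hypothetical limits."""
    root = {"rows": rows, "children": {}}
    queue = [root]
    node_count = 1
    while queue:
        node = queue.pop(0)
        labels = [r[target] for r in node["rows"]]
        # Predicted category for the node: the most frequent target value.
        node["prediction"] = Counter(labels).most_common(1)[0][0]
        # Stop: too few rows in this node (or nothing to split on).
        if len(node["rows"]) < min_rows or not predictors:
            continue
        gain, groups, predictor = max(
            (split_gain(node["rows"], target, p) + (p,) for p in predictors),
            key=lambda item: item[0],
        )
        # Stop: no further improvement in Gini impurity.
        if gain <= 1e-12:
            continue
        # Stop: the split would make the tree too large.
        if node_count + len(groups) > max_nodes:
            continue
        node["split_on"] = predictor
        for value, subset in groups.items():
            child = {"rows": subset, "children": {}}
            node["children"][value] = child
            queue.append(child)
        node_count += len(groups)
    return root, node_count


# Hypothetical sample data.
rows = [
    {"plan": "basic", "region": "north", "churn": "yes"},
    {"plan": "basic", "region": "south", "churn": "yes"},
    {"plan": "premium", "region": "north", "churn": "no"},
    {"plan": "premium", "region": "south", "churn": "no"},
]

tree, size = build_tree(rows, "churn", ["plan", "region"])
small_tree, small_size = build_tree(rows, "churn", ["plan", "region"], max_nodes=1)
print(tree["split_on"], size, small_size)
```

With the default limits, the root is split once on plan and both child nodes become leaves because they are already pure. With max_nodes set to 1, the root is never split.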

The predictive strength that is reported for a classification tree is the adjusted count R². It is obtained by computing the improvement in classification accuracy of the tree over the constant model and dividing it by the classification error of the constant model. The constant model always predicts the target mode, so its classification accuracy is estimated by the relative frequency of the mode. A classification tree is reported as reliably predictive when its predictive strength is greater than the default threshold of 10%.
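As a worked illustration of this calculation, the following sketch computes the adjusted count measure from a list of actual and predicted categories. The function name, the sample values, and the handling of the case where the constant model is already perfect are assumptions made for the example.

```python
from collections import Counter


def adjusted_count_r2(actual, predicted):
    """(tree accuracy - constant accuracy) / (constant model error).

    Computed with counts, which is equivalent:
    (correct - mode_count) / (n - mode_count).
    """
    n = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    mode_count = Counter(actual).most_common(1)[0][1]
    if n == mode_count:
        # Assumption: the constant model is already perfect, so report no improvement.
        return 0.0
    return (correct - mode_count) / (n - mode_count)


# Hypothetical data: 10 rows, mode "no" occurs 6 times, the tree gets 9 rows right.
actual = ["no"] * 6 + ["yes"] * 4
predicted = ["no"] * 6 + ["yes", "yes", "yes", "no"]

strength = adjusted_count_r2(actual, predicted)
is_reliable = strength > 0.10  # default threshold of 10% from the text
print(strength, is_reliable)
```

Here the constant model is correct for 6 of 10 rows and wrong for 4. The tree is correct for 9 rows, so it removes 3 of the 4 constant-model errors, which gives a predictive strength of 0.75.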