Decision tree classification

Intelligent Miner® supports a decision tree implementation of classification. A Tree Classification algorithm is used to compute a decision tree. Decision trees are easy to understand and modify, and the model can be expressed as a set of decision rules. The algorithm scales well, even to large databases with many training examples and many attributes.

Decision Tree Classification produces its output as a binary tree-like structure, which is fairly easy for business users, such as the marketing staff who manage churn, to interpret, and which makes the significant variables easy to identify. A Decision Tree model contains rules that predict the target variable. The Tree Classification algorithm provides an easy-to-understand description of the underlying distribution of the data.
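For illustration only, the decision rules read off a small churn tree might look like the following; the attribute names and thresholds are invented, not output from Intelligent Miner.

IF tenure < 12 AND service_calls >= 4 THEN churn = yes
IF tenure < 12 AND service_calls < 4 THEN churn = no
IF tenure >= 12 THEN churn = no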

The intuition is that a classification model built from a larger training set is generally more accurate. In classification, the given situation is a set of example records, called a training set, where each record consists of several fields or attributes. Attributes are either numerical (coming from an ordered domain) or categorical (coming from an unordered domain). One of the attributes, called the class label field (target field), indicates the class to which each example belongs. The objective of classification is to build a model of the class label based on the other attributes. After a model is built, it can be used to determine the class label of unclassified records. Applications of classification arise in diverse fields, such as retail target marketing, customer retention, fraud detection, and medical diagnosis.
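As a sketch of this setup, the following Python example builds a small classifier with scikit-learn's DecisionTreeClassifier rather than with Intelligent Miner; the attribute names, training records, and class labels are invented for the example.

from sklearn.tree import DecisionTreeClassifier, export_text

# Training set: each record has two numerical attributes (age, income);
# the class label field says whether the customer churned.
X = [[23, 18000], [35, 42000], [47, 61000], [52, 58000], [29, 23000], [41, 50000]]
y = ["churn", "stay", "stay", "stay", "churn", "stay"]

model = DecisionTreeClassifier(criterion="gini", random_state=0)
model.fit(X, y)

# The fitted model assigns a class label to an unclassified record ...
print(model.predict([[38, 45000]]))
# ... and can be read back as a set of decision rules.
print(export_text(model, feature_names=["age", "income"]))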

Among the classification models that have been proposed, decision trees are particularly suited for data mining. Decision trees can be constructed relatively quickly compared to other methods. Another advantage is that decision tree models are simple and easy to understand. A decision tree is a class discriminator that recursively partitions the training set until each partition consists entirely or predominantly of examples from one class. Each non-leaf node of the tree contains a split point, a test on one or more attributes that determines how the data is partitioned. The tree is built by recursively partitioning the data. Partitioning continues until each partition is either 'pure' (all members belong to the same class) or sufficiently small (a threshold set by the user). The initial attribute lists created from the training set are associated with the root of the decision tree. As the tree is grown and nodes are split to create new children, the attribute lists for each node are partitioned and associated with the children.
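A minimal Python sketch of this recursive partitioning, assuming numerical attributes and an exhaustive split search, is shown below; the function and variable names are illustrative, not part of the Intelligent Miner implementation.

from collections import Counter

def gini(labels):
    # Impurity of a partition: gini(T) = 1 - sum_j p_j^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(records, labels):
    # Try every attribute/value pair and keep the split whose two
    # partitions have the lowest weighted impurity.
    best = None
    for attr in range(len(records[0])):
        for value in {r[attr] for r in records}:
            left = [i for i, r in enumerate(records) if r[attr] <= value]
            right = [i for i, r in enumerate(records) if r[attr] > value]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(records)
            if best is None or score < best[0]:
                best = (score, attr, value, left, right)
    return best

def grow(records, labels, min_size=2):
    # Stop when the partition is pure or sufficiently small.
    if len(set(labels)) == 1 or len(records) <= min_size:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    split = best_split(records, labels)
    if split is None:                                 # no attribute separates the records
        return Counter(labels).most_common(1)[0][0]
    _, attr, value, left, right = split
    return {"test": (attr, value),
            "le": grow([records[i] for i in left], [labels[i] for i in left], min_size),
            "gt": grow([records[i] for i in right], [labels[i] for i in right], min_size)}

# Example: four records, two attributes, split found on the first attribute.
print(grow([[23, 1], [35, 0], [47, 0], [52, 1]], ["no", "no", "yes", "yes"], min_size=1))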

A decision tree classifier is built in two phases: a growth phase, in which the initial tree is constructed, and a prune phase, in which a sub-tree with the least estimated error rate is selected. Pruning the initial tree removes the small, deep nodes that result from 'noise' in the training data; this reduces the risk of 'overfitting' and results in more accurate classification of unknown data.

While the decision tree is being built, the goal at each node is to determine the split attribute and the split point that best divide the training records belonging to that node. The value of a split point depends on how well it separates the classes. Several splitting indices have been proposed to evaluate the quality of a split; Intelligent Miner uses the gini index.
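As a worked illustration with invented counts: the gini index of a node T is gini(T) = 1 - sum_j p_j^2, where p_j is the relative frequency of class j in T. A node holding 6 records of class A and 4 of class B therefore scores 1 - (0.6^2 + 0.4^2) = 0.48. A candidate split that produces one pure partition (5 A, gini 0) and one mixed partition (1 A and 4 B, gini 0.32) is scored by the size-weighted sum of its children, (5/10)*0 + (5/10)*0.32 = 0.16; the split with the lowest weighted gini index separates the classes best and is the one chosen.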


