Tree View

The Tree View displays results of tree classification mining runs. It shows the decision tree that is constructed by the Classification mining function. The decision tree is embedded in a table.

The Tree View also shows the fields that are derived from input fields when you move the mouse over the branches of the tree.

The tree consists of different nodes. Each node of the tree corresponds to a row in the table. The columns in the table display the attributes of the nodes.

The following figure shows an example of a Tree View:
Figure 1. The Tree View of the Classification Visualizer
The table in the Tree View contains the following columns:
Tree
Each node in the Tree column includes a diagram that visualizes how the records in this node are distributed over the predictable classes. Each class that can be predicted is represented by a different color. The share of a color in a node represents the percentage of records that belong to the corresponding class. Two or more classes can be predicted. You can change the color of a particular class on the Color Coding page of the Properties notebook.

Next to the diagram, the decision that leads to the assignment of records to this node is displayed.

The decision might include typical field predicates, for example, REVENUE_OF_CONTRACT < 912, or DISTRIBUTORCAR_DEALERS. Special field predicates are, for example, LOCATION_SIZE IS NOT MISSING or LOCATION_SIZE IS MISSING.
  • The field predicate LOCATION_SIZE IS MISSING applies if the mining function discovers during the scoring of the record that this field is empty. This means that the field contains a NULL value, or it contains one of the values that are defined as missing in the data dictionary of the PMML model.
  • The field predicate LOCATION_SIZE IS NOT MISSING applies if the mining function discovers during the scoring of the record that this field is not empty. This means that it does not contain a NULL value or a value that is defined as missing in the data dictionary of the PMML model. The particular value is not important for scoring.
    For example, there might be a column that contains telephone numbers. The model might depend on the following conditions:
    • The telephone numbers are provided, which means that the fields are not empty.
    • The telephone numbers are not provided, which means that the fields are empty or contain NULL values.
    In this case, the actual telephone numbers are not relevant for the scoring of the records. Only their presence or absence matters.
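
The missing-value test described above can be sketched in a few lines of Python. This is only an illustration, not the scoring code of the mining function: the function name is_missing, the field name PHONE, and the set of declared missing values are all hypothetical, and NULL is represented by Python's None.

```python
from typing import Any, FrozenSet

def is_missing(value: Any, declared_missing: FrozenSet[Any] = frozenset()) -> bool:
    """Return True if the value is NULL (None) or is declared as missing."""
    return value is None or value in declared_missing

# Hypothetical stand-in for the missing values declared in a data dictionary.
declared_missing = frozenset({"", "N/A"})

# Hypothetical records with a telephone number column.
records = [
    {"PHONE": "555-0100"},  # provided: PHONE IS NOT MISSING applies
    {"PHONE": None},        # NULL: PHONE IS MISSING applies
    {"PHONE": "N/A"},       # declared as missing: PHONE IS MISSING applies
]

# Only presence or absence is evaluated, never the number itself.
flags = [is_missing(record["PHONE"], declared_missing) for record in records]
```

Note that the predicate never inspects the telephone number itself, which matches the statement that the particular value is not important for scoring.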

If there are many different classes of nearly the same size, for example, in the case of numerical predictions, the diagrams in the nodes do not visualize the distribution of records but the class label of the node. The nodes are then displayed in the color that is associated with the predicted value, as shown in the legend.

Node ID
The node IDs show the computed levels of the tree.

Each node in the tree is identified by a node ID. The node ID is constructed by extending a node's parent ID with ".x", where x is the position of this node with regard to the other sibling nodes. The root node has the ID 1.

For example, suppose that node B is the second child of node A, and node A has the node ID 1.2. The node ID of node B is then 1.2.2. With this kind of node ID construction, you can easily determine the path from the root node to any other node, and the level of a node in the tree.
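
The node ID scheme can be illustrated with a short Python sketch. The helper names child_id, level, and path_from_root are hypothetical, and counting the root node as level 0 is an assumption made only for this example.

```python
from typing import List

def child_id(parent_id: str, position: int) -> str:
    """Build a child's node ID by appending '.x' to the parent ID."""
    return f"{parent_id}.{position}"

def level(node_id: str) -> int:
    """Depth of the node, counting the root ('1') as level 0 (an assumption)."""
    return node_id.count(".")

def path_from_root(node_id: str) -> List[str]:
    """Node IDs on the path from the root node down to the given node."""
    parts = node_id.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

node_a = child_id("1", 2)       # second child of the root
node_b = child_id(node_a, 2)    # second child of node A
path_b = path_from_root(node_b)
level_b = level(node_b)
```

Here node_b is "1.2.2", its path from the root is "1", "1.2", "1.2.2", and its level is 2 under the stated convention.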

Score
The score of a node is an attribute of the PMML model. It shows the class that is predicted for all records in this node. For example, you might have the scores Healthy and Ill, or Secure and Risky.
Record Count
The record count is an attribute of the PMML model. It shows the number of records in the node and the corresponding percentage of the total population. Additionally, the percentage of the record count is visualized by a histogram.
Purity
The purity is computed by the Classification mining function. It indicates the percentage of correctly predicted records in this node.
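
As a rough illustration of how the Score, Record Count, and Purity values relate to one another, the following Python sketch computes them for a single node. It is not the Classification mining function itself: the function node_summary and the sample data are hypothetical, and the sketch assumes that the score of a node is its most frequent class.

```python
from collections import Counter
from typing import Dict, List, Union

def node_summary(actual_classes: List[str],
                 total_population: int) -> Dict[str, Union[str, int, float]]:
    """Summarize one node from the actual classes of the records it holds."""
    counts = Counter(actual_classes)
    # Assumption: the node's score is the most frequent class in the node.
    score, correct = counts.most_common(1)[0]
    record_count = len(actual_classes)
    return {
        "score": score,
        "record_count": record_count,
        # Share of the total population that falls into this node.
        "record_percent": 100 * record_count / total_population,
        # Percentage of records in the node whose class equals the score.
        "purity": 100 * correct / record_count,
    }

# Hypothetical node: 8 Healthy and 2 Ill records out of 40 records overall.
summary = node_summary(["Healthy"] * 8 + ["Ill"] * 2, total_population=40)
```

For this sample node, the score is Healthy, the record count is 10 (25% of the population), and the purity is 80%.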
Pruned
Intelligent Miner® can generate results that include pruned nodes. The check box in the Pruned column is selected if a node is pruned.

The legend shows the color that you assigned to each class label and the corresponding values, for example, Yes and No, or Healthy and Ill. If you use continuous numerical values, the legend is represented as a color scale.

Trees do not need to be binary. In conformance with the PMML 1.1 definition of trees, they can have any number of branches at each node.