Example for creating a decision tree

This example shows how to build a decision tree on the ADULT sample data set.

Note: This feature is available starting from Db2® version 11.5.4.

You can split the ADULT sample data set into a training data set and a validation data set as follows:

CALL IDAX.SPLIT_DATA('intable=ADULT, traintable=ADULTTRAIN, testtable=ADULTTEST, id=ID, fraction=0.35');
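To make the effect of the fraction parameter concrete, the following Python sketch mimics a fraction-based split on a list of record IDs. It assumes that fraction is the share of records that goes to the training table and that records are assigned at random; the split_ids helper and its seed argument are hypothetical and are not part of the IDAX procedures.

```python
import random


def split_ids(ids, fraction, seed=42):
    """Randomly assign about `fraction` of the ids to training, the rest to test.

    Assumption: `fraction` is the training share, as in the call above.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return sorted(shuffled[:cut]), sorted(shuffled[cut:])


# 100 hypothetical record IDs, split with the same fraction as the call above.
train_ids, test_ids = split_ids(range(1, 101), fraction=0.35)
print(len(train_ids), len(test_ids))
```

Under these assumptions, every ID lands in exactly one of the two lists, which mirrors how the id column keeps the training and validation tables disjoint.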

To build a decision tree, you can constrain how the tree grows. The best-known constraints are the depth of the tree (maxdepth), the minimum number of records per tree node (minsplit), and the minimum impurity improvement per tree node (minimprove).

The following call runs the algorithm on the ADULTTRAIN data set and builds the decision tree.

CALL IDAX.GROW_DECTREE('model=adult_dect, intable=ADULTTRAIN, id=ID, target=MARITAL_STATUS, maxdepth=8, minsplit=10, minimprove=0.01');

The decision tree has the following attributes:

  1. At most eight levels in the tree
  2. At least ten records per non-leaf node
  3. Non-leaf nodes with an impurity that is at least 1% higher than the impurity of their child nodes
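The three constraints can be read as a single check that decides whether a node may be split. The Python sketch below illustrates that check; it is not the GROW_DECTREE implementation. It assumes Gini impurity as the impurity measure, a weighted average for the child impurity, and a root node at level 1. The gini and split_allowed helpers are hypothetical names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_allowed(parent, children, depth, maxdepth=8, minsplit=10, minimprove=0.01):
    """Return True if splitting `parent` into `children` satisfies all three constraints.

    `depth` is the level of the parent node, with the root at level 1 (assumption).
    """
    if depth >= maxdepth:          # children would exceed the maximum number of levels
        return False
    if len(parent) < minsplit:     # too few records to become a non-leaf node
        return False
    n = len(parent)
    child_impurity = sum(len(c) / n * gini(c) for c in children)
    return gini(parent) - child_impurity >= minimprove


# Hypothetical node with 12 records and a split that separates the classes well.
parent = ["Married"] * 6 + ["Single"] * 6
good_split = [["Married"] * 5 + ["Single"], ["Married"] + ["Single"] * 5]
# A split that leaves the class mix unchanged brings no impurity improvement.
useless_split = [["Married"] * 3 + ["Single"] * 3, ["Married"] * 3 + ["Single"] * 3]

print(split_allowed(parent, good_split, depth=3))
print(split_allowed(parent, useless_split, depth=3))
print(split_allowed(parent, good_split, depth=8))
```

In this sketch the first split is accepted, the second is rejected because it does not reduce the impurity, and the third is rejected because the node already sits on the eighth level.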

To prune nodes that overfit the training data, you can evaluate the model against the validation data set.

The following call shows how to prune overfitting nodes of the decision tree.

CALL IDAX.PRUNE_DECTREE('model=adult_dect, valtable=ADULTTEST');

Pruning the overfitting nodes reduces the number of nodes in the adult_dect decision tree from 69 to 25.
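One common way to prune with a validation set is reduced-error pruning: working from the bottom up, a subtree is replaced by a leaf whenever the leaf makes no more errors on the validation records than the subtree does. The Python sketch below shows that idea on a tiny hand-built tree. Whether PRUNE_DECTREE uses exactly this strategy is an assumption, and the tree layout, column names, and helper functions are hypothetical.

```python
def predict(node, row):
    """Follow the splits until a leaf is reached or no branch matches the row."""
    while "children" in node:
        child = node["children"].get(row[node["feature"]])
        if child is None:
            break
        node = child
    return node["label"]


def count_nodes(node):
    """Count all nodes in the tree, including leaves."""
    return 1 + sum(count_nodes(c) for c in node.get("children", {}).values())


def prune(node, rows, target):
    """Reduced-error pruning of `node` using the validation `rows`."""
    if "children" not in node:
        return node
    feature = node["feature"]
    for value, child in node["children"].items():
        subset = [r for r in rows if r[feature] == value]
        node["children"][value] = prune(child, subset, target)
    subtree_errors = sum(predict(node, r) != r[target] for r in rows)
    leaf_errors = sum(node["label"] != r[target] for r in rows)
    if leaf_errors <= subtree_errors:
        return {"label": node["label"]}   # collapse the subtree into a leaf
    return node


# Hypothetical tree: the split on "sex" below the "old" branch overfits.
tree = {
    "feature": "age_group", "label": "Married",
    "children": {
        "young": {"label": "Never-married"},
        "old": {
            "feature": "sex", "label": "Married",
            "children": {"M": {"label": "Married"}, "F": {"label": "Divorced"}},
        },
    },
}
validation = [
    {"age_group": "young", "sex": "M", "status": "Never-married"},
    {"age_group": "young", "sex": "F", "status": "Never-married"},
    {"age_group": "old", "sex": "M", "status": "Married"},
    {"age_group": "old", "sex": "F", "status": "Married"},
    {"age_group": "old", "sex": "F", "status": "Married"},
]

nodes_before = count_nodes(tree)
pruned = prune(tree, validation, target="status")
nodes_after = count_nodes(pruned)
print(nodes_before, nodes_after)
```

Here the split on sex is removed because it misclassifies validation records that a single leaf classifies correctly, while the split on age_group is kept because it still helps.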

To inspect the built model, use the PRINT_MODEL stored procedure as shown in the following example:

CALL IDAX.PRINT_MODEL('model=adult_dect');

The PREDICT_DECTREE stored procedure predicts the value for the MARITAL_STATUS column.

The following call applies the model to the records in the ADULTTEST table and writes the predicted values to the adult_dect_out table.

CALL IDAX.PREDICT_DECTREE('model=adult_dect, intable=ADULTTEST, outtable=adult_dect_out');
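Because the records in ADULTTEST already carry a known MARITAL_STATUS value, the predictions can be compared with the actual values to estimate how well the model performs. The Python sketch below shows that comparison on a few made-up records. It assumes that the predictions have been fetched as a mapping from record ID to predicted class; the accuracy helper and the sample values are hypothetical.

```python
def accuracy(actual_by_id, predicted_by_id):
    """Share of record IDs whose predicted class equals the known class."""
    shared = actual_by_id.keys() & predicted_by_id.keys()
    if not shared:
        raise ValueError("no common IDs to compare")
    hits = sum(actual_by_id[i] == predicted_by_id[i] for i in shared)
    return hits / len(shared)


# Made-up values standing in for rows of the input table and the output table.
actual = {1: "Married", 2: "Never-married", 3: "Divorced", 4: "Married"}
predicted = {1: "Married", 2: "Never-married", 3: "Married", 4: "Married"}

print(accuracy(actual, predicted))
```

Matching on the record ID rather than on row order keeps the comparison valid even if the two tables return their rows in different orders.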