Example for creating a regression tree
This example shows how to build a regression tree on the CUSTOMER_CHURN sample data set.
First create a view on the relevant columns of SAMPLES.CUSTOMER_CHURN, then split that view into a training data set and a validation data set as follows:
CREATE VIEW CUSTOMER_DATA AS SELECT DURATION, IN_B2B_INDUSTRY, TOTAL_BUY, ANNUAL_REVENUE_MIL, CUST_ID AS ID FROM SAMPLES.CUSTOMER_CHURN;
CALL IDAX.SPLIT_DATA('intable=customer_data, traintable=customer_train, testtable=customer_test, id=id, fraction=0.65');
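The effect of the fraction parameter can be illustrated outside the database. The following Python sketch is a minimal illustration, not the SPLIT_DATA implementation: it assumes that fraction is the share of records assigned to the training table and that records are assigned at random by ID. The function name split_ids and the seed value are hypothetical.

```python
import random

def split_ids(ids, fraction=0.65, seed=42):
    """Randomly split ids into (train, test); `fraction` is the assumed training share."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

# 100 hypothetical customer IDs, split 65/35 like fraction=0.65
train_ids, test_ids = split_ids(range(1, 101))
print(len(train_ids), len(test_ids))
```

Every ID lands in exactly one of the two sets, which is what lets the test set act as unseen data for pruning and prediction later in the example.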
To build a regression tree, you can constrain how the tree grows. The most commonly used constraints are the maximum depth of the tree, the minimum number of records per tree node, and the minimum impurity improvement per tree node.
The following call runs the algorithm on the CUSTOMER_TRAIN data set and builds the regression tree.
CALL IDAX.GROW_REGTREE('model=customer_regt, intable=customer_train, id=ID, target=duration, maxdepth=8, minsplit=10, minimprove=0.01');
The resulting regression tree satisfies the following constraints:
- At most eight levels in the tree (maxdepth=8)
- At least ten records in each non-leaf node (minsplit=10)
- An impurity in each non-leaf node that is at least 1% higher than the impurity of its child nodes (minimprove=0.01)
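How the maxdepth, minsplit, and minimprove constraints interact when a node is considered for splitting can be sketched in Python. This is an illustration under stated assumptions, not the GROW_REGTREE implementation: it assumes that impurity is the variance of the target within a node, that child impurity is the record-weighted average of the two children, and that minimprove is read as a relative threshold, as the 1% wording suggests. The function name should_split is hypothetical.

```python
from statistics import pvariance

def should_split(parent, left, right, depth, maxdepth=8, minsplit=10, minimprove=0.01):
    """Return True if a candidate split passes all three constraints (assumed semantics)."""
    if depth >= maxdepth:              # the tree may not grow deeper
        return False
    if len(parent) < minsplit:         # too few records to become a non-leaf node
        return False
    if not left or not right:          # a split must send records to both children
        return False
    parent_impurity = pvariance(parent)
    # Record-weighted average of the child variances (assumed impurity measure)
    child_impurity = (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(parent)
    # The parent must be impure, and at least minimprove (1%) more impure than its children
    return parent_impurity > 0 and parent_impurity >= (1 + minimprove) * child_impurity

# Hypothetical target values for twelve records in one node
parent = [float(v) for v in range(1, 13)]
left, right = parent[:6], parent[6:]
print(should_split(parent, left, right, depth=1))
```

With these values, splitting at the midpoint lowers the variance considerably, so the split is accepted at depth 1 but rejected once depth reaches maxdepth.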
To prune overfitting nodes from the model, apply the model to the validation data set.
The following call prunes the regression tree by using the CUSTOMER_TEST validation table:
CALL IDAX.PRUNE_REGTREE('model=customer_regt, valtable=customer_test');
Pruning the overfitting nodes reduces the number of nodes in the customer_regt model from 171 to 99.
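The idea of pruning against a validation set can be sketched in Python. The pruning method that PRUNE_REGTREE uses is not described here, so the following is an assumption: it shows reduced-error pruning, a common technique in which a subtree is collapsed into a leaf whenever the leaf predicts the validation records at least as well as the subtree does. The Node class, the tree, and the validation values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    prediction: float                 # mean training target in this node
    feature: Optional[int] = None     # split column index; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def predict(node, row):
    """Walk from the given node down to a leaf and return its prediction."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction

def count_nodes(node):
    return 0 if node is None else 1 + count_nodes(node.left) + count_nodes(node.right)

def prune(node, rows, targets):
    """Bottom-up reduced-error pruning on the validation records that reach `node`."""
    if node.feature is None:
        return node
    go_left = [r[node.feature] <= node.threshold for r in rows]
    node.left = prune(node.left,
                      [r for r, g in zip(rows, go_left) if g],
                      [t for t, g in zip(targets, go_left) if g])
    node.right = prune(node.right,
                       [r for r, g in zip(rows, go_left) if not g],
                       [t for t, g in zip(targets, go_left) if not g])
    subtree_error = sum((predict(node, r) - t) ** 2 for r, t in zip(rows, targets))
    leaf_error = sum((node.prediction - t) ** 2 for t in targets)
    # Collapse when the single leaf does no worse on validation data
    # (this also collapses subtrees that no validation record reaches)
    return Node(prediction=node.prediction) if leaf_error <= subtree_error else node

# Hypothetical tree: the right subtree fits training noise
tree = Node(6.0, feature=0, threshold=5.0,
            left=Node(2.0),
            right=Node(10.5, feature=0, threshold=8.0,
                       left=Node(10.0), right=Node(11.5)))
val_rows = [[1.0], [3.0], [6.0], [7.0], [9.0], [10.0]]
val_targets = [2.0, 2.0, 11.0, 11.0, 10.0, 10.0]

nodes_before = count_nodes(tree)
pruned = prune(tree, val_rows, val_targets)
nodes_after = count_nodes(pruned)
print(nodes_before, nodes_after)
```

In this sketch the lower split is removed because it increases the squared error on the validation records, while the root split is kept because it still helps.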
To inspect the built model, use the PRINT_MODEL stored procedure as shown in the following example:
CALL IDAX.PRINT_MODEL('model=customer_regt');
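The PREDICT_REGTREE stored procedure predicts the value of the DURATION column, which is the target that the model was trained on.
The following call applies the model to the records in the CUSTOMER_TEST table and writes the predicted values to the CUSTOMER_REGT_OUT table.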
CALL IDAX.PREDICT_REGTREE('model=customer_regt, intable=customer_test, outtable=customer_regt_out');