IBM® Db2® for z/OS® Models - Decision Tree Node - Tree Pruning
You can use the pruning options to specify pruning criteria for the decision tree. The intention of pruning is to reduce the risk of overfitting by removing overgrown subgroups that do not improve the expected accuracy on new data.
Pruning measure. The default pruning measure, Accuracy, ensures that the estimated accuracy of the model remains within acceptable limits after removing a leaf from the tree. Use the alternative, Weighted Accuracy, if you want to take the class weights into account while applying pruning.
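To illustrate the difference between the two measures, here is a minimal Python sketch. It assumes that weighted accuracy counts each record in proportion to the weight of its true class. This definition is an assumption made for illustration, and the product's exact formula may differ. The function names and the sample data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def weighted_accuracy(y_true, y_pred, class_weights):
    """Accuracy in which each record counts in proportion to the weight
    of its true class (assumed definition, for illustration only)."""
    total = sum(class_weights[t] for t in y_true)
    correct = sum(class_weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return correct / total


# Hypothetical sample: the rare class "yes" is given three times the weight of "no".
y_true = ["yes", "yes", "no", "no", "no", "no"]
y_pred = ["no", "yes", "no", "no", "no", "no"]
weights = {"yes": 3.0, "no": 1.0}

plain = accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, weights)
print(plain, weighted)
```

In this sample the single misclassified record belongs to the heavily weighted class, so the weighted measure penalizes it more than the plain measure does. A pruning step that looks harmless under Accuracy can therefore look costly under Weighted Accuracy.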
Data for pruning. You can use some or all of the training data to estimate the expected accuracy on new data. Alternatively, you can use a separate pruning dataset from a specified table for this purpose.
- Use all training data. This option (the default) uses all the training data to estimate the model accuracy.
- Use % of training data for pruning. Use this option to split the data into two sets, one for training and one for pruning, using the percentage specified here for the pruning data.
- Replicate results. When you use a percentage of the training data for pruning, select this option to specify a random seed so that the data is partitioned in the same way each time you run the stream. You can either enter an integer in the Seed used for pruning field, or click Generate to create a pseudo-random integer.
- Use data from an existing table. Specify the table name of a separate pruning data set for estimating model accuracy. Estimating accuracy on data that was not used to build the tree is considered more reliable than reusing the training data.
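The percentage split and the effect of a fixed seed can be sketched in Python as follows. This is only an illustration of the idea, not the partitioning method that the database actually uses. The function name `split_for_pruning` and its parameters are hypothetical.

```python
import random


def split_for_pruning(rows, pruning_percent, seed):
    """Split rows into (training, pruning) sets.

    pruning_percent is the share of rows reserved for pruning.
    A fixed seed makes the partition reproducible across runs.
    """
    rng = random.Random(seed)      # private generator, independent of global state
    shuffled = list(rows)          # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_prune = round(len(shuffled) * pruning_percent / 100)
    return shuffled[n_prune:], shuffled[:n_prune]


# Hypothetical data: 100 row identifiers, 30% held out for pruning.
rows = list(range(100))
train_a, prune_a = split_for_pruning(rows, 30, seed=12345)
train_b, prune_b = split_for_pruning(rows, 30, seed=12345)

print(len(train_a), len(prune_a))
print(prune_a == prune_b)  # the same seed yields the same partition
```

Running the split twice with the same seed yields identical pruning sets. With a different seed on each run, the pruning set changes and so may the pruned tree.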