Decision Tree Nodes - Objectives
For the C&R Tree, QUEST, and CHAID nodes, in the Objectives pane on the Build Options tab, you can choose whether to build a new model or update an existing one. You also set the main objective of the node: to build a standard model, to build one with enhanced accuracy or stability, or to build one for use with very large datasets.
What do you want to do?
- Build new model. (Default) Creates a completely new model each time you run a stream containing this modeling node.
- Continue training existing model. Training continues with the last model successfully produced by the node instead of starting from scratch. This makes it possible to update or refresh an existing model without having to access the original data, and it may be significantly faster because only the new or updated records are fed into the stream. Details on the previous model are stored with the modeling node, so you can use this option even if the previous model nugget is no longer available in the stream or Models palette.
What is your main objective?
- Build a single tree. Creates a single, standard decision tree model. Standard models are generally easier to interpret, and can be faster to score, than models built using the other objective options. Note: With Build a single tree, Continue training existing model is supported only for split models, and you must be connected to Analytic Server.
Mode. Specifies the method used to build the model. Generate model creates a model automatically when the stream is run. Launch interactive session opens the tree builder, which enables you to build your tree one level at a time, edit splits, and prune as desired before creating the model nugget.
Use tree directives. Select this option to specify directives to apply when generating an interactive tree from the node. For example, you can specify the first- and second-level splits, which are then applied automatically when the tree builder is launched. You can also save directives from an interactive tree-building session to re-create the tree at a later date. See the topic Updating Tree Directives for more information.
- Enhance model accuracy (boosting). Choose this option to use a method known as boosting to improve the model's accuracy rate. Boosting builds multiple models in sequence. The first model is built in the usual way. A second model is then built that focuses on the records the first model misclassified, a third model focuses on the second model's errors, and so on. Finally, cases are classified by applying the whole set of models to them and combining the separate predictions into one overall prediction with a weighted voting procedure. Boosting can significantly improve the accuracy of a decision tree model, but it also requires longer training.
- Enhance model stability (bagging). Choose this option to use a method known as bagging (bootstrap aggregating) to improve the stability of the model and reduce overfitting. This option creates multiple models, each built on a bootstrap sample (a random sample of the records drawn with replacement), and combines their predictions to obtain more reliable results. Models obtained using this option can take longer to build and score than standard models.
- Create a model for very large datasets. Choose this option when working with datasets that are too large to build a model using any of the other objective options. This option divides the data into smaller data blocks and builds a model on each block. The most accurate models are then automatically selected and combined into a single model nugget. You can perform incremental model updating by selecting the Continue training existing model option on this screen. For this objective, Continue training existing model is supported without a connection to Analytic Server; however, a model for very large datasets can't be created with splits. Note: This option for very large datasets requires a connection to IBM® SPSS® Modeler Server.
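The boosting objective described above (models built in sequence, each concentrating on the previous model's errors, then combined by weighted vote) can be illustrated with a short sketch. This is a minimal AdaBoost-style example under stated assumptions, not the algorithm Modeler runs internally: the component models are one-level decision stumps rather than full C&R Tree, QUEST, or CHAID trees, labels are assumed to be -1/+1, and the names `fit_stump`, `adaboost`, and `boosted_predict` are hypothetical.

```python
import math
from typing import Tuple

# Hypothetical weak learner: (feature index, threshold, polarity).
# It predicts `polarity` when x[feature] <= threshold, otherwise -polarity.
Stump = Tuple[int, float, int]


def predict_stump(stump: Stump, x) -> int:
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def fit_stump(X, y, w):
    """Return the stump with the lowest weighted error, and that error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                stump = (f, t, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if predict_stump(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds=10):
    """Fit up to `rounds` stumps in sequence; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # every record starts with equal weight
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:  # no better than chance: stop early
            break
        err = max(err, 1e-10)  # avoid division by zero on a perfect fit
        alpha = 0.5 * math.log((1 - err) / err)  # this model's vote weight
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified records and lower the rest,
        # so the next stump concentrates on this stump's errors.
        w = [wi * math.exp(-alpha * yi * predict_stump(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, x) -> int:
    """Weighted vote: sign of the alpha-weighted sum of stump predictions."""
    score = sum(alpha * predict_stump(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # No single stump can separate these labels, but three boosted stumps can.
    X = [[1], [2], [3], [4], [5], [6]]
    y = [1, 1, -1, -1, 1, 1]
    ensemble = adaboost(X, y, rounds=3)
    print([boosted_predict(ensemble, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

The extra training time mentioned above shows up directly here: each round refits a model over the full reweighted data, so training cost grows with the number of rounds.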
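The bagging objective can be sketched the same way: each model is trained on its own bootstrap sample of the records, and the models are combined by an unweighted majority vote. This is a simplified illustration, not Modeler's implementation; it assumes decision stumps as the component models, -1/+1 labels, a fixed random seed for reproducibility, and hypothetical names (`bagging`, `bagged_predict`).

```python
import random
from collections import Counter
from typing import Tuple

# Hypothetical component model: (feature index, threshold, polarity).
Stump = Tuple[int, float, int]


def predict_stump(stump: Stump, x) -> int:
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def fit_stump(X, y) -> Stump:
    """Return the stump with the fewest misclassified records."""
    best, best_err = None, None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                stump = (f, t, polarity)
                err = sum(predict_stump(stump, xi) != yi for xi, yi in zip(X, y))
                if best_err is None or err < best_err:
                    best, best_err = stump, err
    return best


def bagging(X, y, n_models=25, seed=0):
    """Fit one stump per bootstrap sample of the training records."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n record indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_predict(models, x) -> int:
    """Unweighted majority vote across the bootstrap models."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[1], [2], [3], [4], [5], [6]]
    y = [-1, -1, -1, 1, 1, 1]
    models = bagging(X, y, n_models=25, seed=0)
    print([bagged_predict(models, x) for x in X])
```

Each bootstrap sample omits some records and repeats others, so the individual thresholds vary from model to model; the vote smooths out that variation, which is the stability gain the option targets. Scoring also has to evaluate every component model, which is one reason bagged models can be slower to score.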
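The very-large-dataset objective (split the data into blocks, build a model per block, keep the most accurate models, and combine them) can be sketched as follows. The source does not say how Modeler sizes the blocks, measures accuracy, or combines the selected models, so everything here is an assumption: contiguous fixed-size blocks, a separate validation set for ranking, a fixed number `keep` of retained models, majority-vote combination, decision stumps as component models, and hypothetical names (`build_block_ensemble`, `ensemble_predict`).

```python
from collections import Counter
from typing import Tuple

# Hypothetical component model: (feature index, threshold, polarity).
Stump = Tuple[int, float, int]


def predict_stump(stump: Stump, x) -> int:
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def fit_stump(X, y) -> Stump:
    """Return the stump with the fewest misclassified records in one block."""
    best, best_err = None, None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                stump = (f, t, polarity)
                err = sum(predict_stump(stump, xi) != yi for xi, yi in zip(X, y))
                if best_err is None or err < best_err:
                    best, best_err = stump, err
    return best


def accuracy(model: Stump, X, y) -> float:
    return sum(predict_stump(model, xi) == yi for xi, yi in zip(X, y)) / len(X)


def build_block_ensemble(X, y, X_val, y_val, block_size, keep):
    """Fit one model per block, then keep the `keep` most accurate ones."""
    scored = []
    for start in range(0, len(X), block_size):
        # Each model only ever sees one block of records.
        model = fit_stump(X[start:start + block_size], y[start:start + block_size])
        scored.append((accuracy(model, X_val, y_val), model))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in scored[:keep]]


def ensemble_predict(models, x) -> int:
    """Combine the selected block models by majority vote."""
    return Counter(predict_stump(m, x) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    X = [[1], [10], [2], [9], [3], [8], [4], [7], [5], [6], [0], [11]]
    y = [-1 if row[0] <= 5 else 1 for row in X]
    X_val = [[v] for v in range(12)]
    y_val = [-1 if v <= 5 else 1 for v in range(12)]
    models = build_block_ensemble(X, y, X_val, y_val, block_size=2, keep=3)
    print(models)  # [(0, 5, -1), (0, 4, -1), (0, 3, -1)]
```

Because no single model is ever fitted to the whole dataset, memory use per model is bounded by the block size; the trade-off is that each block model sees only a slice of the data, which is why the selection step matters.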