Decision Tree/Rule Set model nugget settings
The Settings tab for a decision tree or rule set model nugget enables you to specify options for confidences, propensity scores, rule identifiers, and SQL generation during model scoring. This tab is available only after the model nugget has been added to a stream.
Calculate confidences. Select this option to include confidences in scoring operations. When scoring models in the database, excluding confidences enables you to generate more efficient SQL. Regression trees do not assign confidences.
Note: If you select the Create a model for very large datasets option on the Build Options tab - Method panel for CHAID models, this checkbox is available only in model nuggets whose categorical target has a measurement level of nominal or flag.
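To illustrate what a confidence value represents, the sketch below computes the prediction and confidence for a single terminal node. It assumes that confidence is the share of the node's training records that belong to the predicted (majority) category; individual algorithms may compute confidence differently, and the function name is hypothetical.

```python
from collections import Counter

def node_prediction_and_confidence(training_labels):
    """Return (predicted category, confidence) for one terminal node.

    Assumption: confidence is the share of training records in the node
    that belong to the majority (predicted) category.
    """
    counts = Counter(training_labels)
    predicted, hits = counts.most_common(1)[0]
    return predicted, hits / len(training_labels)

# A terminal node that received 8 "yes" and 2 "no" training records.
pred, conf = node_prediction_and_confidence(["yes"] * 8 + ["no"] * 2)
print(pred, conf)  # yes 0.8
```

A regression tree node predicts a numeric value rather than a category, so there is no majority share to report, which is consistent with regression trees not assigning confidences.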
Calculate raw propensity scores. For models with a flag target (which return a yes or no prediction), you can request propensity scores that indicate the likelihood of the true outcome specified for the target field. These scores are in addition to the other prediction and confidence values that may be generated during scoring.
Note: If you select the Create a model for very large datasets option on the Build Options tab - Method panel for CHAID models, this checkbox is available only in model nuggets with a flag target.
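For a flag target, a raw propensity can be thought of as the confidence re-expressed in terms of the true outcome. The sketch below assumes that the propensity equals the confidence when the model predicts the true value, and 1 minus the confidence otherwise; `raw_propensity` is a hypothetical helper, not part of SPSS Modeler.

```python
def raw_propensity(predicted, confidence, true_value):
    """Likelihood-style score for the target's true outcome.

    Assumption: if the model predicts the true value, the propensity is the
    confidence; otherwise it is 1 - confidence.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence if predicted == true_value else 1.0 - confidence

print(raw_propensity("yes", 0.8, "yes"))  # 0.8
print(raw_propensity("no", 0.8, "yes"))   # close to 0.2 (floating point)
```

Expressing every record's score in terms of the same (true) outcome makes the scores comparable across records, regardless of which category each record was predicted to fall into.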
Calculate adjusted propensity scores. Raw propensity scores are based only on the training data and may be overly optimistic, because many models tend to overfit this data. Adjusted propensities attempt to compensate by evaluating model performance against a test or validation partition. This option requires that a partition field be defined in the stream and that adjusted propensity scores be enabled in the modeling node before the model is generated.
Note: Adjusted propensity scores are not available for boosted tree and rule set models. See the topic Boosted C5.0 Models for more information.
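The exact adjustment SPSS Modeler applies is not described in this topic. As a rough sketch of the general idea only, the code below re-estimates each terminal node's true-outcome rate on a held-out test partition and falls back to the training rate for nodes that no test record reaches. All function names and data are hypothetical.

```python
from collections import defaultdict

def node_true_rates(node_ids, labels, true_value):
    """Share of records with the true outcome in each terminal node."""
    totals, hits = defaultdict(int), defaultdict(int)
    for node, label in zip(node_ids, labels):
        totals[node] += 1
        hits[node] += (label == true_value)
    return {node: hits[node] / totals[node] for node in totals}

def adjusted_node_propensities(train, test, true_value):
    """Hypothetical adjustment: use each node's true-outcome rate on the
    test partition, or the training rate if no test record reaches the node."""
    train_rates = node_true_rates(*train, true_value)
    test_rates = node_true_rates(*test, true_value)
    return {node: test_rates.get(node, rate) for node, rate in train_rates.items()}

# Parallel lists of terminal node IDs and actual outcomes, one pair per partition.
train = ([1, 1, 1, 1, 2, 2, 3], ["yes", "yes", "yes", "yes", "no", "no", "yes"])
test = ([1, 1, 2], ["yes", "no", "no"])
print(adjusted_node_propensities(train, test, "yes"))  # {1: 0.5, 2: 0.0, 3: 1.0}
```

Node 1 looks perfect on the training data (rate 1.0), but only half of its test records have the true outcome, so its adjusted value drops to 0.5. This is the kind of optimism that adjusted propensities are meant to correct.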
Rule identifier. For CHAID, QUEST, and C&R Tree models, this option adds a field to the scoring output that indicates the ID of the terminal node to which each record is assigned.
Note: When this option is selected, SQL generation is not available.
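The rule identifier is the ID of the terminal node a record lands in. The sketch below walks a small, hypothetical binary tree held in Python dictionaries and returns both the prediction and that node ID. It assumes numeric splits and no missing values; a `None` field value would raise a `TypeError` here.

```python
# Hypothetical tree: internal nodes test one field against a threshold;
# terminal nodes carry an ID and a predicted category.
TREE = {
    "id": 0, "field": "age", "threshold": 30,
    "left": {"id": 1, "prediction": "no"},            # age <= 30
    "right": {
        "id": 2, "field": "income", "threshold": 50000,
        "left": {"id": 3, "prediction": "no"},        # income <= 50000
        "right": {"id": 4, "prediction": "yes"},      # income > 50000
    },
}

def score(record, node=TREE):
    """Return (prediction, terminal node ID) for one record."""
    while "prediction" not in node:
        branch = "left" if record[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["prediction"], node["id"]

print(score({"age": 45, "income": 72000}))  # ('yes', 4)
```

Several terminal nodes can share the same prediction (nodes 1 and 3 both predict "no"), so the node ID tells you which rule fired in cases where the prediction alone cannot.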
Generate SQL for this model. When using data from a database, SQL code can be pushed back to the database for execution, providing superior performance for many operations.
Select one of the following options to specify how SQL generation is performed.
- Default: Score using Server Scoring Adapter (if installed) otherwise in process. If you are connected to a database with a scoring adapter installed, this option generates SQL using the scoring adapter and associated user-defined functions (UDFs) and scores your model within the database. When no scoring adapter is available, this option fetches your data back from the database and scores it in SPSS® Modeler.
- Score by converting to native SQL without missing value support. If selected, this option generates native SQL to score the model within the database, without the overhead of handling missing values. It simply sets the prediction to null ($null$) when a missing value is encountered while scoring a case. Note: This option is not available for CHAID models. For other model types, it is only available for decision trees (not rule sets).
- Score by converting to native SQL with missing value support. For CHAID, QUEST, and C&R Tree models, you can generate native SQL to score the model within the database with full missing value support. This means that the SQL is generated so that missing values are handled as specified in the model. For example, C&R Trees use surrogate rules and biggest child fallback. Note: For C5.0 models, this option is only available for rule sets (not decision trees).
- Score outside of the Database. If selected, this option fetches your data back from the database and scores it in SPSS Modeler.
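The difference between the two native SQL options can be sketched by translating a small tree into a SQL `CASE` expression. The code below is a hypothetical illustration, not the SQL that SPSS Modeler generates. Without missing value support, a `NULL` in a field tested on a record's path leaves no `WHEN` condition true, so the expression falls through to `ELSE NULL`. With missing value support, the sketch applies only a biggest-child fallback (surrogate rules are omitted). The tree, the `biggest` key, and the table are assumptions, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Hypothetical tree: "biggest" names the child that received the most
# training records (used for the biggest-child fallback).
TREE = {
    "field": "age", "threshold": 30, "biggest": "right",
    "left": {"prediction": "no"},
    "right": {
        "field": "income", "threshold": 50000, "biggest": "left",
        "left": {"prediction": "no"},
        "right": {"prediction": "yes"},
    },
}

def leaf_rules(node, conditions=(), missing_support=False):
    """Yield (path conditions, prediction) for every terminal node."""
    if "prediction" in node:
        yield list(conditions), node["prediction"]
        return
    field, t = node["field"], node["threshold"]
    for branch, test in (("left", f"{field} <= {t}"), ("right", f"{field} > {t}")):
        if missing_support and node["biggest"] == branch:
            # Biggest-child fallback: a NULL split field follows this branch.
            test = f"({test} OR {field} IS NULL)"
        yield from leaf_rules(node[branch], conditions + (test,), missing_support)

def to_case_sql(tree, missing_support=False):
    """Translate the tree into a single SQL CASE expression."""
    whens = [
        f"WHEN {' AND '.join(conds) or '1 = 1'} THEN '{pred}'"
        for conds, pred in leaf_rules(tree, missing_support=missing_support)
    ]
    # Without missing-value support, a NULL in a tested split field means no
    # WHEN is true, so ELSE returns NULL (shown as $null$ in Modeler).
    return "CASE " + " ".join(whens) + " ELSE NULL END"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cases (id INTEGER, age INTEGER, income INTEGER)")
con.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [(1, 25, 10000), (2, 45, 72000), (3, None, 72000), (4, 45, None)],
)
for flag in (False, True):
    sql = f"SELECT {to_case_sql(TREE, missing_support=flag)} FROM cases ORDER BY id"
    print(flag, [row[0] for row in con.execute(sql)])
# False ['no', 'yes', None, None]
# True ['no', 'yes', 'yes', 'no']
```

The extra `OR ... IS NULL` terms are the overhead that the "without missing value support" option avoids: its SQL is simpler, but cases 3 and 4 receive a null prediction instead of being routed to a child node.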