Features Affected by Splitting
The use of split models affects a number of IBM® SPSS® Modeler features in various ways. This section provides guidance on using split models with other nodes in a stream.
Record Ops nodes
When you use split models in a stream that contains a Sample node, stratify records by the split field so that each split is represented evenly in the sample. This option is available when you choose Complex as the sample method.
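The idea behind stratifying by the split field can be sketched in plain Python. This is not SPSS Modeler code and does not reproduce the Sample node's exact behavior. The records, the `region` split field, and the `stratified_sample` helper are all hypothetical. The sketch takes the same fraction of records from every split, so a small split is not crowded out by a large one.

```python
import random
from collections import defaultdict

def stratified_sample(records, split_field, fraction, seed=0):
    """Sample the same fraction of records from every split (stratum).

    Hypothetical helper for illustration only; it mimics the idea of
    stratifying a sample by the split field, not Modeler's Sample node.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[split_field]].append(rec)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        # Keep at least one record so every split stays represented.
        n = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, n))
    return sample

# Hypothetical data: split "north" is much smaller than split "south".
records = ([{"region": "north", "id": i} for i in range(10)] +
           [{"region": "south", "id": i} for i in range(90)])
sample = stratified_sample(records, "region", 0.2)

counts = defaultdict(int)
for rec in sample:
    counts[rec["region"]] += 1
print(dict(counts))  # {'north': 2, 'south': 18}
```

Each split contributes 20% of its own records, so both splits appear in the sample in proportion to their size.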
If the stream contains a Balance node, balancing applies to the overall set of input records, not to the subset of records inside a split.
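Because balancing is applied to the overall set of records, the target can remain unbalanced within an individual split. The plain-Python sketch below illustrates this with hypothetical data. It computes one boost factor per target value from all records and then applies those global factors to each split. Representing boosting as fractional weights is a simplification for illustration; it is not how the Balance node is implemented.

```python
from collections import Counter

# Hypothetical records: (split value, target value).
records = ([("A", "yes")] * 10 + [("A", "no")] * 10 +
           [("B", "yes")] * 5 + [("B", "no")] * 45)

# Overall target counts: yes=15, no=55.
overall = Counter(target for _, target in records)
largest = max(overall.values())
# One boost factor per target value, computed once over ALL records.
factors = {t: largest / n for t, n in overall.items()}

def weighted_counts(split):
    """Apply the global factors to the records of a single split."""
    counts = Counter()
    for s, t in records:
        if s == split:
            counts[t] += factors[t]
    return counts

for split in ("A", "B"):
    c = weighted_counts(split)
    print(split, {t: round(v, 1) for t, v in sorted(c.items())})
# A {'no': 10.0, 'yes': 36.7}
# B {'no': 45.0, 'yes': 18.3}
```

Across both splits, the weighted "yes" and "no" totals are equal (55 each). Within each split, however, the target is still skewed: split A now leans toward "yes" and split B still leans toward "no".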
When you aggregate records with an Aggregate node, set the split fields as key fields if you want to calculate aggregates for each split.
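The effect of adding the split field as a key field can be shown with a plain-Python group-by. The records, the `region` split field, and the `aggregate_sum` helper are hypothetical and are not part of SPSS Modeler. Without the split field as a key, each total mixes records from different splits. With it, each split gets its own totals.

```python
from collections import defaultdict

# Hypothetical transaction records with a split field "region".
records = [
    {"region": "north", "customer": "c1", "amount": 10.0},
    {"region": "north", "customer": "c1", "amount": 5.0},
    {"region": "south", "customer": "c1", "amount": 7.0},
    {"region": "south", "customer": "c2", "amount": 3.0},
]

def aggregate_sum(records, key_fields, value_field):
    """Sum value_field for each combination of key_fields (illustration only)."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        totals[key] += rec[value_field]
    return dict(totals)

# Key on customer only: c1's total combines both splits.
print(aggregate_sum(records, ["customer"], "amount"))
# {('c1',): 22.0, ('c2',): 3.0}

# Key on the split field as well: aggregates are calculated per split.
print(aggregate_sum(records, ["region", "customer"], "amount"))
# {('north', 'c1'): 15.0, ('south', 'c1'): 7.0, ('south', 'c2'): 3.0}
```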
Field Ops nodes
The Type node is where you specify which field or fields to use as split fields.
Modeling nodes
Split models do not support the calculation of predictor importance (the relative importance of the predictor input fields in estimating the model). Predictor importance settings are ignored when building split models.
The KNN (nearest neighbor) node supports split models only if it is set to predict a target field. The alternative setting (only identify nearest neighbors) does not create a model. If the option Automatically select k is chosen, each split model might use a different number of nearest neighbors. The overall model therefore has as many generated columns as the largest number of nearest neighbors found across all the split models. For split models whose number of nearest neighbors is smaller than this maximum, the extra columns are filled with $null$ values. See the topic KNN node for more information.
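The column layout described above can be sketched in plain Python. The split names, neighbor IDs, and `pad_neighbor_columns` helper are hypothetical, and `None` stands in for Modeler's `$null$` value. The sketch assumes that each split has already chosen its own k. Each split's neighbor list is padded to the width of the longest one, so every split produces the same set of columns.

```python
def pad_neighbor_columns(neighbors_by_split):
    """Pad per-split neighbor lists to a common width.

    Hypothetical illustration: None stands in for Modeler's $null$ value,
    and each list holds one record's nearest-neighbor IDs for that split.
    """
    # The overall model has as many columns as the largest k across splits.
    width = max(len(n) for n in neighbors_by_split.values())
    return {
        split: list(n) + [None] * (width - len(n))
        for split, n in neighbors_by_split.items()
    }

# Hypothetical result of automatic k selection: each split chose a different k.
neighbors = {"north": ["r4", "r9", "r2"], "south": ["r7"], "west": ["r1", "r5"]}
padded = pad_neighbor_columns(neighbors)
for split, row in padded.items():
    print(split, row)
# north ['r4', 'r9', 'r2']
# south ['r7', None, None]
# west ['r1', 'r5', None]
```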
Database Modeling nodes
The in-database modeling nodes do not support split models.
Model nuggets
Export to PMML from a split model nugget is not possible, because the nugget contains multiple models and PMML does not support that kind of packaging. You can, however, export the nugget to text or HTML.