Features Affected by Splitting

Using split models affects several IBM® SPSS® Modeler features. This section provides guidance on using split models with other nodes in a stream.

Record Ops nodes

When you use split models in a stream that contains a Sample node, stratify records by the split field to achieve an even sampling of records. This option is available when you choose Complex as the sample method.
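
The idea of stratifying by the split field can be sketched in plain Python. This is only an illustration of the concept, not the algorithm the Sample node uses; the function name, the "region" split field, and the sample data are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, split_field, fraction, seed=0):
    """Draw the same fraction of records from each value of split_field."""
    rng = random.Random(seed)

    # Group the records by their split-field value (one stratum per split).
    strata = defaultdict(list)
    for rec in records:
        strata[rec[split_field]].append(rec)

    # Sample each stratum separately so every split is represented.
    sample = []
    for value in sorted(strata):
        group = strata[value]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical data: a large split and a small split.
records = (
    [{"region": "north", "id": i} for i in range(100)]
    + [{"region": "south", "id": i} for i in range(20)]
)
sample = stratified_sample(records, "region", 0.1)
```

Because each split is sampled separately, the smaller split keeps the same proportion of its records as the larger one instead of depending on chance.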

If the stream contains a Balance node, balancing applies to the overall set of input records, not to the subset of records inside a split.
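
A small plain-Python sketch shows why this distinction matters: a data set whose target is balanced overall can still be skewed inside each split. The data, the "region" split values, and the "yes"/"no" target values are hypothetical, and the code does not reproduce how the Balance node works.

```python
from collections import Counter

# Hypothetical records as (split value, target value) pairs.
records = (
    [("north", "yes")] * 40
    + [("north", "no")] * 10
    + [("south", "yes")] * 10
    + [("south", "no")] * 40
)

# Target distribution over the whole input set.
overall = Counter(target for _, target in records)

# Target distribution inside each split.
per_split = {}
for split, target in records:
    per_split.setdefault(split, Counter())[target] += 1
```

Here `overall` is evenly divided between the two target values, while each split in `per_split` has a 40 to 10 imbalance.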

When aggregating records by means of an Aggregate node, set the split fields to be key fields if you want to calculate aggregates for each split.
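
The effect of using the split fields as key fields can be sketched as a grouped aggregation in plain Python. The helper name, field names, and values are hypothetical; this illustrates the grouping idea rather than the Aggregate node itself.

```python
from collections import defaultdict


def aggregate_mean(records, key_fields, value_field):
    """Return the mean of value_field for each combination of key_fields."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        totals[key][0] += rec[value_field]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Hypothetical records with "region" as the split field.
records = [
    {"region": "north", "sales": 10.0},
    {"region": "north", "sales": 20.0},
    {"region": "south", "sales": 5.0},
]

# Split field used as a key: one aggregate per split.
per_split = aggregate_mean(records, ["region"], "sales")

# No key fields: a single aggregate over all records.
overall = aggregate_mean(records, [], "sales")
```

With the split field as a key, the result holds one mean per split; without it, the splits are merged into a single overall mean.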

Field Ops nodes

The Type node is where you specify which field or fields to use as split fields.

Note: While the Ensemble node is used to combine two or more model nuggets, it cannot be used to reverse the action of splitting, as the split models are contained inside a single model nugget.

Modeling nodes

Split models do not support the calculation of predictor importance (the relative importance of the predictor input fields in estimating the model). Predictor importance settings are ignored when building split models.

Note: Adjusted propensity score settings are ignored when using a split model.

The KNN (nearest neighbor) node supports split models only if it is set to predict a target field. The alternative setting (only identify nearest neighbors) does not create a model. If the option Automatically select k is chosen, each of the split models might have a different number of nearest neighbors. As a result, the overall model has a number of generated columns equal to the largest number of nearest neighbors found across all the split models. For those split models where the number of nearest neighbors is less than this maximum, the remaining columns are filled with $null$ values. See the topic KNN node for more information.
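
The column padding described above can be sketched in plain Python. The function name, the split names, and the neighbor IDs are hypothetical, and Python's None stands in for the null value; the sketch shows only the shape of the output, not how the KNN node computes neighbors.

```python
NULL = None  # stands in for the null value that fills unused columns


def pad_neighbor_columns(neighbors_by_split):
    """Pad each split's neighbor list to the largest k across all splits."""
    max_k = max(len(n) for n in neighbors_by_split.values())
    return {
        split: list(n) + [NULL] * (max_k - len(n))
        for split, n in neighbors_by_split.items()
    }


# Hypothetical neighbor IDs for one scored record per split:
# the "north" model selected k = 3, the "south" model selected k = 5.
neighbors_by_split = {
    "north": [12, 7, 31],
    "south": [4, 9, 2, 18, 5],
}
padded = pad_neighbor_columns(neighbors_by_split)
```

Every split ends up with five neighbor columns; for "north", the last two hold the null placeholder.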

Database Modeling nodes

The in-database modeling nodes do not support split models.

Model nuggets

You cannot export a split model nugget to PMML, because the nugget contains multiple models and PMML does not support packaging them in this way. Export to text or HTML is possible.