Optional model settings

When building models, you can choose from the following optional settings. Note that depending on the type of model and the target selected, you may not see all of these settings. If you change any of these settings for an existing model, you must rebuild the model for the changes to apply.

Automatically clean up and prepare data for reliable model building. Identifies and repairs data issues to make modeling faster, more predictable, and more reliable. Screens fields that are problematic or not likely to be useful, for example by handling missing and extreme values, deriving new attributes when appropriate, and improving performance through intelligent screening and sampling techniques. The first time a model is built with a new data source, the analysis is done to identify issues and fixes, which may slow performance on the first pass only. On subsequent runs, the fixes are applied, but the analysis is not repeated unless the data source changes. This setting may be disabled for some models, including those with custom data preparation settings specified by an expert user.

Note: Automatic data preparation settings only apply when creating a new model, and binning is not performed.

Automatically partition data to enable model evaluation on build data source. Selecting this option splits the data into separate subsets or samples for training and testing the model. By building the model on one subset and testing it on another, you can get an idea of how well it will generalize to other data sets. You can also specify percentage values for the sizes of the randomly generated training and testing partitions.

The minimum training partition size is 1 and the maximum is 100. The minimum testing partition size is 0 and the maximum is 100. The minimum validation partition size is 0 and the maximum is 100. The combined total of the partition sizes must be 100 percent or less for the model to build successfully.
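The partition size limits above can be expressed as a short check. The following Python sketch is illustrative only and is not part of the product: the function names validate_partition_sizes and assign_partitions are hypothetical, and the treatment of records that fall outside every partition when the total is below 100 (they are left unassigned) is an assumption, since the behavior in that case is not described here.

```python
import random


def validate_partition_sizes(training, testing, validation=0):
    """Check percentage sizes against the documented limits.

    Returns the combined total, or raises ValueError if a limit is broken.
    """
    if not 1 <= training <= 100:
        raise ValueError("training partition size must be between 1 and 100")
    if not 0 <= testing <= 100:
        raise ValueError("testing partition size must be between 0 and 100")
    if not 0 <= validation <= 100:
        raise ValueError("validation partition size must be between 0 and 100")
    total = training + testing + validation
    if total > 100:
        raise ValueError("partition sizes must total 100 or less")
    return total


def assign_partitions(n_records, training, testing, validation=0, seed=0):
    """Randomly label each record with a partition name.

    Assumption: when the sizes total less than 100, the remaining
    records are labeled None (unused).
    """
    validate_partition_sizes(training, testing, validation)
    rng = random.Random(seed)
    labels = []
    for _ in range(n_records):
        draw = rng.random() * 100  # value in [0, 100)
        if draw < training:
            labels.append("training")
        elif draw < training + testing:
            labels.append("testing")
        elif draw < training + testing + validation:
            labels.append("validation")
        else:
            labels.append(None)
    return labels
```

For example, validate_partition_sizes(70, 30) passes, while validate_partition_sizes(80, 30) raises an error because the total exceeds 100.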

These partitioning options are available for all model types except association modeling.

Choose model techniques for model building. If desired, click Select to choose which modeling techniques to include when building the model. The modeling techniques listed, and those selected by default, depend on the target you selected. For example, the Decision List algorithm appears only for flag targets. This feature is available only for predictive models.

For complete details about the modeling techniques, see the IBM® SPSS® Modeler Algorithms Guide and other documentation shipped with the IBM SPSS Modeler product and available on the Web.

Maximum # of models to be combined. Allows you to set the maximum number of models to be retained and combined. If you set this option to 1, a single model will be built and retained. If fewer models are built than the value entered here, all models built will be retained and combined. This option is available only for predictive models.
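The retention rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: the function name retain_models is hypothetical, and the input list is assumed to be ordered from best to worst already, because the ranking method is not described in this topic.

```python
def retain_models(ranked_models, max_models):
    """Keep at most max_models entries from a best-first list of models.

    Assumption: ranked_models is already sorted from best to worst.
    If fewer models were built than max_models, all of them are kept.
    """
    if max_models < 1:
        raise ValueError("max_models must be at least 1")
    return ranked_models[:max_models]
```

With three built models and a maximum of 2, the first two are retained; with a maximum of 5, all three are retained.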

Profit Criteria (used to rank models with binary targets). Allows you to set profit criteria prior to building the model. The values set here will be set as the default for evaluating the model. This option is only available for predictive models with a flag target selected.

Specify inputs to use. Allows you to select the fields you want to use. Typically these are fields that have some practical relationship to the thing you are trying to predict, such as age or income. If you have a large data set, limiting the number of fields is one way of simplifying the model. Fields such as customer ID or contact number are not typically useful in modeling and should not be selected. Fields that duplicate other data may also be excluded.

Clicking on a linked input field (an expression) opens the expression viewer for that expression. To edit an expression, see the Data tab. See the topic Expression Builder for more information.

Specify selections to use. Specifies which records to include or exclude when modeling. You can search for existing rules, or create new ones as appropriate. See the topic Defining selection rules for more information.

In addition, if global selections have been defined, they will be displayed here, and you can specify whether they should also apply during modeling. You can choose to use either all or none of the global selection rules; you cannot choose a subset.