Overview (MODEL HANDLE command)

MODEL HANDLE reads an external XML file or ZIP archive containing specifications for a predictive model. It caches the model specifications and associates a unique name (handle) with the cached model. The model can then be used by the APPLYMODEL and STRAPPLYMODEL transformation functions to calculate scores and other results (see Scoring expressions). The MODEL CLOSE command discards a cached model from memory.
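
As a minimal sketch of that load, score, and discard cycle, the following assumes a hypothetical model file /models/credit.xml, a hypothetical handle name creditrisk, and a hypothetical target variable PredCredit; none of these names come from this topic.

```spss
* Cache the model specifications under a handle.
MODEL HANDLE NAME=creditrisk FILE='/models/credit.xml'.
* Score the active dataset with the cached model.
COMPUTE PredCredit = APPLYMODEL(creditrisk, 'predict').
EXECUTE.
* Discard the cached model once scoring is done.
MODEL CLOSE NAME=creditrisk.
```

The scoring function is named in a quoted string as the second argument; the handle itself is written without quotes.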

Different models can be applied to the same data by using separate MODEL HANDLE commands for each of the models.
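
A sketch of scoring the same cases with two models might look like the following. The file paths, the handle names treemodel and logitmodel, and the new variable names are illustrative assumptions.

```spss
* Open each model under its own handle.
MODEL HANDLE NAME=treemodel FILE='/models/tree.xml'.
MODEL HANDLE NAME=logitmodel FILE='/models/logistic.xml'.
* Each scoring expression names the handle of the model it applies.
COMPUTE PredTree = APPLYMODEL(treemodel, 'predict').
COMPUTE PredLogit = APPLYMODEL(logitmodel, 'predict').
EXECUTE.
```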

Options

Variable Mapping. You can map any or all of the variables in the original model to different variables in the current active dataset. By default, the model is applied to variables in the current active dataset with the same names as the variables in the original model.
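
For illustration, suppose a model was built with the variables age and debt, while the active dataset stores the same information in agecat and curdebt (all four names are hypothetical). A mapping could then be sketched with the MAP subcommand, on the assumption that the two lists are paired by position:

```spss
MODEL HANDLE NAME=creditrisk FILE='/models/credit.xml'
  /MAP VARIABLES=agecat curdebt MODELVARIABLES=age debt.
```

Model variables that are not listed keep the default behavior and are matched by name.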

Handling Missing Values. You can choose how to handle cases with missing values. By default, an attempt is made to substitute a sensible value for a missing value, but you can choose to treat missing values as system-missing.
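
This overview does not name the subcommand that controls missing-value handling. The sketch below assumes it is the OPTIONS subcommand with a MISSING keyword that accepts SUBSTITUTE (the default) or SYSMIS; verify this against the syntax chart for the command. The file path and handle name are hypothetical.

```spss
* Return system-missing instead of substituting a value when a predictor is missing.
MODEL HANDLE NAME=creditrisk FILE='/models/credit.xml'
  /OPTIONS MISSING=SYSMIS.
```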

Basic specification

The basic specification is NAME and FILE. NAME specifies the model handle name to be used when referring to this model. FILE specifies the external file containing the model specifications.

Subcommand order

  • Subcommands can be specified in any order.

Syntax rules

  • When using the MAP subcommand, you must specify both the VARIABLES and MODELVARIABLES keywords.
  • Multiple MAP subcommands are allowed. Each MAP subcommand should provide the mappings for a distinct subset of the variables. Subsequent mappings of a given variable override any previous mappings of that same variable.
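
The two rules above can be sketched as follows, using hypothetical variable names. The first two MAP subcommands cover distinct model variables. The third maps age again, so, as we read the override rule, the model variable age would end up mapped to agegroup rather than agecat.

```spss
MODEL HANDLE NAME=creditrisk FILE='/models/credit.xml'
  /MAP VARIABLES=agecat MODELVARIABLES=age
  /MAP VARIABLES=curdebt MODELVARIABLES=debt
  /MAP VARIABLES=agegroup MODELVARIABLES=age.
```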

Operations

  • A model handle is used only during the current working session. The handle is not saved with the data file.
  • Issuing a SET LOCALE command that changes the computer’s code page requires closing any existing model handles (using MODEL CLOSE) and reopening the models (using MODEL HANDLE) before proceeding with scoring.
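
A hedged sketch of the close-and-reopen sequence around a locale change follows. The locale value, file path, and handle name are placeholders. The sketch closes the handles before issuing SET LOCALE, although the rule above only requires that both steps happen before scoring resumes.

```spss
* Release every open handle.
MODEL CLOSE NAME=ALL.
SET LOCALE='Japanese'.
* Reopen the model before scoring again.
MODEL HANDLE NAME=creditrisk FILE='/models/credit.xml'.
COMPUTE PredCredit = APPLYMODEL(creditrisk, 'predict').
EXECUTE.
```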

Models Supported for Scoring

IBM® SPSS® Statistics can score models created by IBM SPSS Statistics, IBM SPSS Modeler, and IBM SPSS AnswerTree.

The following table lists the scoring functions available for each type of model that supports scoring. The entry PROBABILITY (category) denotes the PROBABILITY function called with a particular category specified as its optional third parameter.

Table 1. Supported functions by model type

Model type                                            Supported functions
Tree (categorical target)                             PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE, NODEID
Tree (scale target)                                   PREDICT, NODEID, STDDEV
Boosted Tree (C5.0)                                   PREDICT, CONFIDENCE
Linear Regression                                     PREDICT, STDDEV
Automatic Linear Models                               PREDICT
Binary Logistic Regression                            PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE
Conditional Logistic Regression                       PREDICT
Multinomial Logistic Regression                       PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE
General Linear Model                                  PREDICT, STDDEV
Discriminant                                          PREDICT, PROBABILITY, PROBABILITY (category)
TwoStep Cluster                                       PREDICT
K-Means Cluster                                       PREDICT
Kohonen                                               PREDICT
Neural Net (categorical target)                       PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE
Neural Net (scale target)                             PREDICT
Naive Bayes                                           PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE
Anomaly Detection                                     PREDICT
Ruleset                                               PREDICT, CONFIDENCE
Generalized Linear Model (categorical target)         PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE
Generalized Linear Model (scale target)               PREDICT, STDDEV
Generalized Linear Mixed Model (categorical target)   PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE
Generalized Linear Mixed Model (scale target)         PREDICT
Ordinal Multinomial Regression                        PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE
Cox Regression                                        PREDICT, CUMHAZARD
Nearest Neighbor (scale target)                       PREDICT, NEIGHBOR, NEIGHBOR(K), DISTANCE, DISTANCE(K)
Nearest Neighbor (categorical target)                 PREDICT, PROBABILITY, PROBABILITY (category), CONFIDENCE, NEIGHBOR, NEIGHBOR(K), DISTANCE, DISTANCE(K)

  • For the Binary Logistic Regression, Multinomial Logistic Regression, and Naive Bayes models, the value returned by the CONFIDENCE function is identical to that returned by the PROBABILITY function.
  • For the K-Means model, the value returned by the CONFIDENCE function is the least distance.
  • For tree and ruleset models, the confidence can be interpreted as an adjusted probability of the predicted category and is always less than the value given by PROBABILITY. For these models, the confidence value is more reliable than the value given by PROBABILITY.
  • For neural network models, the confidence provides a measure of whether the predicted category is much more likely than the second-best predicted category.
  • For Ordinal Multinomial Regression and Generalized Linear Model, the PROBABILITY function is supported when the target variable is binary.
  • For nearest neighbor models without a target variable, the available functions are NEIGHBOR and DISTANCE.
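
To make the function names in the table concrete, the following sketch requests several results from a tree model with a categorical target. It assumes a hypothetical model file and handle (churn), and a numeric target in which 1 is one of the categories.

```spss
MODEL HANDLE NAME=churn FILE='/models/churn_tree.xml'.
* Predicted category, its probability, and the confidence of the prediction.
COMPUTE PredCat = APPLYMODEL(churn, 'predict').
COMPUTE PredProb = APPLYMODEL(churn, 'probability').
COMPUTE PredConf = APPLYMODEL(churn, 'confidence').
* Probability of one specific category, given as the optional third parameter.
COMPUTE ProbCat1 = APPLYMODEL(churn, 'probability', 1).
* Terminal node to which each case is assigned.
COMPUTE NodeNum = APPLYMODEL(churn, 'nodeid').
EXECUTE.
```

For a model whose target is a string variable, STRAPPLYMODEL would presumably take the place of APPLYMODEL in the PREDICT expression, with the result assigned to a string variable declared beforehand.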

For information on applying scoring functions from a model, see Scoring expressions.