Scoring expressions

Scoring expressions apply model XML from an external file to the active dataset and generate predicted values, predicted probabilities, and other values based on that model.

Scoring expressions must be preceded by a MODEL HANDLE command that identifies the external XML model file and optionally maps variables in the active dataset to variables in the model.
Scoring expressions require two arguments: the first identifies the model, and the second identifies the scoring function. For a categorical target variable, an optional third argument selects the category whose probability is returned for each case. For nearest neighbor models, the third argument specifies a particular neighbor.
Prior to applying scoring functions to a set of data, a data validation analysis is performed. The analysis includes checking that data are of the correct type as well as checking that the data values are in the set of allowed values defined in the model. For example, for categorical variables, a value that is neither a valid category nor defined as user-missing would be treated as an invalid value. Values that are found to be invalid are treated as system-missing.
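
As a minimal sketch of the sequence described above, the following syntax defines a model handle, maps two dataset variables to model variables, and then scores the active dataset. The handle name, file path, and variable names are hypothetical and are not taken from this topic.

```
* Hypothetical handle name, file path, and variable names.
MODEL HANDLE NAME=creditmod FILE='/models/creditmod.xml'
  /MAP VARIABLES=agecat curdebt MODELVARIABLES=age debt.
* First argument is the model handle, second is the scoring function.
COMPUTE PredCat = APPLYMODEL(creditmod, 'PREDICT').
COMPUTE PredProb = APPLYMODEL(creditmod, 'PROBABILITY').
EXECUTE.
```

Cases whose input values fail validation are treated as having system-missing inputs, so it may be worth checking the new variables for missing values after scoring.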

The following scoring expressions are available:

ApplyModel. ApplyModel(handle, "function", value). Numeric. Applies a particular scoring function to the input case data using the model specified by handle and where "function" is one of the following string literal values enclosed in quotes: predict, stddev, probability, confidence, nodeid, cumhazard, neighbor, distance. The model handle is the name associated with the external XML file, as defined on the MODEL HANDLE command. The optional third argument applies when the function is "probability", "neighbor", or "distance". For "probability", it specifies a category for which the probability is calculated. For "neighbor" and "distance", it specifies a particular neighbor (as an integer) for nearest neighbor models. ApplyModel returns system-missing if a value cannot be computed.

StrApplyModel. StrApplyModel(handle, "function", value). String. Applies a particular scoring function to the input case data using the model specified by handle and where "function" is one of the following string literal values enclosed in quotes: predict, stddev, probability, confidence, nodeid, cumhazard, neighbor, distance. The model handle is the name associated with the external XML file, as defined on the MODEL HANDLE command. The optional third argument applies when the function is "probability", "neighbor", or "distance". For "probability", it specifies a category for which the probability is calculated. For "neighbor" and "distance", it specifies a particular neighbor (as an integer) for nearest neighbor models. StrApplyModel returns a blank string if a value cannot be computed.
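
The difference between the two functions can be sketched as follows, assuming a model with a string target whose categories fit in 8 bytes. The handle name, file path, and variable names are hypothetical.

```
* Hypothetical handle, file path, and variable names.
MODEL HANDLE NAME=churnmod FILE='/models/churnmod.xml'.
* The string result variable is declared before it is computed.
STRING PredLabel (A8).
COMPUTE PredLabel = STRAPPLYMODEL(churnmod, 'PREDICT').
* Confidence is numeric, so the numeric function is used.
COMPUTE PredConf = APPLYMODEL(churnmod, 'CONFIDENCE').
EXECUTE.
```

If a value cannot be computed for a case, PredLabel would be blank and PredConf would be system-missing, as described above.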

String values must be enclosed in quotation marks. For example, ApplyModel(name1, 'probability', 'reject'), where name1 is the model's handle name and 'reject' is a valid category for a target variable that is a string.
Negative values must be enclosed in quotation marks. For example, ApplyModel(name1, 'probability', '-1').
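
The quoting rules above might be applied as in this sketch. The handle names, file paths, and categories are hypothetical. The unquoted positive category in the last line is an assumption inferred from the rules above, which require quotation marks only for string and negative values.

```
* Hypothetical handles: strmod has a string target, nummod a numeric target.
MODEL HANDLE NAME=strmod FILE='/models/strmod.xml'.
MODEL HANDLE NAME=nummod FILE='/models/nummod.xml'.
* String category: quoted.
COMPUTE ProbReject = APPLYMODEL(strmod, 'PROBABILITY', 'reject').
* Negative numeric category: quoted.
COMPUTE ProbNeg = APPLYMODEL(nummod, 'PROBABILITY', '-1').
* Positive numeric category: assumed to be valid without quotation marks.
COMPUTE ProbOne = APPLYMODEL(nummod, 'PROBABILITY', 1).
EXECUTE.
```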

The following scoring functions are available:

Scoring function	Description
PREDICT	Returns the predicted value of the target variable.
STDDEV	Standard deviation. Applies only to models with a scale target.
PROBABILITY	Probability associated with a particular category of a target variable. Applies only to categorical variables. In the absence of the optional third parameter, category, this is the probability that the predicted category is the correct one for the target variable. If a particular category is specified, then this is the probability that the specified category is the correct one for the target variable.
CONFIDENCE	A probability measure associated with the predicted value of a categorical target variable. Applies only to categorical variables.
NODEID	The terminal node number. Applies only to tree models.
CUMHAZARD	Cumulative hazard value. Applies only to Cox regression models.
NEIGHBOR	The ID of the kth nearest neighbor. Applies only to nearest neighbor models. In the absence of the optional third parameter, k, this is the ID of the nearest neighbor. The ID is the value of the case labels variable, if supplied, and otherwise the case number.
DISTANCE	The distance to the kth nearest neighbor. Applies only to nearest neighbor models. In the absence of the optional third parameter, k, this is the distance to the nearest neighbor. Depending on the model, either Euclidean or City Block distance will be used.
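
For nearest neighbor models, the optional third argument k could be used as in the following sketch. The handle name and file path are hypothetical, and the example assumes that neighbor IDs are numeric (case numbers or a numeric case labels variable).

```
* Hypothetical handle for a nearest neighbor model.
MODEL HANDLE NAME=knnmod FILE='/models/knnmod.xml'.
* Without k, the nearest neighbor is used.
COMPUTE Nearest = APPLYMODEL(knnmod, 'NEIGHBOR').
COMPUTE DistNearest = APPLYMODEL(knnmod, 'DISTANCE').
* With k=3, the third nearest neighbor is used.
COMPUTE Third = APPLYMODEL(knnmod, 'NEIGHBOR', 3).
COMPUTE DistThird = APPLYMODEL(knnmod, 'DISTANCE', 3).
EXECUTE.
```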

The following table lists the set of scoring functions available for each type of model that supports scoring. The function type denoted as PROBABILITY (category) refers to specification of a particular category (the optional third parameter) for the PROBABILITY function.

Table 1. Supported functions by model type
Model type	Supported functions
Tree (categorical target)	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`, `NODEID`
Tree (scale target)	`PREDICT`, `NODEID`, `STDDEV`
Boosted Tree (C5.0)	`PREDICT`, `CONFIDENCE`
Linear Regression	`PREDICT`, `STDDEV`
Automatic Linear Models	`PREDICT`
Binary Logistic Regression	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`
Conditional Logistic Regression	`PREDICT`
Multinomial Logistic Regression	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`
General Linear Model	`PREDICT`, `STDDEV`
Discriminant	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`
TwoStep Cluster	`PREDICT`
K-Means Cluster	`PREDICT`
Kohonen	`PREDICT`
Neural Net (categorical target)	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`
Neural Net (scale target)	`PREDICT`
Naive Bayes	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`
Anomaly Detection	`PREDICT`
Ruleset	`PREDICT`, `CONFIDENCE`
Generalized Linear Model (categorical target)	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`
Generalized Linear Model (scale target)	`PREDICT`, `STDDEV`
Generalized Linear Mixed Model (categorical target)	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`
Generalized Linear Mixed Model (scale target)	`PREDICT`
Ordinal Multinomial Regression	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`
Cox Regression	`PREDICT`, `CUMHAZARD`
Nearest Neighbor (scale target)	`PREDICT`, `NEIGHBOR`, `NEIGHBOR(K)`, `DISTANCE`, `DISTANCE(K)`
Nearest Neighbor (categorical target)	`PREDICT`, `PROBABILITY`, `PROBABILITY (category)`, `CONFIDENCE`, `NEIGHBOR`, `NEIGHBOR(K)`, `DISTANCE`, `DISTANCE(K)`

For the Binary Logistic Regression, Multinomial Logistic Regression, and Naive Bayes models, the value returned by the CONFIDENCE function is identical to that returned by the PROBABILITY function.
For the K-Means model, the value returned by the CONFIDENCE function is the least distance.
For tree and ruleset models, the confidence can be interpreted as an adjusted probability of the predicted category and is always less than the value given by PROBABILITY. For these models, the confidence value is more reliable than the value given by PROBABILITY.
For neural network models, the confidence provides a measure of whether the predicted category is much more likely than the second-best predicted category.
For Ordinal Multinomial Regression and Generalized Linear Model, the PROBABILITY function is supported when the target variable is binary.
For nearest neighbor models without a target variable, the available functions are NEIGHBOR and DISTANCE.