Scoring expressions
Scoring expressions apply a model, stored as XML in an external file, to the active dataset and generate predicted values, predicted probabilities, and other values based on that model.
- Scoring expressions must be preceded by a MODEL HANDLE command that identifies the external XML model file and, optionally, maps variables in the active dataset to the variables used by the model.
- Scoring expressions require two arguments: the first identifies the model, and the second identifies the scoring function. An optional third argument lets you obtain, for each case, the probability associated with a selected category of a categorical target variable. In nearest neighbor models, the third argument specifies a particular neighbor.
- Before scoring functions are applied to a set of data, a data validation analysis is performed. It checks that the data are of the correct type and that the data values are in the set of allowed values defined in the model. For example, for a categorical variable, a value that is neither a valid category nor defined as user-missing is treated as invalid. Invalid values are treated as system-missing.
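The variable mapping and data validation steps above can be pictured with a short Python sketch. It illustrates the rules as stated and is not how SPSS Statistics implements them: the function prepare_case and its parameters are hypothetical, a case is modeled as a dict, and None stands in for system-missing.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical stand-in for the system-missing value.
SYSMIS = None


def prepare_case(
    case: Dict[str, Any],
    mapping: Dict[str, str],
    types: Dict[str, Any],
    allowed: Dict[str, Set[Any]],
    user_missing: Dict[str, Set[Any]],
) -> Dict[str, Optional[Any]]:
    """Map variable names, then replace invalid values with SYSMIS.

    mapping:      dataset variable name -> model variable name
    types:        model variable name -> expected Python type (or tuple of types)
    allowed:      model variable name -> valid categories (categorical variables only)
    user_missing: model variable name -> values defined as user-missing
    """
    # Rename variables; names absent from `mapping` are assumed to match the model.
    mapped = {mapping.get(name, name): value for name, value in case.items()}

    prepared: Dict[str, Optional[Any]] = {}
    for name, value in mapped.items():
        expected = types.get(name)
        if expected is not None and not isinstance(value, expected):
            prepared[name] = SYSMIS  # wrong type: invalid
            continue
        valid = allowed.get(name)
        missing = user_missing.get(name, set())
        if valid is not None and value not in valid and value not in missing:
            prepared[name] = SYSMIS  # neither a valid category nor user-missing
            continue
        prepared[name] = value
    return prepared


# "X" is not a category of region, and "forty" is not numeric.
print(prepare_case(
    {"income_band": "H", "region": "X", "age": "forty"},
    mapping={"income_band": "income"},
    types={"age": (int, float)},
    allowed={"income": {"L", "M", "H"}, "region": {"N", "S"}},
    user_missing={"region": {"U"}},
))
# {'income': 'H', 'region': None, 'age': None}
```

A user-missing value such as "U" passes through unchanged, because only values that are neither valid categories nor user-missing are treated as invalid.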
The following scoring expressions are available:
ApplyModel. ApplyModel(handle, "function", value). Numeric. Applies a particular scoring function to the input case data using the model specified by handle and where "function" is one of the following string literal values enclosed in quotes: predict, stddev, probability, confidence, nodeid, cumhazard, neighbor, distance. The model handle is the name associated with the external XML file, as defined on the MODEL HANDLE command. The optional third argument applies when the function is "probability", "neighbor", or "distance". For "probability", it specifies a category for which the probability is calculated. For "neighbor" and "distance", it specifies a particular neighbor (as an integer) for nearest neighbor models. ApplyModel returns system-missing if a value cannot be computed.
StrApplyModel. StrApplyModel(handle, "function", value). String. Applies a particular scoring function to the input case data using the model specified by handle and where "function" is one of the following string literal values enclosed in quotes: predict, stddev, probability, confidence, nodeid, cumhazard, neighbor, distance. The model handle is the name associated with the external XML file, as defined on the MODEL HANDLE command. The optional third argument applies when the function is "probability", "neighbor", or "distance". For "probability", it specifies a category for which the probability is calculated. For "neighbor" and "distance", it specifies a particular neighbor (as an integer) for nearest neighbor models. StrApplyModel returns a blank string if a value cannot be computed.
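The following Python sketch mimics the calling conventions of ApplyModel and StrApplyModel: which functions accept the third argument, and what each returns when a value cannot be computed (math.nan standing in for system-missing, an empty string for a blank string). The scorers dictionary and all function names here are hypothetical. Comparing function names case-insensitively, rejecting a third argument for other functions, and treating an unsupported function as "cannot be computed" are assumptions of this sketch, not documented behavior.

```python
import math
from typing import Any, Callable, Dict, Optional

FUNCTIONS = {"predict", "stddev", "probability", "confidence",
             "nodeid", "cumhazard", "neighbor", "distance"}
# Functions for which the optional third argument applies.
TAKES_VALUE = {"probability", "neighbor", "distance"}

# A scorer takes (case, third_argument) and returns None when it cannot compute a value.
Scorer = Callable[[Dict[str, Any], Any], Optional[Any]]


def _score(scorers: Dict[str, Scorer], case: Dict[str, Any],
           function: str, value: Any = None) -> Optional[Any]:
    name = function.lower()  # assumption: function names are case-insensitive
    if name not in FUNCTIONS:
        raise ValueError(f"unknown scoring function: {function!r}")
    if value is not None and name not in TAKES_VALUE:
        raise ValueError(f"{function!r} does not take a third argument")
    # Assumption: a function the model does not support yields no value.
    scorer = scorers.get(name)
    return scorer(case, value) if scorer is not None else None


def apply_model(scorers: Dict[str, Scorer], case: Dict[str, Any],
                function: str, value: Any = None) -> float:
    """Numeric result; math.nan stands in for system-missing."""
    result = _score(scorers, case, function, value)
    return math.nan if result is None else float(result)


def str_apply_model(scorers: Dict[str, Scorer], case: Dict[str, Any],
                    function: str, value: Any = None) -> str:
    """String result; an empty string stands in for a blank string."""
    result = _score(scorers, case, function, value)
    return "" if result is None else str(result)


scorers = {"predict": lambda case, _: 2.0 * case["x"]}
print(apply_model(scorers, {"x": 3}, "predict"))            # 6.0
print(apply_model(scorers, {"x": 3}, "stddev"))             # nan
print(repr(str_apply_model(scorers, {"x": 3}, "nodeid")))   # ''
```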
- String values must be enclosed in quotation marks. For example, ApplyModel(name1, 'probability', 'reject'), where name1 is the model's handle name and 'reject' is a valid category for a target variable that is a string.
- Negative values must be enclosed in quotation marks. For example, ApplyModel(name1, 'probability', '-1').
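As a sketch, a hypothetical Python helper that writes out the text of an ApplyModel call can apply both quoting rules. Leaving non-negative numbers unquoted is an assumption based on the rules above, and values containing apostrophes are rejected rather than escaped.

```python
from typing import Optional, Union

Value = Union[str, int, float]


def quote_value(value: Value) -> str:
    """Render the third argument according to the quoting rules above."""
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("values containing apostrophes are not handled here")
        return f"'{value}'"   # string values must be quoted
    if value < 0:
        return f"'{value}'"   # negative values must be quoted
    return str(value)         # assumption: non-negative numbers may stay unquoted


def apply_model_call(handle: str, function: str,
                     value: Optional[Value] = None) -> str:
    """Build the text of an ApplyModel(...) call, e.g. for generated syntax."""
    args = [handle, f"'{function}'"]
    if value is not None:
        args.append(quote_value(value))
    return f"ApplyModel({', '.join(args)})"


print(apply_model_call("name1", "probability", "reject"))
# ApplyModel(name1, 'probability', 'reject')
print(apply_model_call("name1", "probability", -1))
# ApplyModel(name1, 'probability', '-1')
```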
The following scoring functions are available:
Scoring function | Description |
---|---|
PREDICT | Returns the predicted value of the target variable. |
STDDEV | Standard deviation. |
PROBABILITY | Probability associated with a particular category of a target variable. Applies only to categorical variables. In the absence of the optional third parameter, category, this is the probability that the predicted category is the correct one for the target variable. If a particular category is specified, then this is the probability that the specified category is the correct one for the target variable. |
CONFIDENCE | A probability measure associated with the predicted value of a categorical target variable. Applies only to categorical variables. |
NODEID | The terminal node number. Applies only to tree models. |
CUMHAZARD | Cumulative hazard value. Applies only to Cox regression models. |
NEIGHBOR | The ID of the kth nearest neighbor. Applies only to nearest neighbor models. In the absence of the optional third parameter, k, this is the ID of the nearest neighbor. The ID is the value of the case labels variable, if supplied, and otherwise the case number. |
DISTANCE | The distance to the kth nearest neighbor. Applies only to nearest neighbor models. In the absence of the optional third parameter, k, this is the distance to the nearest neighbor. Depending on the model, either Euclidean or City Block distance will be used. |
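For a categorical target, the relationship between PREDICT, PROBABILITY, and PROBABILITY with a category argument can be sketched in Python. The sketch assumes the model yields one probability per category and predicts the most probable category, which is typical but not guaranteed for every model type. The function names are hypothetical, and returning None for a category the model does not know is an assumption standing in for system-missing.

```python
from typing import Dict, Optional


def predict(probs: Dict[str, float]) -> str:
    """PREDICT: the category with the highest probability (first one wins a tie)."""
    return max(probs, key=lambda category: probs[category])


def probability(probs: Dict[str, float],
                category: Optional[str] = None) -> Optional[float]:
    """PROBABILITY, with or without the optional category argument."""
    if category is None:
        # No third argument: probability that the predicted category is correct.
        return probs[predict(probs)]
    # Third argument: probability that the specified category is correct.
    # None (assumed system-missing) if the model has no such category.
    return probs.get(category)


case_probs = {"accept": 0.7, "reject": 0.3}
print(predict(case_probs))                # accept
print(probability(case_probs))            # 0.7
print(probability(case_probs, "reject"))  # 0.3
```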
The following table lists the set of scoring functions available for each type of model that supports scoring. The function type denoted as PROBABILITY (category) refers to specification of a particular category (the optional third parameter) for the PROBABILITY function.
Model type | Supported functions |
---|---|
Tree (categorical target) | |
Tree (scale target) | |
Boosted Tree (C5.0) | |
Linear Regression | |
Automatic Linear Models | |
Binary Logistic Regression | |
Conditional Logistic Regression | |
Multinomial Logistic Regression | |
General Linear Model | |
Discriminant | |
TwoStep Cluster | |
K-Means Cluster | |
Kohonen | |
Neural Net (categorical target) | |
Neural Net (scale target) | |
Naive Bayes | |
Anomaly Detection | |
Ruleset | |
Generalized Linear Model (categorical target) | |
Generalized Linear Model (scale target) | |
Generalized Linear Mixed Model (categorical target) | |
Generalized Linear Mixed Model (scale target) | |
Ordinal Multinomial Regression | |
Cox Regression | |
Nearest Neighbor (scale target) | |
Nearest Neighbor (categorical target) | |
- For the Binary Logistic Regression, Multinomial Logistic Regression, and Naive Bayes models, the value returned by the CONFIDENCE function is identical to that returned by the PROBABILITY function.
- For the K-Means model, the value returned by the CONFIDENCE function is the least distance.
- For tree and ruleset models, the confidence can be interpreted as an adjusted probability of the predicted category and is always less than the value given by PROBABILITY. For these models, the confidence value is more reliable than the value given by PROBABILITY.
- For neural network models, the confidence provides a measure of whether the predicted category is much more likely than the second-best predicted category.
- For Ordinal Multinomial Regression and Generalized Linear Model, the PROBABILITY function is supported when the target variable is binary.
- For nearest neighbor models without a target variable, the available functions are NEIGHBOR and DISTANCE.
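The NEIGHBOR and DISTANCE functions can be illustrated with a small Python sketch of a k-th nearest neighbor lookup using either Euclidean or City Block distance. The function kth_neighbor is hypothetical; it assumes numeric, already-prepared feature vectors, 1-based case numbers when no case labels are supplied, and ties broken by case order. It returns (None, None), standing in for system-missing, when k is out of range.

```python
import math
from typing import Any, List, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


METRICS = {"euclidean": euclidean, "city block": city_block}


def kth_neighbor(
    query: Sequence[float],
    training: List[Sequence[float]],
    k: int = 1,
    metric: str = "euclidean",
    labels: Optional[List[Any]] = None,
) -> Tuple[Optional[Any], Optional[float]]:
    """Return (NEIGHBOR, DISTANCE) for the k-th nearest training case."""
    distance = METRICS[metric]
    # Sort by distance; equal distances keep case order.
    ranked = sorted((distance(query, row), i) for i, row in enumerate(training))
    if not 1 <= k <= len(ranked):
        return None, None  # cannot be computed: stands in for system-missing
    d, i = ranked[k - 1]
    # ID: the case label if supplied, otherwise the 1-based case number.
    neighbor_id = labels[i] if labels is not None else i + 1
    return neighbor_id, d


training = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(kth_neighbor((0.0, 0.0), training))                            # (1, 0.0)
print(kth_neighbor((0.0, 0.0), training, k=3, metric="city block"))  # (2, 7.0)
```

Omitting k returns the single nearest neighbor, mirroring the default described for NEIGHBOR and DISTANCE when the third argument is absent.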