Scoring data with predictive models
The process of applying a predictive model to a set of data is referred to as scoring the data. IBM® SPSS® Statistics has procedures for building predictive models such as regression, clustering, tree, and neural network models. Once a model has been built, the model specifications can be saved in a file that contains all of the information necessary to reconstruct the model. You can then use that model file to generate predictive scores in other datasets.
Note: Some procedures produce a model XML file, and some procedures produce a compressed file archive (.zip file).
Example. The direct marketing division of a company uses results from a test mailing to assign propensity scores to the rest of its contact database, using various demographic characteristics to identify the contacts most likely to respond and make a purchase.
Scoring is treated as a transformation of the data. The model is expressed internally as a set of numeric transformations that are applied to a given set of fields (variables), namely the predictors specified in the model, in order to obtain a predicted result. In this sense, scoring data with a given model is essentially the same as applying any function, such as a square root function, to a set of data.
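To illustrate the idea of scoring as a transformation, the following Python sketch applies a model in the same way it applies a square root: as a function of each record's fields. It is not SPSS Statistics syntax. The logistic form, the coefficient values, and the field names (age, prior_purchases) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical model: a logistic response model stored as plain numbers.
# The coefficients and field names are illustrative assumptions.
MODEL = {
    "intercept": -2.0,
    "coefficients": {"age": 0.03, "prior_purchases": 0.5},
}


def score(record, model=MODEL):
    """Apply the model's numeric transformation to one record's predictors."""
    linear = model["intercept"] + sum(
        weight * record[field]
        for field, weight in model["coefficients"].items()
    )
    # The logistic function maps the linear combination to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


contacts = [
    {"age": 40, "prior_purchases": 2},
    {"age": 25, "prior_purchases": 0},
]

# Scoring and an ordinary function are applied in exactly the same way:
# one computed value per record.
propensity = [score(c) for c in contacts]
age_root = [math.sqrt(c["age"]) for c in contacts]
```

In both cases the result is simply a new computed field for each record; the only difference is that the scoring function takes its parameters from a model.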
The scoring process consists of two basic steps:
- Build the model and save the model file. You build the model using a dataset for which the outcome of interest (often referred to as the target) is known. For example, if you want to build a model that will predict who is likely to respond to a direct mail campaign, you need to start with a dataset that already contains information on who responded and who did not respond. This might be the results of a test mailing to a small group of customers or information on responses to a similar campaign in the past.
Note: For some model types there is no target outcome of interest. Clustering models, for example, do not have a target, and some nearest neighbor models do not have a target.
- Apply that model to a different dataset (for which the outcome of interest is not known) to obtain predicted outcomes.
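The two steps above can be sketched in Python as follows. This is a minimal illustration of the workflow, not the way SPSS Statistics builds or stores models: the "model" here is just the observed response rate per region, the JSON file stands in for the model XML or .zip file, and every name (build_model, save_model, load_model, score, the region and responded fields) is hypothetical.

```python
import json
import os
import tempfile


def build_model(rows, predictor, target):
    """Step 1: estimate a response rate for each value of one predictor."""
    counts = {}
    for row in rows:
        responded, total = counts.get(row[predictor], (0, 0))
        counts[row[predictor]] = (responded + row[target], total + 1)
    rates = {value: r / n for value, (r, n) in counts.items()}
    # Fall back to the overall rate for predictor values not seen in training.
    overall = sum(row[target] for row in rows) / len(rows)
    return {"predictor": predictor, "rates": rates, "default": overall}


def save_model(model, path):
    """Step 1 (continued): write the model specification to a file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)


def load_model(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def score(rows, model):
    """Step 2: add a predicted propensity to rows whose outcome is unknown."""
    key = model["predictor"]
    return [
        dict(row, propensity=model["rates"].get(row[key], model["default"]))
        for row in rows
    ]


# Dataset with a known target: results of a (made-up) test mailing.
test_mailing = [
    {"region": "north", "responded": 1},
    {"region": "north", "responded": 0},
    {"region": "south", "responded": 0},
    {"region": "south", "responded": 0},
]
# Dataset without the target: the rest of the contact database.
contacts = [{"region": "north"}, {"region": "south"}, {"region": "west"}]

model_path = os.path.join(tempfile.mkdtemp(), "response_model.json")
save_model(build_model(test_mailing, "region", "responded"), model_path)
scored = score(contacts, load_model(model_path))
```

Because the model lives in a file, the scoring step does not need the original test mailing data; it needs only the model file and a dataset that contains the predictor fields.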