Scorer operator

You can use the scorer operator to compute scores for new records based on an existing data mining model.

Scoring is the process of applying a mining model to new data records to compute the appropriate results. The scorer operator can use all types of models (Associations and Sequence Rules, Clustering, Classification, and Regression) that are stored in the Db2® model tables to score input data. Mining models are usually created by the model builder operators (refer to Model builders) or imported into the mining database by using the model management functions (refer to Importing mining models).

For an example of how to use the scorer operator, refer to the tutorial Assigning a customer group ID to each customer (scoring).

Depending on the mining model to be applied, the scorer operator uses different mining functions and computes corresponding results for the new data records. The following figure shows an example where the scorer operator uses a clustering model to assign a CLUSTER_ID to each record in the input table.

Figure 1. Example scoring function flow

Scoring rule models

The Associations mining function finds items that are consistently associated with each other in a meaningful way. For example, you can analyze purchase transactions to discover combinations of goods that are often purchased together. The Associations rule scoring function uses an existing Associations rule model, that is, an existing set of association rules, and one or more new data groups (for example, purchase transactions) to answer questions such as:
  • For which articles does the stock need to be increased when a certain article goes on sale?
  • What are good cross-sell recommendations for customers who have article A and article B in their basket?
  • Which transactions are supported by none of the rules? This might indicate fraud or some other deviation from normal behavior.
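To illustrate what the second and third questions involve, the following Python sketch matches a set of association rules against transactions. It assumes that a rule applies to a transaction when every item in the rule body is present in the transaction. The `AssociationRule` class, the function names, and the in-memory representation are hypothetical and are not part of the scorer operator, which reads its rules from the Db2 model tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    """Hypothetical in-memory rule: if all body items are present, the head items are likely too."""
    rule_id: int
    body: frozenset
    head: frozenset
    confidence: float


def score_transactions(rules, transactions):
    """Map each transaction ID to the rules whose body is fully contained in that transaction."""
    return {
        tid: [rule for rule in rules if rule.body <= items]
        for tid, items in transactions.items()
    }


def recommendations(matched_rules, items):
    """Return head items not yet in the basket, ordered by the best confidence of any rule predicting them."""
    best = {}
    for rule in matched_rules:
        for item in rule.head - items:
            best[item] = max(best.get(item, 0.0), rule.confidence)
    return sorted(best, key=lambda item: (-best[item], item))


# Usage with made-up rules and transactions.
rules = [
    AssociationRule(1, frozenset({"A", "B"}), frozenset({"C"}), 0.8),
    AssociationRule(2, frozenset({"A"}), frozenset({"D"}), 0.6),
    AssociationRule(3, frozenset({"E"}), frozenset({"F"}), 0.5),
]
transactions = {"t1": {"A", "B"}, "t2": {"A"}, "t3": {"X"}}

matches = score_transactions(rules, transactions)
unsupported = [tid for tid, matched in matches.items() if not matched]
print(recommendations(matches["t1"], transactions["t1"]))  # ['C', 'D']
print(unsupported)  # ['t3']
```

Transaction `t3` matches no rule, which corresponds to the third question; the head items of the rules that match `t1` are candidate cross-sell recommendations.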
The Sequence Rule mining function finds typical sequences of events in your data. A Sequence Rule model contains a set of sequence rules. Each sequence rule consists of a body, which is a sequence of earlier item sets, and a head, which is the item set that follows that sequence. The Sequence Rule scoring function answers questions such as:
  • You might want to mail a special offer for product A to 10,000 customers. Which customers should you select to get the best response rates? (Target marketing)
  • You might have a car that needed a warranty repair because components A and B were defective and had to be replaced. Should you proactively check any other components because they are likely to break soon?
  • An online shopper has purchased books A and B within the last 6 months. Which books might the shopper be interested in next?
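One way to picture sequence rule scoring is to check whether the item sets in a rule body occur, in order, in successive transactions of a customer's history. The following Python sketch assumes exactly that matching semantics and a history ordered from the oldest to the newest transaction. `SequenceRule`, `contains_sequence`, and `predict_next` are hypothetical names for illustration only; the scorer's actual matching rules might differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRule:
    """Hypothetical rule: body item sets occurring in this order are followed by the head item set."""
    rule_id: int
    body: tuple  # ordered tuple of frozensets, oldest first
    head: frozenset
    confidence: float


def contains_sequence(history, body):
    """True if each body item set is a subset of some transaction, each one after the previous match."""
    pos = 0
    for itemset in body:
        # Greedily advance to the earliest transaction that contains this item set.
        while pos < len(history) and not itemset <= history[pos]:
            pos += 1
        if pos == len(history):
            return False
        pos += 1  # the next item set must match a later transaction
    return True


def predict_next(rules, history):
    """Return the rules that match the history, highest confidence first."""
    matched = [rule for rule in rules if contains_sequence(history, rule.body)]
    return sorted(matched, key=lambda rule: -rule.confidence)


# Usage: an online shopper's purchases, oldest transaction first (made-up data).
history = [frozenset({"book_A"}), frozenset({"dvd_X"}), frozenset({"book_B"})]
rules = [
    SequenceRule(1, (frozenset({"book_A"}), frozenset({"book_B"})), frozenset({"book_C"}), 0.7),
    SequenceRule(2, (frozenset({"book_B"}), frozenset({"book_A"})), frozenset({"book_D"}), 0.9),
]
for rule in predict_next(rules, history):
    print(rule.rule_id, sorted(rule.head))  # 1 ['book_C']
```

Rule 2 does not match because the shopper bought book B after book A, not before it. The greedy scan matches each body item set at the earliest possible transaction, which leaves the most room for the remaining item sets, so it does not miss a valid match.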

When a rule model is applied, the scorer input port expects a transaction table that contains the items already present in the transactions, as in the transaction data used to train the rule model. The output port has a fixed layout that contains the rules that match these items.

Scoring clustering models

The clustering function groups data records on the basis of their similarity. When a clustering model is applied, the scorer assigns a cluster ID, a cluster score, a quality value, and a confidence value to each record being scored. The cluster score, quality value, and confidence value are different measures of how well the record fits into the assigned cluster. You can then use the cluster ID, for example, to send a personalized mailing, tailored to the characteristics of one cluster, only to the customers that are assigned to that cluster.
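The following Python sketch shows the basic idea of assigning a cluster ID to new records and then selecting customers for a mailing. It assumes, for illustration only, that each cluster is represented by a centroid and that similarity is measured by Euclidean distance on normalized features; the similarity measure of a real clustering model depends on the model, and the sketch does not reproduce the scorer's cluster score, quality value, or confidence value. The function name `assign_cluster` is hypothetical.

```python
import math


def assign_cluster(record, centroids):
    """Return (cluster_id, distance) for the centroid nearest to the record (Euclidean distance)."""
    best_id, best_dist = None, math.inf
    for cluster_id, centroid in centroids.items():
        dist = math.dist(record, centroid)
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id, best_dist


# Usage with made-up, already normalized features (for example, age and income scaled to [0, 1]).
centroids = {1: (0.2, 0.3), 2: (0.8, 0.7)}
customers = {"c1": (0.25, 0.35), "c2": (0.9, 0.6)}

assignments = {cid: assign_cluster(rec, centroids)[0] for cid, rec in customers.items()}
mailing_list = [cid for cid, cluster_id in assignments.items() if cluster_id == 2]
print(assignments)   # {'c1': 1, 'c2': 2}
print(mailing_list)  # ['c2']
```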

Scoring classification models

Classification is the process of automatically creating a model of classes from a set of records that contain class labels. The classification technique analyzes records that are already known to belong to a certain class, and creates a profile for a member of that class from the common characteristics of the records. You can then use the scorer to apply this model to new records, that is, records that have not yet been classified. This enables you to predict whether the new records belong to that particular class. When a classification model is applied, the scorer assigns a class label and a confidence value to each record being scored.
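As an illustration of what "a class label and a confidence value" can mean, the following Python sketch applies a small decision tree, which is only one possible kind of classification model. It assumes that the confidence is the fraction of training records in the reached leaf that belong to the predicted class; this definition, the `Node` class, and the `classify` function are assumptions for illustration and do not describe how the scorer computes its values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical decision-tree node.

    An internal node sends a record left if record[field] <= threshold, otherwise right.
    A leaf stores the class counts of the training records that reached it.
    """
    field: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    class_counts: Optional[Dict[str, int]] = None


def classify(node, record):
    """Return (class_label, confidence) using the majority class of the leaf the record reaches."""
    while node.class_counts is None:
        node = node.left if record[node.field] <= node.threshold else node.right
    label = max(node.class_counts, key=node.class_counts.get)
    confidence = node.class_counts[label] / sum(node.class_counts.values())
    return label, confidence


# Usage with a made-up tree that predicts whether a customer responds to an offer.
tree = Node(
    field="age", threshold=30,
    left=Node(class_counts={"yes": 8, "no": 2}),
    right=Node(
        field="income", threshold=50000,
        left=Node(class_counts={"yes": 1, "no": 9}),
        right=Node(class_counts={"yes": 6, "no": 4}),
    ),
)
print(classify(tree, {"age": 25, "income": 40000}))  # ('yes', 0.8)
print(classify(tree, {"age": 40, "income": 30000}))  # ('no', 0.9)
```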

Scoring regression models

Regression is similar to classification except for the type of predicted value. Classification predicts a class label; regression predicts a numeric value. Regression can also determine which input fields are most relevant for predicting the target field values. The predicted value might not be identical to any value contained in the data that is used to build the model. An example application is ranking customers by expected profit. When a regression model is applied, the scorer assigns a predicted value to each customer being scored.
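The following Python sketch ranks customers by expected profit with a simple linear model. The linear form, the field names, and the coefficients are assumptions for illustration; the scorer applies whichever regression model is stored in the model tables. Note that the predicted values are computed from the inputs and need not appear anywhere in the training data.

```python
def predict(intercept, coefficients, record):
    """Hypothetical linear model: intercept plus the weighted sum of the input fields."""
    return intercept + sum(weight * record[field] for field, weight in coefficients.items())


# Made-up coefficients of a model that predicts expected profit per customer.
intercept = -20.0
coefficients = {"visits_per_month": 12.5, "tenure_years": 40.0}
customers = {
    "c1": {"visits_per_month": 10, "tenure_years": 2},
    "c2": {"visits_per_month": 4, "tenure_years": 5},
    "c3": {"visits_per_month": 1, "tenure_years": 1},
}

predicted = {cid: predict(intercept, coefficients, rec) for cid, rec in customers.items()}
ranking = sorted(predicted, key=predicted.get, reverse=True)
print(predicted)  # {'c1': 185.0, 'c2': 230.0, 'c3': 32.5}
print(ranking)    # ['c2', 'c1', 'c3']
```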
