Propensity Scores

For models that return a yes or no prediction, you can request propensity scores in addition to the standard prediction and confidence values. Propensity scores indicate the likelihood of a particular outcome or response. The following table contains an example.

Table 1. Propensity scores
Customer     Propensity to respond
Joe Smith    35%
Jane Smith   15%

Propensity scores are available only for models with flag targets, and indicate the likelihood of the True value defined for the field, as specified in a source or Type node.

Propensity Scores Versus Confidence Scores

Propensity scores differ from confidence scores, which apply to the current prediction, whether yes or no. In cases where the prediction is no, for example, a high confidence actually means a high likelihood not to respond. Propensity scores sidestep this limitation to enable easier comparison across all records. For example, a no prediction with a confidence of 0.85 translates to a raw propensity of 0.15 (or 1 minus 0.85).

Table 2. Confidence scores
Customer     Prediction      Confidence
Joe Smith    Will respond    0.35
Jane Smith   Won't respond   0.85
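
To make the conversion concrete, the following minimal Python sketch derives raw propensity scores from the predictions and confidences in Table 2. The function name and record layout are illustrative only and are not part of the product.

    def raw_propensity(will_respond, confidence):
        """Return the likelihood of the True value ("will respond").

        A "yes" prediction keeps its confidence as the raw propensity;
        a "no" prediction is flipped, because high confidence in "no"
        means a low likelihood of the True outcome.
        """
        return confidence if will_respond else 1.0 - confidence

    scores = [
        ("Joe Smith", True, 0.35),    # predicted "Will respond"
        ("Jane Smith", False, 0.85),  # predicted "Won't respond"
    ]
    for customer, will_respond, confidence in scores:
        print(f"{customer}: {raw_propensity(will_respond, confidence):.2f}")
    # Joe Smith: 0.35
    # Jane Smith: 0.15

Both customers are now scored on the same scale, matching the values shown in Table 1.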

Obtaining Propensity Scores

  • Propensity scores can be enabled on the Analyze tab in the modeling node or on the Settings tab in the model nugget. This functionality is available only when the selected target is a flag field. See the topic Modeling Node Analyze Options for more information.
  • Propensity scores may also be calculated by the Ensemble node, depending on the ensemble method used.

Calculating Adjusted Propensity Scores

Adjusted propensity scores are calculated as part of building the model and are not available otherwise. Once the model is built, it is scored using data from the test or validation partition, and a new model that delivers adjusted propensity scores is constructed by analyzing the original model’s performance on that partition. Depending on the type of model, one of two methods may be used to calculate the adjusted propensity scores.

  • For rule set and tree models, adjusted propensity scores are generated by recalculating the frequency of each category at each tree node (for tree models) or the support and confidence of each rule (for rule set models). This results in a new rule set or tree model which is stored with the original model, to be used whenever adjusted propensity scores are requested. Each time the original model is applied to new data, the new model can subsequently be applied to the raw propensity scores to generate the adjusted scores.
  • For other models, records produced by scoring the original model on the test or validation partition are then binned by their raw propensity score. Next, a neural network model is trained that defines a non-linear function that maps from the mean raw propensity in each bin to the mean observed propensity in the same bin. As noted earlier for tree models, the resulting neural net model is stored with the original model, and can be applied to the raw propensity scores whenever adjusted propensity scores are requested.
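
As a rough illustration of the second method, the sketch below bins validation-partition records by raw propensity and learns a mapping from the mean raw propensity in each bin to the mean observed outcome rate in the same bin. This is a simplified stand-in, assuming NumPy is available and substituting linear interpolation for the neural network that the product actually trains; all names are illustrative.

    import numpy as np

    def fit_adjustment(raw_propensity, observed, n_bins=10):
        """Learn an adjustment function from validation-partition scores.

        Records are binned by raw propensity; for each bin, the mean raw
        propensity and the mean observed outcome (True rate) are computed.
        Linear interpolation between those points stands in for the
        non-linear function described above.
        """
        raw_propensity = np.asarray(raw_propensity, dtype=float)
        observed = np.asarray(observed, dtype=float)

        # Quantile bin edges give roughly equal-sized bins.
        edges = np.quantile(raw_propensity, np.linspace(0.0, 1.0, n_bins + 1))
        bins = np.clip(np.searchsorted(edges, raw_propensity, side="right") - 1,
                       0, n_bins - 1)

        mean_raw, mean_obs = [], []
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                mean_raw.append(raw_propensity[mask].mean())
                mean_obs.append(observed[mask].mean())

        # The returned function plays the role of the stored adjustment model:
        # apply it to raw propensities on new data to obtain adjusted scores.
        return lambda p: np.interp(p, mean_raw, mean_obs)

    # Example with simulated validation data for an overconfident model:
    # the observed response rate is raw**2, lower than the raw propensity.
    rng = np.random.default_rng(0)
    raw = rng.uniform(0.0, 1.0, 5000)
    observed = rng.uniform(0.0, 1.0, 5000) < raw ** 2
    adjust = fit_adjustment(raw, observed)
    print(adjust([0.2, 0.5, 0.8]))    # roughly [0.04, 0.25, 0.64]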

Caution regarding missing values in the testing partition. Handling of missing input values in the testing/validation partition varies by model (see the individual model scoring algorithms for details). The C5.0 model cannot compute adjusted propensities when there are missing inputs.