Tracking the Expected Number of Customers Retained

Once you are satisfied with a model, you want to track the expected number of customers in the dataset that are retained over the next two years. The null values, which occur for customers whose total tenure (future time + tenure) falls beyond the range of survival times in the data used to train the model, present an interesting challenge. One way to deal with them is to create two sets of predictions: one in which customers with null values are assumed to have churned, and another in which they are assumed to have been retained. In this way you can establish upper and lower bounds on the expected number of customers retained.
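As a rough illustration of the bounding idea outside the product, the following Python sketch computes both bounds from a small table of scores. It assumes that each $CP-0-n value is a customer's predicted probability of still being retained after n periods, that summing those probabilities gives the expected number retained, and that None stands in for a null score. The function name retention_bounds and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def retention_bounds(
    probs: Sequence[Sequence[Optional[float]]],
) -> List[Tuple[float, float]]:
    """Return (lower, upper) expected retained counts for each period.

    probs[i][n] is customer i's assumed retention probability for
    period n + 1, or None when the model could not score the customer.
    """
    n_periods = len(probs[0])
    bounds = []
    for n in range(n_periods):
        column = [row[n] for row in probs]
        # Lower bound: nulls are skipped, as if those customers churned.
        lower = sum(p for p in column if p is not None)
        # Upper bound: nulls are replaced with 1, as if retained.
        upper = sum(1.0 if p is None else p for p in column)
        bounds.append((lower, upper))
    return bounds


# Illustrative values only: three customers scored for two periods.
sample = [
    [0.75, 0.5],
    [0.5, None],   # second period could not be scored
    [0.25, 0.25],
]
bounds = retention_bounds(sample)
print(bounds)
```

The two bounds coincide wherever every customer has a score, and they separate by exactly the number of null scores in a period.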

Figure 1. Cox nugget: Settings tab
  1. Double-click the model nugget in the Models palette (or copy and paste the nugget on the stream canvas) and attach the new nugget to the Source node.
  2. Open the nugget to the Settings tab.
  3. Make sure Regular Intervals is selected, and specify 1.0 as the time interval and 24 as the number of periods to score. This specifies that each record will be scored for each of the following 24 months.
  4. Select tenure as the field to specify the past survival time. The scoring algorithm will take into account the length of each customer's time as a customer of the company.
  5. Select Append all probabilities.
    Figure 2. Aggregate node: Settings tab
  6. Attach an Aggregate node to the model nugget; on the Settings tab, deselect Mean as a default mode.
  7. Select $CP-0-1 through $CP-0-24, the fields of form $CP-0-n, as the fields to aggregate. This is easiest if, on the Select Fields dialog, you sort the fields by Name (that is, alphabetical order).
  8. Deselect Include record count in field.
  9. Click OK. This node creates the "lower bound" predictions.
    Figure 3. Filler node: Settings tab
  10. Attach a Filler node to the Coxreg nugget to which we just attached the Aggregate node; on the Settings tab, select $CP-0-1 through $CP-0-24, the fields of form $CP-0-n, as the fields to fill in. This is easiest if, on the Select Fields dialog, you sort the fields by Name (that is, alphabetical order).
  11. Choose to replace Null values with the value 1.
  12. Click OK.
    Figure 4. Aggregate node: Settings tab
  13. Attach an Aggregate node to the Filler node; on the Settings tab, deselect Mean as a default mode.
  14. Select $CP-0-1 through $CP-0-24, the fields of form $CP-0-n, as the fields to aggregate. This is easiest if, on the Select Fields dialog, you sort the fields by Name (that is, alphabetical order).
  15. Deselect Include record count in field.
  16. Click OK. This node creates the "upper bound" predictions.
    Figure 5. Filter node: Settings tab
  17. Attach an Append node to the two Aggregate nodes, then attach a Filter node to the Append node.
  18. On the Settings tab of the Filter node, rename the fields to 1 through 24. Through the use of a Transpose node, these field names will become values for the x-axis in charts downstream.
    Figure 6. Transpose node: Settings tab
  19. Attach a Transpose node to the Filter node.
  20. Type 2 as the number of new fields.
    Figure 7. Filter node: Filter tab
  21. Attach a Filter node to the Transpose node.
  22. On the Settings tab of the Filter node, rename ID to Months, Field1 to Lower Estimate, and Field2 to Upper Estimate.
    Figure 8. Multiplot node: Plot tab
  23. Attach a Multiplot node to the Filter node.
  24. On the Plot tab, select Months as the X field, and Lower Estimate and Upper Estimate as the Y fields.
    Figure 9. Multiplot node: Appearance tab
  25. Click the Appearance tab.
  26. Type Number of Customers as the title.
  27. Type Estimates the number of customers retained as the caption.
  28. Click Run.
    Figure 10. Multiplot estimating the number of customers retained

    The upper and lower bounds on the estimated number of customers retained are plotted. The difference between the two lines is the number of customers scored as null, whose status is therefore highly uncertain. Over time, the number of these customers increases. After 12 months, you can expect to retain between 601 and 735 of the original customers in the dataset; after 24 months, between 288 and 597.
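    The reshaping done by the Append, Filter, and Transpose nodes can be pictured with a short Python sketch: two wide rows (one value per month) become one row per month with a lower and an upper estimate. The function name to_plot_rows and the three-month sample values are hypothetical and are not taken from the stream's output.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[int, float]]


def to_plot_rows(lower_row: Sequence[float], upper_row: Sequence[float]) -> List[Row]:
    """Turn two wide rows into one record per month, ready for plotting."""
    if len(lower_row) != len(upper_row):
        raise ValueError("both rows must cover the same number of months")
    return [
        {"Months": month, "Lower Estimate": lo, "Upper Estimate": hi}
        for month, (lo, hi) in enumerate(zip(lower_row, upper_row), start=1)
    ]


# Illustrative values only, covering three months instead of 24.
rows = to_plot_rows([900.0, 850.0, 800.0], [900.0, 870.0, 840.0])
for row in rows:
    print(row)
```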

    Figure 11. Derive node: Settings tab
  29. To get another look at how uncertain the estimates of the number of customers retained are, attach a Derive node to the Filter node.
  30. On the Settings tab of the Derive node, type Unknown % as the derive field.
  31. Select Continuous as the field type.
  32. Type (100 * ('Upper Estimate' - 'Lower Estimate')) / 'Lower Estimate' as the formula. Unknown % is the number of customers "in doubt" as a percentage of the lower estimate.
  33. Click OK.
    Figure 12. Plot node: Plot tab
  34. Attach a Plot node to the Derive node.
  35. On the Plot tab of the Plot node, select Months as the X field and Unknown % as the Y field.
  36. Click the Appearance tab.
    Figure 13. Plot node: Appearance tab
  37. Type Unpredictable Customers as % of Predictable Customers as the title.
  38. Execute the node.
Figure 14. Plot of unpredictable customers

Through the first year, the percentage of unpredictable customers increases at a fairly linear rate, but the rate of increase explodes during the second year until, by month 23, the number of customers with null values outnumbers the expected number of customers retained.
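To make the Unknown % formula concrete, the following Python sketch applies it to the 12-month and 24-month estimates quoted earlier (601 to 735 and 288 to 597). The function name unknown_pct is hypothetical; the arithmetic mirrors the Derive node formula.

```python
def unknown_pct(lower: float, upper: float) -> float:
    """Customers "in doubt" as a percentage of the lower estimate."""
    return 100 * (upper - lower) / lower


month_12 = unknown_pct(601, 735)
month_24 = unknown_pct(288, 597)

print(round(month_12, 1))  # about 22.3
print(round(month_24, 1))  # about 107.3
```

A value above 100 means that the customers scored as null outnumber the lower estimate of customers retained, which is the situation the plot shows near the end of the second year.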
