Association modeling example

Association modeling is an important technique in predictive analytics. With association models, you can predict which items are most likely to appear together, as well as the strength of the relationships between them. For example, you might identify the next likely purchase for a customer based on items in their current basket.

This example uses the data file retail_purchase_data.txt, which is distributed and installed with the product. The data consists of purchase information for electronics items such as televisions, computers, and smart phones.

A completed version of this example is also provided in the file retail_association_model.str. Contact your administrator for details on installing the sample files if necessary. See the topic Sample files for more information.

  1. Return to the application Launch page and create a new IBM® SPSS® Modeler Advantage project.
  2. On the Data tab, in the Project Data Sources panel, add a new data source called retail purchase data. Select File and select retail_purchase_data.txt.
  3. On the Modeling tab, click Change Model, select Association model, and click Save.
  4. For the data source, select retail purchase data.
  5. For the data format, select Tabular. A transactional data set would only show items that were included, and all items would appear in a single column. But in this data set, each item is in a separate column with a true or false flag to indicate whether it was purchased or not.
  6. Click Build Model. You will receive an error that says building failed and suggests that the thresholds may be too high. Close the error.

    When looking for association rules, it's usually a good idea to look for rules that are generally applicable, reliable, and simple.
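The difference between the Tabular format selected in step 5 and a transactional layout can be sketched outside the product. The following Python snippet is only an illustration: the column names, the customer_id field, and the two sample rows are hypothetical and are not taken from retail_purchase_data.txt.

```python
# Hypothetical tabular rows: one column per item, with a true/false purchase flag.
tabular_rows = [
    {"customer_id": 1, "Big Screen TV": True, "Speakers": True, "Smart Phone": False},
    {"customer_id": 2, "Big Screen TV": False, "Speakers": False, "Smart Phone": True},
]


def to_transactional(rows, id_field="customer_id"):
    """Return (id, item) pairs, one pair per purchased item."""
    pairs = []
    for row in rows:
        for column, flag in row.items():
            # Skip the ID column and any item whose flag is False.
            if column != id_field and flag:
                pairs.append((row[id_field], column))
    return pairs


# Transactional layout: all items appear in a single column, one row per purchase.
transactional_rows = to_transactional(tabular_rows)
for customer_id, item in transactional_rows:
    print(customer_id, item)
```

In the transactional form, items that were not purchased simply do not appear, which is why the two layouts need to be distinguished when the data source is configured.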

  7. Expand the Optional Settings section. Under Association Rule Options, change the first threshold from 10 to 1, and lower the second threshold from 80 to 1. Normally, you would experiment with different thresholds, rebuilding the model each time, to strike a balance between not getting too many rules and not getting enough.
  8. Click Build Model.

    In the model results, each row represents a separate association rule. The first rule in the list says that if a customer purchases a Smart Phone, they're also likely to purchase a Big Screen TV. The Coverage (%) column shows that about 15% of customers (or about 1 in 6) have purchased the Smart Phone, and the Confidence (%) column shows that of those customers, there's about a 30% probability that they'll also purchase a Big Screen TV.

    So even with a small set of rules (239 in this example), you might be able to use them in a business context to predict or even recommend what customers will buy next.
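To make the Coverage and Confidence columns concrete, here is a minimal Python sketch of how such figures could be computed by hand for a single rule. It is not the product's implementation. The ten baskets are invented, and the threshold check assumes that the two thresholds changed in step 7 correspond to minimum coverage and minimum confidence, which this example does not state explicitly.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Compute coverage and confidence for the rule: antecedent -> consequent."""
    n_total = len(baskets)
    # Baskets that contain every antecedent item.
    matching = [b for b in baskets if antecedent <= b]
    n_antecedent = len(matching)
    # Of those, baskets that also contain the consequent.
    n_rule = sum(1 for b in matching if consequent in b)
    return {
        "coverage_n": n_antecedent,
        "coverage_pct": 100.0 * n_antecedent / n_total,
        "confidence_pct": 100.0 * n_rule / n_antecedent if n_antecedent else 0.0,
    }


def passes_thresholds(metrics, min_coverage_pct, min_confidence_pct):
    """Assumed meaning of the two thresholds: minimum coverage and confidence."""
    return (metrics["coverage_pct"] >= min_coverage_pct
            and metrics["confidence_pct"] >= min_confidence_pct)


# Ten hypothetical baskets (not the sample data file).
baskets = [
    {"Smart Phone", "Big Screen TV"}, {"Smart Phone"}, {"Smart Phone", "Speakers"},
    {"Speakers"}, {"Big Screen TV"}, {"Standard TV"}, {"Computer"},
    {"Computer", "Speakers"}, {"Standard TV", "Speakers"}, {"Big Screen TV", "Speakers"},
]

metrics = rule_metrics(baskets, {"Smart Phone"}, "Big Screen TV")
print(metrics)
print(passes_thresholds(metrics, 10, 80))  # strict thresholds
print(passes_thresholds(metrics, 1, 1))    # relaxed thresholds
```

With the toy data, the rule covers 3 of 10 baskets and holds for 1 of those 3, so it is rejected at thresholds of 10 and 80 but kept at 1 and 1. This mirrors why a build can return no rules when thresholds are set too high.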

  9. Next, click Test and then click Run to see how the predictions or recommendations will be made.

    In this case, the model looked at the rules with the highest confidence figures and at what each customer has already purchased, and on the basis of these two together made recommendations about what each customer is likely to purchase next.

  10. Click the preview icon beside a record to see what that customer purchased. For example, the preview for the first record shows that the first customer purchased a Big Screen TV and Speakers, but no other products. Close the preview.

    The recommendation for this customer is to offer them a Standard TV, as seen in the second column. The third column's value, 0.3, means there's a 30% chance that customers who have already purchased a Big Screen TV and Speakers will also purchase a Standard TV.

    However, how do we know whether this is an anomaly or a predicted trend that can be generally applicable? Take note of the rule number (115 in this example) so we can go back and look at the rule in more detail.
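The way a recommendation, its confidence, and its rule number fit together can be sketched in Python. This is an illustration under stated assumptions, not the product's scoring logic. Rule 115 and its confidence of 0.3 come from this example. The other two rules, their IDs, and their confidence values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: int
    antecedent: frozenset  # items the customer must already have purchased
    consequent: str        # item the rule predicts
    confidence: float


def recommend(basket, rules):
    """Return (item, confidence, rule_id) for the best applicable rule, or None."""
    # A rule applies when all of its antecedent items are in the basket.
    applicable = [r for r in rules if r.antecedent <= basket]
    if not applicable:
        return None
    best = max(applicable, key=lambda r: r.confidence)
    return best.consequent, best.confidence, best.rule_id


rules = [
    Rule(115, frozenset({"Big Screen TV", "Speakers"}), "Standard TV", 0.30),
    Rule(7, frozenset({"Smart Phone"}), "Big Screen TV", 0.30),   # hypothetical
    Rule(42, frozenset({"Speakers"}), "Smart Phone", 0.12),       # hypothetical
]

first_customer = {"Big Screen TV", "Speakers"}
recommendation = recommend(first_customer, rules)
print(recommendation)
```

Keeping the rule ID in the result is what makes it possible to go back to the model results and inspect the rule behind a given recommendation.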

  11. Close the Test dialog.
  12. Page through the model results until you see rule number 115. Note that for large models with many pages of results, you can also click Find a rule by ID and type the rule number.
  13. Click the arrow next to any one of the result column headers and select Columns > Coverage (N).

    The Coverage (N) column will be displayed. Based on the number of instances, it is clear this is not an anomaly because rule 115 is based on 58 different customers who have purchased the three products together. This is enough to give us confidence about using the rule as the basis for future recommendations.

    We have looked at a single rule to see how it might be used for a recommendation. But IBM Analytical Decision Management also provides powerful options for customizing the model to make it more relevant to our specific purposes. For example, we may want to predict which customers are interested in buying a Big Screen TV:

  14. Click Apply Filters. This displays a dialog box where you can define filters so that only rules matching the filters will be shown in the model.
  15. Select Enable filter and select Big Screen TV for the item.
  16. Click Save to return to the model results.

    The number of rules has dropped from 239 to 34.

    You might also want to confirm you're only using rules that are based on a reasonable number of customers. To achieve this, you might exclude rules that are based on the behavior of fewer than 10 customers:

  17. Click the arrow next to the Coverage (N) column header and select Sort Ascending.
  18. In the Exclude? column, select all rules with a Coverage (N) value of fewer than 10 instances.
  19. Note that even though many of the rules have now been filtered and excluded, they're still in the model at this point. To remove them permanently so they're not used when the model is scored, click Delete excluded & filtered rules and click OK.

    The number of rules returned is reduced to 20, with all of them based on 10 or more instances, and all predicting Big Screen TV.
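The combined effect of the item filter and the coverage-based exclusion can be expressed as a simple selection over a rule list. The Python sketch below is illustrative only. The four rules and their Coverage (N) values are made up, and the function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: int
    antecedent: frozenset
    consequent: str
    coverage_n: int  # number of customers the rule is based on


def prune_rules(rules, consequent, min_coverage_n):
    """Keep rules that predict `consequent` and meet the minimum Coverage (N)."""
    return [
        r for r in rules
        if r.consequent == consequent and r.coverage_n >= min_coverage_n
    ]


# Hypothetical rules for illustration.
rules = [
    Rule(1, frozenset({"Speakers"}), "Big Screen TV", 120),
    Rule(2, frozenset({"Smart Phone"}), "Big Screen TV", 9),    # too few customers
    Rule(3, frozenset({"Big Screen TV"}), "Standard TV", 40),   # wrong prediction
    Rule(4, frozenset({"Standard TV"}), "Big Screen TV", 10),
]

kept = prune_rules(rules, "Big Screen TV", 10)
kept_ids = [r.rule_id for r in kept]
print(kept_ids)
```

Only the rules returned by the selection would remain available for scoring, which corresponds to permanently deleting the excluded and filtered rules in step 19.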

  20. To see how many of these rules will be applied, click Test and then click Run. Big Screen TV is now being recommended for several more customers, and some have a higher confidence level than others.
  21. Close the Test dialog box.
  22. In the Optional Settings section, expand the Scoring Options section to display scoring options specific to association models.

    By default, the maximum number of predictions is 3. Predictions are chosen from the rules with the highest confidence figures, or you can change the rule criterion to choose rules based on coverage, rule support, the highest amount of lift (the increase in probability of a particular item being purchased), or deployability. You might also want to make sure not to offer a Big Screen TV to customers who already purchased one by selecting Ensure predictions not present.

    The default scoring options are suitable for most situations. But in some cases, the added flexibility these options provide may be advantageous.
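As a rough illustration of how these scoring options interact, the following Python sketch ranks applicable rules by a chosen criterion, caps the number of predictions, and optionally skips items already in the basket. It is a simplified model, not the product's algorithm. The rules and figures are invented, lift is computed with the standard definition (rule confidence divided by the overall probability of the predicted item), and the parameter names are hypothetical.

```python
def lift(rule):
    """Standard lift: confidence relative to the item's overall purchase rate."""
    return rule["confidence"] / rule["consequent_prior"]


CRITERIA = {
    "confidence": lambda r: r["confidence"],
    "coverage": lambda r: r["coverage"],
    "lift": lift,
}


def top_predictions(basket, rules, max_predictions=3,
                    criterion="confidence", exclude_present=True):
    """Return up to max_predictions distinct items, best-scoring rules first."""
    candidates = [r for r in rules if r["antecedent"] <= basket]
    if exclude_present:
        # Do not predict items the customer has already purchased.
        candidates = [r for r in candidates if r["consequent"] not in basket]
    candidates.sort(key=CRITERIA[criterion], reverse=True)
    predictions = []
    for rule in candidates:
        if rule["consequent"] not in predictions:
            predictions.append(rule["consequent"])
        if len(predictions) == max_predictions:
            break
    return predictions


# Hypothetical rules; all figures are fractions between 0 and 1.
rules = [
    {"antecedent": {"Speakers"}, "consequent": "Big Screen TV",
     "confidence": 0.40, "coverage": 0.20, "consequent_prior": 0.25},
    {"antecedent": {"Big Screen TV"}, "consequent": "Standard TV",
     "confidence": 0.30, "coverage": 0.25, "consequent_prior": 0.10},
    {"antecedent": {"Big Screen TV", "Speakers"}, "consequent": "Smart Phone",
     "confidence": 0.20, "coverage": 0.08, "consequent_prior": 0.15},
]

basket = {"Big Screen TV", "Speakers"}
default_result = top_predictions(basket, rules)
unchecked_result = top_predictions(basket, rules, exclude_present=False)
lift_result = top_predictions(basket, rules, max_predictions=1, criterion="lift")
print(default_result, unchecked_result, lift_result)
```

In this toy data, excluding items that are already present removes Big Screen TV from the predictions even though its rule has the highest confidence, and switching the criterion to lift can change which rule is ranked first when present items are allowed.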

    The next step in the process would be to use the Scoring tab to score the model. See the topic Scoring models to a database table, file, Analytic Server, or IBM Cognos BI server for more information.

  23. If desired, save the project as retail_association_model.str.

For more information about association modeling, see Building an association model.