Module 1: Finding relationships in your data with an association model

In this module, you will create an association model that you can use to perform market basket analysis on transaction data.

Term: market basket analysis: A technique for determining which products customers are likely to purchase together.
In this module, you will use the data mining features of data warehousing in Db2® to apply market basket analysis to the transaction records of the fictional Sample Outdoors company. By discovering groups of products that are often purchased together, you can recommend products that your customers are likely to be interested in, based on the products that they already purchased.

When you run your mining flow, you create an association model that is based on your source data. Then, you can use the Associations visualizer to look at the association rules that model the purchasing patterns of your customers.

Learn more about association models: An association model looks for patterns within your data by finding associations between items. It expresses each pattern as a rule of the form "customers who purchase product A also purchase product B."
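
To make the "product A, also product B" idea concrete, the following Python sketch counts how often pairs of products appear in the same transaction and keeps the pairs that occur often enough. It is an illustration only and is not the algorithm that Db2 uses. The function name, the thresholds, and the product names are assumptions made up for this example. Support is the fraction of all transactions that contain both products, and confidence is the fraction of transactions containing product A that also contain product B.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples.

    Each tuple reads as: customers who purchase `antecedent`
    also purchase `consequent`. Only pairs of single products are
    considered in this simplified sketch.
    """
    baskets = [set(t) for t in transactions]
    total = len(baskets)

    # How many baskets contain each product, and each pair of products.
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )

    rules = []
    for (first, second), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue
        # A pair can produce a rule in either direction.
        for antecedent, consequent in ((first, second), (second, first)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical transactions; the product names are invented for illustration.
sample = [
    ["tent", "sleeping bag"],
    ["tent", "sleeping bag", "lantern"],
    ["tent", "lantern"],
    ["sunscreen"],
]

for antecedent, consequent, support, confidence in association_rules(
    sample, min_confidence=0.8
):
    print(f"{antecedent} -> {consequent} "
          f"(support {support:.2f}, confidence {confidence:.2f})")
```

With these sample transactions and a confidence threshold of 0.8, the sketch keeps the rules "sleeping bag -> tent" and "lantern -> tent", because every customer who bought a sleeping bag or a lantern also bought a tent.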

The following figure shows the diagram that you will build in this module to examine your source data for customer patterns. The lines represent the flow of data, and the boxes represent the operators that organize and process your data for market basket analysis. You can refer to this diagram at any time as you work through the lessons in the module.

Figure 1. The mining flow that you create in this module
Completed Market_basket_analysis mining flow for this tutorial

A mining flow specifies how your transaction data is used to generate your association model. You design a mining flow by drawing a diagram in the flow editor in Design Studio. The diagram represents the steps that must be completed to discover the type of information that you are seeking. Each symbol that you place on the canvas is an operator that represents one data processing operation. By connecting the outputs of operators to the inputs of other operators, you specify how your data moves through the processing steps of your mining flow.
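
As a rough analogy for how operators pass data to one another, the following Python sketch chains two small functions so that the output of one becomes the input of the next. Design Studio is a graphical editor, so this code is not something that you write in the tutorial. The operator names, the lookup table, and the sample rows are all hypothetical and serve only to illustrate the idea of a flow.

```python
from collections import defaultdict

# Hypothetical transaction rows: (order_id, product_id).
ROWS = [(1, "P10"), (1, "P20"), (2, "P10"), (2, "P30")]

# Hypothetical name lookup table: product_id -> readable product name.
PRODUCT_NAMES = {"P10": "tent", "P20": "sleeping bag", "P30": "lantern"}


def lookup_names(rows):
    """Operator: replace each product ID with its readable name."""
    return [(order_id, PRODUCT_NAMES[product_id]) for order_id, product_id in rows]


def group_by_order(rows):
    """Operator: collect the products of each order into one basket."""
    baskets = defaultdict(set)
    for order_id, product in rows:
        baskets[order_id].add(product)
    return dict(baskets)


def run_flow(source, operators):
    """Feed the output of each operator into the input of the next one."""
    data = source
    for operator in operators:
        data = operator(data)
    return data


baskets = run_flow(ROWS, [lookup_names, group_by_order])
print(baskets)
```

The order of the operators in the list plays the role of the connecting lines in the diagram: changing the order changes how the data moves through the flow.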

Learning objectives

After completing the lessons in this module, you will know how to do the following tasks:
  • Design a complete mining flow that includes name lookup tables and taxonomy information
  • Examine your customer data for patterns with the Associations visualizer

Time required

This module should take approximately 90 minutes to complete.

Prerequisites

You must first complete the prerequisites described in Setting up the GSDB sample database.

Note: The steps in this tutorial are designed to work with a fresh copy of the sample database and data. If you or someone else previously worked on this tutorial or another tutorial that uses the GSDB sample database, run the script again to reset the sample database.