Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Data mining with association rules using WebSphere Commerce Analyzer

Eric Poulin (epoulin@ca.ibm.com), IBM WebSphere Commerce Developer Readiness
Photo: Eric Poulin
Eric Poulin is a software developer at the IBM Toronto Lab in Toronto, Ontario, Canada. He is the team lead for the WebSphere Commerce Analytics and Business Intelligence team.
Kirsten Rutherford (kirsten@intelliware.com), Intelliware Development Inc.
Photo: Kirsten Rutherford
Kirsten Rutherford is a project manager at Intelliware® Development Inc., Canada. She is the original author of this article.

Summary:  This short tutorial shows how you can create association rule mining against the WebSphere Commerce operational database or against a WebSphere Commerce Analyzer data mart where the production system is not impacted. You can use the results from association rule mining to set up bundles of packages that customers tend to buy together.

Date:  24 Nov 2004
Level:  Intermediate

Activity:  5691 views
Comments:  

Introduction

Hidden within your IBM® WebSphere® Commerce database are golden nuggets of information, patterns not visible with simple analysis. This information helps you understand your customers better and targets marketing activities more accurately. Data mining is the science of extracting such knowledge from large amounts of data.

Data mining is a multidisciplinary field with many techniques. Association rules are one of the techniques used in data mining, and particularly useful with e-commerce transactional information. Association rules describe events that tend to occur together. They are formal statements in the form of X=>Y, where if X happens, Y is likely to happen.

Why mine association rules?

A classic example describes which products tend to be purchased together at a supermarket. You might already expect that when a shopper purchases hotdogs, that shopper most likely purchases hotdog buns too. But what other groups of items do shoppers buy together that are not anticipated by the marketing team? An often-quoted example is that diapers and beer tend to be purchased together in the U.S.

e-commerce sites can clearly benefit from strong association rules. You can use the results from association rule mining to set up bundles of packages that customers tend to buy together. You can also use them to set up cross-sell associations, where a customer who buys one item sees an ad for another item that is related.

You can implement association rules mining against the WebSphere Commerce operational database, or better, against a WebSphere Commerce Analyzer data mart. When running against a data mart, the production system is not impacted and historical data is available to help identify trends over time.

Background

Mining association rules require two pieces of data, the transaction and what was bought in that transaction. In WebSphere Commerce, you can use the ORDER_ID and PRODUCT_ID fields that are stored in the WCA.FACT_ORDERITEMS table in the WebSphere Commerce Analyzer data mart. There are two measures related to an association rule:

  1. Confidence: A purchase of one item implies the purchase of another. For example, of all the people who buy a toothbrush, 48% also buy toothpaste in the same transaction. 48% is the confidence of the rule.
  2. Support: Two items are purchased together. For example, 0.135% of all the shoppers buy toothpaste and toothbrushes at the same time. 0.135% is the support of this rule. Support for a specific case is generally below because this takes into account all shoppers and all transactions.

To set up a cross-sell marketing campaign, we are most interested in the confidence of the association rule, and this is what we will focus on in our example.


Prerequisites

You need the following software to use the steps in this tutorial:

  • DB2® Intelligent Miner for Data V8.1. DB2 Intelligent Miner includes six different mining functions, of which association rules is one.
  • Microsoft® Windows® 2000 and up
  • WebSphere Commerce Analyzer data mart V5.6


Step-by-step instructions

The following steps will guide you through the process of using DB2 Intelligent Miner (hereafter called Intelligent Miner) to define data that you can analyze and to define a mining model you can apply to the dataset. Intelligent Miner will derive some association rules you can use to further understand a customer's buying habits.

Log in to DB2 Intelligent Miner

  1. Start DB2 Intelligent Miner for Data by clicking on the menu Start -> Programs -> IBM DB2 Intelligent Miner for Data V8.1 -> Intelligent Miner. A preferences window appears where you can connect to your database. This tells Intelligent Miner where to look for the data.
  2. Click on the Database Logon tab, and then fill in your database name, user id, and password.
  3. Click Add, then Apply, and finally OK.
  4. When you open Intelligent Miner again, you only have to supply the password for the entry you just created.
    Figure 1. Database logon
    Database logon

Creating the data

Intelligent Miner needs to have a description of the data being analyzed. From the Create menu, choose Data. In the wizard, do the following steps:

  1. On the Data format and settings page, click Database Table/View and enter a Settings Name you can recognize later, such as "AssocOrder", and optionally a comment. Click Next >.
  2. On the Database table or view, choose your WebSphere Commerce Analyzer data mart name from the Database server drop-down. Select the WCA schema in the left-hand list, and then select FACT_ORDERITEMS in the right-hand list. Click Next >.
  3. On the Field parameters page, change the data type of some of the fields. This does not affect the table itself, only in the way Intelligent Miner views the data. Although the values are numeric, association rules mining requires the values to be categorical. Because the numeric fields are really integers, you can re-type them as categorical and the mining will work. For each of the PRODUCT_ID, ORDER_ID, and MEMBER_ID fields, click the Data type drop down and choose Categorical. Click Next >.
  4. On the Computed fields, no changes are required. Click Next >.
  5. On the Summarypage, click Finish.

Creating the Mining model

A mining model describes the data that you will use, as well as other input parameters necessary for the model to run. From the Create menu, choose Mining. In the wizard, do the following steps:

  1. On the Mining functions and settings page, choose Associations and provide a Settings name, such as "OrderAssoc". Click Next >.
  2. On the Input data page, click on the AssocOrder input data defined in the previous section. Click Next >.
  3. On the Input fields page, find ORDER_ID in the list of Available fields and click the top-most arrow to move it into the Transaction field. Find PRODUCT_ID and click the lower arrow to move it into the Item field. Click Next >. Figure 2 shows an example of this screen.
    Figure 2. Input fields tab
    Input fields tab
  4. On the Parameters page, leave these as the defaults for the first run. If not enough rules are returned when you run the model, adjust the Minimum confidence down to 20 or 15. You can set the Maximum rule length if too many items are returned in a rule. For example, if it determines that a group of 50 products is bought together, but you want to bundle in sets no bigger than 5 items. Click Next >.
  5. On the Available results page, give a Results name, such as "AssocOrder1". Until you get the model the way you like it, check the box for If a result with this name exists, overwrite it. Click Next >.
  6. On the Summary page. Click Finish.

Running the model

Now that you defined the model, you can run it.

  1. Expand the Mining folder on the left-hand side of the window, and then click on the Association folder.
  2. Select the mining model and click on the green arrow in the toolbar to run the model. A progress window appears. On a typical Windows machine, you can process 200,000 order items in less than a minute.

Interpreting the results

Results are shown in a separate window. Figure 3 shows an example of a set of rules found by mining.


Figure 3. Visualizing the rules
Visualizing the rules

In this case three association rules are discovered. You need to go back to the PRODUCT_REF table to find out what the product IDs represent. In this case the first two rules are related. They are the same two products implying the purchase of each other. It turns out that these are children's cough syrup, identical products except for the flavor. Perhaps a number of parents are hoping that by providing a choice, they can more easily convince their child to take the cough syrup. From this conclusion, you may decide to mention this as a "tip" on the cough syrup category page. You can offer a bundle of two flavors of syrup at a slightly reduced price to up-sell parents who intend to buy only one flavor when they came to the site.

In the second case, the two products are also cold remedies, but this time for adults. The first is for sinus congestion and the second is a hot drink for coughs and fevers. If the store offers a product that deals with both, the store could decide to advertise it on the product display pages for these two products. A shopper who has sinus congestion may go straight to the product he or she knows that exists, without realizing there is a newer option. A solution is to cross-sell ads for the two products on those pages to make sure that every buyer knows both products are carried by the store.

Figure 4 shows an example of the Item Sets.


Figure 4. Visualizing the item sets
Visualizing the item sets

Item Sets, in this case, are products that are purchased together that meet the minimum support and confidence. This view is sorted descending by Items in Set, to show the sets with multiple items at the top. Many single-item sets are generated by the algorithm, which you can ignore for now. They are roughly analogous to your most popular products that you can determine without data mining.

These sets are all opportunities for bundling of products, or for merchandising associations in WebSphere Commerce. For example, one of the rules above says that Vitamin C and Vitamin E are often purchased together.

Saving your work

Everything you have done, including the data, the model, and the results, is collectively known as a Mining Base. You can save this base with a meaningful name, such as AssocOrders, and load it later to continue working with it. You can also export the mining base if you want to move the results to another machine or to allow someone else to examine the results.


Notes of interest

The following notes describe some possible variations to the mining model you created using the steps in this tutorial.

Treating each shopper as a transaction

You can use the instructions above to create a new mining model where the transaction id is MEMBER_ID rather than ORDER_ID. This treats each shopper's purchases as a single transaction, allowing you to analyze purchases over time. For example, it may be useful to know that purchases of Vitamin E and an iron supplement tend to be made by the same customer, but not necessarily in the same purchase. You can then detect that a customer logging in to the site has bought Vitamin E, but not iron the past, and display a coupon for iron because that is the strongest association.

Retention of results

Every time you run a mining model, take a moment to open it up and give a new name for the result set. You can click on the Results folder and it shows all the results you have generated, without having to re-run the model each time. If you are going in to Intelligent Miner purely to look at results, you can click Cancel to skip the initial database logon window.

What if I have several million transactions?

If it is taking more time than desired to run the mining model, you can use Intelligent Miner to create sample sets for you. Use the Create menu to create Processing, which allows you to choose the Get Random Sample function to set a percentage of the data you want to analyze. You can use the results from the processing as input to the mining model.

Merchandising associations

In WebSphere Commerce, a merchandising association consists of a Source Catalog Entry and one or more Target Catalog Entries. You can express each rule found in the data mining as a merchandising association.


Conclusion

You have learned that data mining with association rules is a useful way to identify related items in your customers' shopping carts. Using a WebSphere Commerce Analyzer data mart and DB2 Intelligent Miner for Data, you can understand, interpret, and forecast interesting pieces of information about your customers buying habits. You can use this skill to enhance your marketing and merchandising activities.


Resources

About the authors

Photo: Eric Poulin

Eric Poulin is a software developer at the IBM Toronto Lab in Toronto, Ontario, Canada. He is the team lead for the WebSphere Commerce Analytics and Business Intelligence team.

Photo: Kirsten Rutherford

Kirsten Rutherford is a project manager at Intelliware® Development Inc., Canada. She is the original author of this article.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=WebSphere, Information Management
ArticleID=31950
ArticleTitle=Data mining with association rules using WebSphere Commerce Analyzer
publish-date=11242004
author1-email=epoulin@ca.ibm.com
author1-email-cc=
author2-email=kirsten@intelliware.com
author2-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers