Hidden within your IBM® WebSphere® Commerce database are golden nuggets of information, patterns not visible with simple analysis. This information helps you understand your customers better and targets marketing activities more accurately. Data mining is the science of extracting such knowledge from large amounts of data.
Data mining is a multidisciplinary field with many techniques. Association rules are one of the techniques used in data mining, and particularly useful with e-commerce transactional information. Association rules describe events that tend to occur together. They are formal statements in the form of X=>Y, where if X happens, Y is likely to happen.
A classic example describes which products tend to be purchased together at a supermarket. You might already expect that when a shopper purchases hotdogs, that shopper most likely purchases hotdog buns too. But what other groups of items do shoppers buy together that are not anticipated by the marketing team? An often-quoted example is that diapers and beer tend to be purchased together in the U.S.
e-commerce sites can clearly benefit from strong association rules. You can use the results from association rule mining to set up bundles of packages that customers tend to buy together. You can also use them to set up cross-sell associations, where a customer who buys one item sees an ad for another item that is related.
You can implement association rules mining against the WebSphere Commerce operational database, or better, against a WebSphere Commerce Analyzer data mart. When running against a data mart, the production system is not impacted and historical data is available to help identify trends over time.
Mining association rules require two pieces of data, the transaction and what was bought in that transaction. In WebSphere Commerce, you can use the ORDER_ID and PRODUCT_ID fields that are stored in the WCA.FACT_ORDERITEMS table in the WebSphere Commerce Analyzer data mart. There are two measures related to an association rule:
- Confidence: A purchase of one item implies the purchase of another. For example, of all the people who buy a toothbrush, 48% also buy toothpaste in the same transaction. 48% is the confidence of the rule.
- Support: Two items are purchased together. For example, 0.135% of all the shoppers buy toothpaste and toothbrushes at the same time. 0.135% is the support of this rule. Support for a specific case is generally below because this takes into account all shoppers and all transactions.
To set up a cross-sell marketing campaign, we are most interested in the confidence of the association rule, and this is what we will focus on in our example.
You need the following software to use the steps in this tutorial:
- DB2® Intelligent Miner for Data V8.1. DB2 Intelligent Miner includes six different mining functions, of which association rules is one.
- Microsoft® Windows® 2000 and up
- WebSphere Commerce Analyzer data mart V5.6
The following steps will guide you through the process of using DB2 Intelligent Miner (hereafter called Intelligent Miner) to define data that you can analyze and to define a mining model you can apply to the dataset. Intelligent Miner will derive some association rules you can use to further understand a customer's buying habits.
Log in to DB2 Intelligent Miner
- Start DB2 Intelligent Miner for Data by clicking on the menu Start -> Programs -> IBM DB2 Intelligent Miner for Data V8.1 -> Intelligent Miner. A preferences window appears where you can connect to your database. This tells Intelligent Miner where to look for the data.
- Click on the Database Logon tab, and then fill in your database name, user id, and password.
- Click Add, then Apply, and finally OK.
- When you open Intelligent Miner again, you only have to supply the password for the entry you just created.
Figure 1. Database logon
Intelligent Miner needs to have a description of the data being analyzed. From the Create menu, choose Data. In the wizard, do the following steps:
- On the Data format and settings page, click Database Table/View and enter a Settings Name you can recognize later, such as "AssocOrder", and optionally a comment. Click Next >.
- On the Database table or view, choose your WebSphere Commerce Analyzer data mart name from the Database server drop-down. Select the WCA schema in the left-hand list, and then select FACT_ORDERITEMS in the right-hand list. Click Next >.
- On the Field parameters page, change the data type of some of the fields. This does not affect the table itself, only in the way Intelligent Miner views the data. Although the values are numeric, association rules mining requires the values to be categorical. Because the numeric fields are really integers, you can re-type them as categorical and the mining will work. For each of the PRODUCT_ID, ORDER_ID, and MEMBER_ID fields, click the Data type drop down and choose Categorical. Click Next >.
- On the Computed fields, no changes are required. Click Next >.
- On the Summarypage, click Finish.
A mining model describes the data that you will use, as well as other input parameters necessary for the model to run. From the Create menu, choose Mining. In the wizard, do the following steps:
- On the Mining functions and settings page, choose Associations and provide a Settings name, such as "OrderAssoc". Click Next >.
- On the Input data page, click on the AssocOrder input data defined in the previous section. Click Next >.
- On the Input fields page, find ORDER_ID in the list of Available fields and click the top-most arrow to move it into the Transaction field. Find PRODUCT_ID and click the lower arrow to move it into the Item field.
Click Next >.
Figure 2 shows an example of this screen.
Figure 2. Input fields tab
- On the Parameters page, leave these as the defaults for the first run. If not enough rules are returned when you run the model, adjust the Minimum confidence down to 20 or 15. You can set the Maximum rule length if too many items are returned in a rule. For example, if it determines that a group of 50 products is bought together, but you want to bundle in sets no bigger than 5 items. Click Next >.
- On the Available results page, give a Results name, such as "AssocOrder1". Until you get the model the way you like it, check the box for If a result with this name exists, overwrite it. Click Next >.
- On the Summary page. Click Finish.
Now that you defined the model, you can run it.
- Expand the Mining folder on the left-hand side of the window, and then click on the Association folder.
- Select the mining model and click on the green arrow in the toolbar to run the model. A progress window appears. On a typical Windows machine, you can process 200,000 order items in less than a minute.
Results are shown in a separate window. Figure 3 shows an example of a set of rules found by mining.
Figure 3. Visualizing the rules
In this case three association rules are discovered. You need to go back to the PRODUCT_REF table to find out what the product IDs represent. In this case the first two rules are related. They are the same two products implying the purchase of each other. It turns out that these are children's cough syrup, identical products except for the flavor. Perhaps a number of parents are hoping that by providing a choice, they can more easily convince their child to take the cough syrup. From this conclusion, you may decide to mention this as a "tip" on the cough syrup category page. You can offer a bundle of two flavors of syrup at a slightly reduced price to up-sell parents who intend to buy only one flavor when they came to the site.
In the second case, the two products are also cold remedies, but this time for adults. The first is for sinus congestion and the second is a hot drink for coughs and fevers. If the store offers a product that deals with both, the store could decide to advertise it on the product display pages for these two products. A shopper who has sinus congestion may go straight to the product he or she knows that exists, without realizing there is a newer option. A solution is to cross-sell ads for the two products on those pages to make sure that every buyer knows both products are carried by the store.
Figure 4 shows an example of the Item Sets.
Figure 4. Visualizing the item sets
Item Sets, in this case, are products that are purchased together that meet the minimum support and confidence. This view is sorted descending by Items in Set, to show the sets with multiple items at the top. Many single-item sets are generated by the algorithm, which you can ignore for now. They are roughly analogous to your most popular products that you can determine without data mining.
These sets are all opportunities for bundling of products, or for merchandising associations in WebSphere Commerce. For example, one of the rules above says that Vitamin C and Vitamin E are often purchased together.
Everything you have done, including the data, the model, and the results, is collectively known as a Mining Base. You can save this base with a meaningful name, such as AssocOrders, and load it later to continue working with it. You can also export the mining base if you want to move the results to another machine or to allow someone else to examine the results.
The following notes describe some possible variations to the mining model you created using the steps in this tutorial.
Treating each shopper as a transaction
You can use the instructions above to create a new mining model where the transaction id is MEMBER_ID rather than ORDER_ID. This treats each shopper's purchases as a single transaction, allowing you to analyze purchases over time. For example, it may be useful to know that purchases of Vitamin E and an iron supplement tend to be made by the same customer, but not necessarily in the same purchase. You can then detect that a customer logging in to the site has bought Vitamin E, but not iron the past, and display a coupon for iron because that is the strongest association.
Every time you run a mining model, take a moment to open it up and give a new name for the result set. You can click on the Results folder and it shows all the results you have generated, without having to re-run the model each time. If you are going in to Intelligent Miner purely to look at results, you can click Cancel to skip the initial database logon window.
What if I have several million transactions?
If it is taking more time than desired to run the mining model, you can use Intelligent Miner to create sample sets for you. Use the Create menu to create Processing, which allows you to choose the Get Random Sample function to set a percentage of the data you want to analyze. You can use the results from the processing as input to the mining model.
In WebSphere Commerce, a merchandising association consists of a Source Catalog Entry and one or more Target Catalog Entries. You can express each rule found in the data mining as a merchandising association.
You have learned that data mining with association rules is a useful way to identify related items in your customers' shopping carts. Using a WebSphere Commerce Analyzer data mart and DB2 Intelligent Miner for Data, you can understand, interpret, and forecast interesting pieces of information about your customers buying habits. You can use this skill to enhance your marketing and merchandising activities.
- DB2 Intelligent Miner. Access to product and library information, trials and betas, training, and more.
- WebSphere Commerce Analyzer. Access to
product information and more.
- developerWorks WebSphere Commerce zone.
Access to WebSphere Commerce how-to articles, downloads, tutorials, education, product information, and more.
- WebSphere Commerce V5.6 Information Center.
View the online help for WebSphere Commerce 5.6 without installing it.
- Browse for books on these and other technical topics.
- WebSphere forums.
Product-specific forums where you can ask questions and share your opinions with other WebSphere users.






