Delivering association modeling data mining recommendations

IBM Cognos links predictive analytics and business decision makers

In most organizations, data mining is done separately from reporting. Often, the data mining team develops great analytical and predictive models but has no way to deploy them directly to the business processes and people. In this article, learn to integrate the general output of an association data mining model into an IBM® Cognos® reporting environment. Using Cognos, the reporting team can make the association models' recommendations easily available to business people in a familiar reporting environment.

David Gillman, Director, Services, Data Sooner, LLC

David Gillman has worked in the areas of business intelligence, data mining, and predictive analytics for 20 years. His educational background is in applied math, optimization, and statistical analysis, with particular emphasis on applications to commercial activities. He has hands-on experience in improving business operations through applied analytics in the distribution, manufacturing, retail, and hospitality industries with companies of various sizes.



14 November 2013

A common problem in predictive analytics is finding ways to present predictions and recommendations for action in formats that business people can easily access.

As companies move further into predictive analytics, IBM® Cognos® needs to present this information in easily digestible formats. Many predictive models do not lend themselves to traditional reporting tools. Fortunately, association models do, and they are a common starting point for predictive analytics teams that are beginning to work on business problems.

Common applications of association models include cross-selling and up-selling products and services to customers—concepts that apply to most commercial enterprises.

Association mining for Cognos professionals

Imagine a situation where your company wants to sell more to its existing customers. Sometimes, this drive takes the form of selling to customers more of what they are already buying. But many times, the goal is to sell other, extra items or services. Here is where association models can make targeted recommendations, recommendations that salespeople and automated sales processes use to up-sell or cross-sell more "stuff" to customers.

In physical goods, associated products or add-ons to primary products make sense. Think of an ice cream shop: That store probably wants to sell whipped cream and fudge topping to the customers who buy the ice cream.

The well-known, large Internet retailers use association models to make recommendations to all their customers. Most people who shop on Internet sites see suggestions on the web page to purchase more items that are based on items in the online shopping cart. Smaller websites tend to build those recommendations from just a list of top products. Medium-sized and large web companies usually put a significant amount of time and effort into creating models of purchasing patterns that are based on products that customers order. Add a dash of demographic information about the customer, and the website automatically tailors its recommendations to a short list of products that people who bought the first product are also buying.

Likewise, in services, think of a bank that wants to sell loan products to customers who have savings accounts. Or a pest-control company that wants to sell termite monitoring services to customers who use their other pest services. The examples are endless.

Another potential use of association models in real-world business is warehouse planning. Association models can indicate which products are often retrieved from the warehouse together, so that layouts can be planned to minimize time and movement.

Aside from tactical sales uses, marketing activities benefit from association models, as well. Consider what happens when a product is made part of a marketing campaign. Association models aid marketing decision making by showing which other products will probably sell in conjunction with the targeted items.

Association model concepts

Association models are among the easiest predictive analytics methods to construct, understand, and implement for making these cross-sell recommendations where there are many products and many customers. The algorithms for constructing association models range from fairly simple statistical processes to complex machine learning-based methods. Regardless of methodology, the net output of models that are designed to increase sales is a recommendation to purchase an item, based on what the customer is purchasing or previously purchased.

A common business nickname for association models is market basket analysis. Whatever the name, the basic concept is to identify, for every item or service that the company sells, the top items or services that other people buy with that primary product. Models can (and should) become more complex by segmenting customers by size, type, region, or other characteristics, and then running the association models for products in each customer category. Modeling these characteristics produces more targeted, more specific recommendations that the customer is more likely to purchase. In this way, association models are light years beyond a "top-10 products" list and the classic "managers' hot products" list.

Without this article becoming a primer on statistics and data mining, a simple way to think of association models is as a running tally of which products tend to be ordered in pairs.

The algorithms usually estimate conditional probabilities: Given that product A is in the order, what is the probability that product B will also be purchased? The algorithm also looks at it the other way: If product B is the first product, what is the probability of product A being in the order? Most association model algorithms work with those two probabilities to come up with the final numbers.
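The running tally and the two directional probabilities can be illustrated with a minimal sketch. It is not the algorithm that any particular modeling tool uses; the function name `pair_probabilities` and the sample orders (borrowed from the ice cream shop example) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_probabilities(orders):
    """Return {(a, b): P(b is in the order | a is in the order)}.

    Hypothetical helper: a plain co-occurrence tally, not a full
    association-rule algorithm.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = set(order)  # count each item once per order
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            # Tally the pair in both directions: A -> B and B -> A
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): n / item_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical order data
orders = [
    ["ice cream", "whipped cream"],
    ["ice cream", "fudge topping"],
    ["ice cream", "whipped cream", "fudge topping"],
    ["whipped cream"],
]
probs = pair_probabilities(orders)
print(probs[("ice cream", "fudge topping")])   # P(fudge topping | ice cream)
print(probs[("fudge topping", "ice cream")])   # P(ice cream | fudge topping)
```

The two directions generally give different numbers, which is why a rule table usually stores each direction as its own row.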


Incoming data formats

Many predictive analytics processes create models that are not easily documented or transferred from the predictive analytics environment. Fortunately, association model rules are easily put into human-readable format. Common ways to exchange the rules files are text files or spreadsheets.

Figure 1 shows an example of the basic rules that come from the predictive analytics modelers. This example is a text file with the item numbers and the most basic statistics regarding the relationship (or rule) between the two items.

Figure 1. Text file of incoming rules
Screen capture showing a text file of incoming rules

The basic statistics are usually probability and lift for each rule. Probability is a number between 0 and 1. It answers the question: If product A is in the order, how likely is it that the same order contains product B? Expressed as a percentage, most people recognize this number as the percentage chance.

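Lift is a more complex measure, and the predictive analytics modelers are the ones who use and filter on it. In short, lift measures the performance of a rule by comparing the conditional probability that the recommended item is sold with the selected item to the probability that the recommended item is sold at all. A lift greater than 1 means that the two items are sold together more often than would be expected if they were purchased independently.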
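As a rough illustration of how the two statistics relate, the following sketch computes lift from raw order counts by using the standard definition (the rule's conditional probability divided by the baseline probability of the recommended item). The function name and the counts are hypothetical, and a modeling tool might apply additional adjustments.

```python
def lift(orders_with_both, orders_with_a, orders_with_b, total_orders):
    """Lift of the rule A -> B, computed from order counts (illustrative only)."""
    # Probability of B given that A is in the order
    probability = orders_with_both / orders_with_a
    # Probability that B is in any order at all
    baseline = orders_with_b / total_orders
    return probability / baseline

# Hypothetical counts: 1,000 orders, 100 contain A, 150 contain B, 30 contain both
example = lift(orders_with_both=30, orders_with_a=100, orders_with_b=150, total_orders=1000)
print(example)
```

With these counts, customers who order A are about twice as likely to order B as customers in general, which is the kind of rule that is worth showing on a recommendation report.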

Figure 2 shows similar data to Figure 1 but with more human-readable descriptive fields added in. This format is easier to work with: The report fields are easier for the report developer and business consumer to read and understand.

Figure 2. Example spreadsheet of association model rules
Screen capture showing a spreadsheet of association model rules

So, you have the data, but where do you store it for reporting?

The format and location depend on your environment and architecture. To Cognos, it is just another data source. Common ways are flat files (for example, XML or comma-separated values), database tables, and spreadsheets. Depending on your predictive analytics team, they might choose to keep the rules in a server environment specific to the predictive analytics process. IBM SPSS® Modeler even has specific functionality to deploy models to Cognos. However, most of the time, an organizational divide exists between the two groups that translates to an architectural divide. The different groups typically feel better about just exchanging files and handshakes.
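As one possible way to move an exchanged rules file into a database table for reporting, the sketch below loads comma-separated rules into SQLite. The table name `recommendation_rules`, the column names, and the sample rows are assumptions made for illustration; your database, table layout, and load process will differ.

```python
import csv
import io
import sqlite3

# Hypothetical rules file content; a real file would come from the modeling team
SAMPLE_CSV = """ITEM,Recommendation,Ranking,Probability
1001,2001,1,0.42
1001,2002,2,0.31
"""

def load_rules(conn, csv_text):
    """Create the (hypothetical) rules table if needed and insert every row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendation_rules ("
        "ITEM TEXT, Recommendation TEXT, Ranking INTEGER, Probability REAL)"
    )
    rows = [
        (r["ITEM"], r["Recommendation"], int(r["Ranking"]), float(r["Probability"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO recommendation_rules VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory database for the example
loaded = load_rules(conn, SAMPLE_CSV)
print(loaded, "rules loaded")
```

Keeping the rules in an ordinary table also leaves them accessible to programs other than the reporting tool.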

One consideration for data storage is future access by programs other than Cognos. Eventually, the recommendations are used the way medium-sized and large web companies use them now: automatically, in the flow of normal operations. Several different systems, such as web carts, enterprise resource planning (ERP) systems, and mobile applications, access the recommendations and display them in their own programs. Consider that when you are making your decision.


How will the association rules be used?

Reporting is the first step, and automated processes come later. First, you need to provide a Cognos report that lists cross-sell recommendations by product. This report requires the descriptive fields for the business people. Eventually, the goal is to deploy these rules to the point of contact with the customer.

For web use, think of online retailers that insert recommendations onto almost every page of their website, including the web cart. Less visible to the customer are recommendations that are included in the ERP order entry or customer relationship management application screens. Your customers do not view these recommendations, but the salespeople who speak with them do. These recommendations flow right into the conversation so that the salespeople can make intelligent cross-sell and up-sell suggestions to the customer.

The marketing department also uses the recommendations, mostly through these reports and exports to spreadsheet format. Typically, Cognos reports supply the information that marketing takes into spreadsheets, where it is combined with other information to support their analysis.


Simple report example

This example assumes that the predictive analytics team sent the rules file and that someone placed that file into a database table for reporting. Being an experienced Cognos developer, I created the connection to that data source and included it in my project in IBM Cognos Framework Manager.

Figure 2 showed the rules file with descriptive fields added to the SPSS Modeler rules. This format gives each item more human-understandable characteristics. This file is the one that I loaded into my database in this example. Having the human-readable fields helps in developing and testing the report.

In Figure 3, you see that I created a project in Cognos Framework Manager and gave the project the name CognosAssociationModelReporting. Everything is organized in a single namespace called DataConnection. The first query subject is the connection to the database table with the recommendation information.

Figure 3. The initial data source in Cognos Framework Manager
Screen capture showing the initial data source in Cognos Framework Manager

The single incoming query subject, CognosRecommendations, is a simple setup that takes its name from the database table.

The first step is to create a metadata layer over the incoming data in the CognosRecommendations table. I rename the data table layer to Incoming Data Layer, as shown in Figure 4. Next, I add a namespace under DataConnection and name it Standardization Layer. Here, I rename the fields and pick out only those fields that people will want to see.

Figure 4. The Standardization Layer added to the project
Screen capture showing the Standardization Layer added to the project

In the Standardization Layer, the fields can be broken into four categories:

  • Fields to describe the selected item in the rule
  • Fields that describe the recommended item in the rule
  • Relevant statistics and ranking for the rule
  • A field for the group or type of recommendation

This last field does not come from the association model directly. It is included because several different association models are computed, and then their rules are merged into this single rules table. Table 1 shows how the fields map from the Incoming Data Layer to the Standardization Layer.

Table 1. Field mapping from the database table to reporting-friendly names
Incoming Data Layer           | Standardization Layer
ITEM                          | Selected item number
Recommendation                | Recommended item number
Selected item class           | Selected item class
Selected item description     | Selected item description
Recommended item class        | Recommended item class
Recommended item description  | Recommended item description
Ranking                       | Ranking
Group                         | Group
Probability                   | Probability

The Standardization Layer field names are essentially the same as the database table names in this example. The names might not always be the same because the predictive analytics modelers probably have slightly different terms. Hence, the Standardization Layer serves well to do the translation from the technical environment to the business users' nomenclature.
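Outside of Cognos Framework Manager, the same translation amounts to a simple rename. The sketch below expresses the Table 1 mapping as a dictionary; the names `FIELD_MAP` and `standardize` and the sample row are hypothetical, and the code only illustrates the idea rather than how Framework Manager stores its metadata.

```python
# Mapping taken from Table 1: incoming field name -> reporting-friendly name
FIELD_MAP = {
    "ITEM": "Selected item number",
    "Recommendation": "Recommended item number",
    "Selected item class": "Selected item class",
    "Selected item description": "Selected item description",
    "Recommended item class": "Recommended item class",
    "Recommended item description": "Recommended item description",
    "Ranking": "Ranking",
    "Group": "Group",
    "Probability": "Probability",
}

def standardize(row):
    """Rename mapped fields and drop any field that is not in the mapping."""
    return {FIELD_MAP[name]: value for name, value in row.items() if name in FIELD_MAP}

# Hypothetical incoming row; "Lift" is not mapped, so it is left out
incoming = {"ITEM": "1001", "Recommendation": "2001", "Probability": 0.42, "Lift": 2.0}
print(standardize(incoming))
```

Dropping unmapped fields mirrors the step of picking out only the fields that people will want to see.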

Next, I create a package by using all of the fields in the Standardization Layer (see Figure 5). Then, I deploy that package.

Figure 5. Designating items into the package
Screen capture showing items designated into the package

Now, I build the first report. This report serves as the final report in a series of simple drill-downs. From the Cognos Workspace home page, I click Author Advanced Reports and use the package that I just created.

As shown in Figure 6, I place four fields onto the report that is displayed to the report consumers:

  • Recommended item description
  • Recommended item number
  • Ranking
  • Probability

Figure 6. Initial report layout
Screen capture showing the initial report layout

I add a filter, as shown in Figure 7, that prompts the report consumer to enter a product. For ease of viewing, this example displays product descriptions. The list contains all products that have a recommended item to cross-sell. It displays a simple drop-down list box from which the report consumer can choose the primary item on the prompt screen (in the example data, it is called the selected item).

Figure 7. Filter on the report
Screen capture showing the filter on the report

Figure 8 shows that the title displays the primary (selected) item description. This information is important and reminds the businessperson which product this list of recommendations is for. It is valuable when this list is printed for future reference.

Figure 8. Report with title
Screen capture showing the report with title

The report is sorted on probability. The probability field is typically the easiest sort field of the statistics for businesspeople to understand because it is displayed as a percentage. The prompt page is shown in Figure 9.

Figure 9. The prompt page
Screen capture showing the prompt page

The final report is provided in Figure 10.

Figure 10. The final report
Screen capture showing the final report

This simple example is a good start: It allows quick access to recommendations. With just a bit of tweaking, the report can conform to the company's identity standards before deployment into the live production environment.
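The logic behind this report (filter on the selected item, then sort by probability) can be sketched outside of Cognos as follows. The function name `recommendations_for` and the sample rules are hypothetical; the field names follow the Standardization Layer in Table 1.

```python
def recommendations_for(rules, selected_description):
    """Return report rows for one selected item, highest probability first."""
    matches = [r for r in rules if r["Selected item description"] == selected_description]
    matches.sort(key=lambda r: r["Probability"], reverse=True)
    # The same four fields that the report displays
    return [
        (
            r["Recommended item description"],
            r["Recommended item number"],
            r["Ranking"],
            r["Probability"],
        )
        for r in matches
    ]

# Hypothetical rules data
rules = [
    {"Selected item description": "Vanilla ice cream",
     "Recommended item description": "Whipped cream",
     "Recommended item number": "2001", "Ranking": 2, "Probability": 0.31},
    {"Selected item description": "Vanilla ice cream",
     "Recommended item description": "Fudge topping",
     "Recommended item number": "2002", "Ranking": 1, "Probability": 0.42},
    {"Selected item description": "Chocolate ice cream",
     "Recommended item description": "Sprinkles",
     "Recommended item number": "2003", "Ranking": 1, "Probability": 0.25},
]
report = recommendations_for(rules, "Vanilla ice cream")
for row in report:
    print(row)
```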

Cascading reports for business people

Rather than have the user pick a product directly, I create a series of reports that drill down to the example report. This way, the business user does not scroll through a massive product list; instead, with just a few clicks, the user can target the primary product for which recommendations are needed.

Using the same metadata layer and package, I create two reports that drill through, the first to the second, and then to the example report in Figure 10. I followed this process:

  1. From the Cognos home page, click Author Advanced Reports.

    I create the second report first because it is in the middle of the three-report stack.

  2. From the package, place the Selected Item Description field onto the report.

    This field is the only field shown.

  3. Create a filter on Selected Item Class as shown in Figure 11.
    Figure 11. Filtering on Selected Item Class
    Screen capture showing how to filter on Selected Item Class

    Do not include this field on the report.

  4. Highlight the Selected Item Description column.
  5. In the properties box, select the Drill-Through Definition property.
  6. Edit to drill through to the first report.

    As shown in Figure 12, associate the value of the click in this report with the input value for the filter in the first report.

    Figure 12. Passing the Item Description value
    Screen capture showing how to pass the Item Description value

    Figure 13 shows the completed Drill-Through Definitions box.

    Figure 13. The completed Drill-Through definition
    Screen capture showing the completed Drill-Through definition
  7. Save and test the report.

The list of item classes is shown on the prompt screen. I select one of these classes from the list, and then click Finish. A list of item descriptions that are in the selected class is displayed. When I click one of the items, the first report runs with the selected item as the primary item. The final screen shows recommendations to make.

As with the first report, I make it match the company's standards.

Now, I create a report that lists the item classes. This report is the entry report in the stack that the business user runs. Using this report as the front, the business user is not required to use any of the drop-down boxes on the prompt pages. (Prompts might still be necessary if your company has many categories or many items within categories. In that case, a search prompt might be a cleaner way for users to select classes and products.)

I followed this process:

  1. From the Cognos home page, click Author Advanced Reports.
  2. Select the same package as the other reports.
  3. Drag the Selected Item Class field onto the report, as shown in Figure 14.
    Figure 14. Entry report layout
    Image showing the entry report layout
  4. Highlight the Selected Item Class column.
  5. Select the Drill-Through Definitions line in the column properties box.
  6. Select the second report in the stack—the one just finished.
  7. In the Parameters section, map the Item Class from this report to the prompt value in the second report, as shown in Figure 15.
    Figure 15. Parameter mapping in the Drill-Through Definitions property
    Screen capture showing parameter mapping in the Drill-Through Definitions property

  8. Save and test the report.

Instead of a prompt page, you see a list of Item Classes. I click one of those classes: That value is now sent to the second report, which displays a list of item descriptions in the selected class.

Finally, when I select an item description, the final report in the stack displays items to cross-sell.
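The three-report stack can be thought of as three small lookups, each one passing the clicked value to the next. The sketch below mirrors that flow in plain code; the function names and the sample rules are hypothetical, and in Cognos the value is passed through the Drill-Through Definitions property rather than through function calls.

```python
def item_classes(rules):
    """Entry report: the distinct item classes."""
    return sorted({r["Selected item class"] for r in rules})

def items_in_class(rules, item_class):
    """Second report: item descriptions within the class that was clicked."""
    return sorted({
        r["Selected item description"]
        for r in rules
        if r["Selected item class"] == item_class
    })

def cross_sell_list(rules, item_description):
    """Final report: recommended items for the clicked item, best first."""
    matches = [r for r in rules if r["Selected item description"] == item_description]
    matches.sort(key=lambda r: r["Probability"], reverse=True)
    return [r["Recommended item description"] for r in matches]

# Hypothetical rules data
rules = [
    {"Selected item class": "Frozen", "Selected item description": "Vanilla ice cream",
     "Recommended item description": "Whipped cream", "Probability": 0.31},
    {"Selected item class": "Frozen", "Selected item description": "Vanilla ice cream",
     "Recommended item description": "Fudge topping", "Probability": 0.42},
    {"Selected item class": "Bakery", "Selected item description": "Waffle cone",
     "Recommended item description": "Vanilla ice cream", "Probability": 0.55},
]

classes = item_classes(rules)                     # user sees the class list
items = items_in_class(rules, "Frozen")           # user clicks "Frozen"
suggestions = cross_sell_list(rules, items[0])    # user clicks the first item
print(classes, items, suggestions)
```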

Using this report in real life

You can use this stack of reports as the basis for an active recommendation system. Imagine the situation of a salesperson who reviews orders with a customer on the phone. Running the first report, the salesperson concentrates on an item class that the customer is ordering. Then, by clicking the exact item that is already on the order, the salesperson is presented with an ordered list of recommendations to make.

When you design packages in Cognos Framework Manager, include the descriptive fields that users need to understand the hierarchy of products. Having more than initially needed might save time in the future when business users want to have a different path to drill into and get to the recommendations.


Moving to a big data future

Big data platforms are the probable future home of much of the world's data. Websites in particular will generate massive amounts of data about clicks and navigation that do not need to be brought into a relational database. Applying association rules in this environment can yield many potential uses.

Consider the situation of keeping all the website click and navigation data in IBM BigInsights™. Offline, your predictive analytics team can create models that predict which products users will click next based on the page they're viewing. This model is one that association rules can generate. Applying these rules requires a real-time connection with the data stream. Next-page or product recommendations can be shown to users in real time as they browse the website.

Cognos can report on the activity and serve to monitor the effectiveness of the association rules models as they are applied to big data within BigInsights. The base knowledge that you acquire as you work with reporting on the rules is useful in creating and reporting on metrics in big data.


Conclusion

As data analysis evolves, predictive analytics will become increasingly important. Not every predictive analytics model lends itself to reporting for business consumption. For those models that are human readable, Cognos is a great reporting platform for delivering information that you can act on. The skills that are required are similar to those for historical reporting. With some knowledge upgrades, a skilled Cognos user can create reporting packages and reports that effectively disseminate predictive analytics models to affect and improve business operations.
