Integrate InfoSphere Warehouse Data Mining with IBM Cognos Reporting, Part 4: Customer segmentation with InfoSphere Warehouse and Cognos

Bringing it all together

In the previous articles of this series, you learned different techniques for integrating InfoSphere Warehouse Data Mining and simple Cognos reports. This latest article teaches you how to use some of the same integration techniques to create a more complex report that focuses on the task of customer segmentation. Customer segmentation allows companies to cluster their customers into characteristic groups. One important issue in this task is to explain to the user the meaning of the individual customer segments. Interactive Cognos reports can help you do this. The article uses a step-by-step example, to teach you how to create a report that visualizes cluster statistics and, thus, allows you to find out which customers in a given segment are special. The article also shows you how to enable drill-through for accessing details of individual customers within a segment.

Share:

Benjamin G. Leonhardi (bleon@de.ibm.com), Software Engineer, WSO2 Inc

Author Photo: Benjamin LeonhardiBenjamin Leonhardi is a software engineer for InfoSphere Warehouse data mining at the IBM Research & Development Lab in Boeblingen, Germany. He works on mining visualization, text mining, and mining reporting solutions.



Christoph Sieb (christoph.sieb@de.ibm.com), Senior Software Engineer, WSO2 Inc

Author Photo: Christoph SiebChristoph Sieb is a senior software engineer at the IBM Research & Development Lab in Boeblingen, Germany. Christoph is responsible for data mining algorithms and framework-related issues of InfoSphere Warehouse data mining. He has more than seven years of experience as a developer and consultant in the area of business intelligence and data mining.



Dr. Michael J. Wurst (mwurst@de.ibm.com), Senior Software Engineer, WSO2 Inc

Author Photo: Michael WurstMichael Wurst, Ph.D. is a senior software engineer at the IBM Research & Development Lab in Boeblingen, Germany. He holds a Ph.D. in computer science and is responsible for data mining algorithms and tooling in InfoSphere Warehouse. Prior to joining IBM, Michael worked as a co-developer, architect, and consultant for the RapidMiner data mining software.



29 January 2009

Also available in Vietnamese

Introduction

Customer segmentation allows you to group customers into segments of mutually similar customers. To demonstrate why this could be useful, consider the following scenario. You gather data about the demographic aspects of your customers (age, profession, place of residence, etc.) as well as data about their transactions (items bought, contracts, etc.). An analysis of this combined data could reveal customer groups that you did not think about before, for example, elderly customers that spend a lot of money but do not use online shopping. Information about the typical behaviors of such groups could then be used by your marketing department to develop specialized products or services.

Technically, customer segmentation is a special case of a more general data mining task called data clustering. Data clustering automatically groups data records according to their properties or features into clusters of mutually similar data records. There are usually several different ways to define the underlying similarity measure, depending on your application domain. InfoSphere Warehouse provides a powerful mechanism that picks this similarity measure automatically.

This article provides a brief description of data clustering and customer segmentation methods and approaches in InfoSphere Warehouse. The article then discusses the general possibilities to explore customer segments and other clusters within Cognos. The last part of the article gives you a step-by-step example of how to create and explore segments in Cognos.


Customer segmentation using InfoSphere Warehouse

InfoSphere Warehouse provides several methods for clustering data records. By applying these algorithms to your customer records, you can find groups of customers that are similar to each other. An important first step to achieve this is to preprocess your data. Relevant customer data may be scattered among different tables or databases, for example, the database that contains demographic information may not be the same one that contains transaction data. If data is scattered, you need to first transform it so that you receive a single table with exactly one row per customer. InfoSphere Warehouse provides powerful tools for these kinds of transformations. The transformation tools are outside the scope of this article, but you can find details on how to use them in the InfoSphere Warehouse Information Center, which is linked to from the Resources section.

The sample data used in this article is a table with bank customers, as shown in Figure 1. This data has already been adequately transformed.

Once your data has been transformed, you can directly apply the data clustering methods in InfoSphere Warehouse. Two clustering algorithms are available:

  • Demographic clustering is especially well suited for data sets that contain a mixture of discrete and continuous fields (for example, profession and age). It is the default algorithm used by InfoSphere Warehouse.
  • Kohonen clustering is better suited for data sets containing only binary or continuous fields (for example, the number of items each customer purchased in a set of product categories).

In the example described in this article, clustering is applied by using the Clusterer operator within a mining flow. Alternatively you can directly invoke the following SQL statement:

call  IDMMX.BuildClusModel('BANK.CUSTOMERS_CLUS', 'BANK.CUSTOMERS');

In both cases, a cluster model is created in the table IDMMX.CLUSTERMODELS.


Extracting relevant information from the cluster model

Visualizing and analyzing cluster models is a crucial step in the process of integrating data mining and reporting. A cluster model consists of statistics that describe each cluster and allow you to assign a data record to exactly one of the clusters. There are several ways to analyze a cluster model:

  • Compare the statistics of a cluster to the global statistics. Individual clusters contain data records that are similar to each other and dissimilar to data records in other clusters. By comparing the statistics of a cluster to the global statistics you can determine which statistics are specific to a cluster. For example, if the average age of all customers is 40 and the average age of customers in a segment is 20, then you can conclude that the segment is characterized by containing younger customers.
  • Perform drill-through to individual data records. Analyzing individual data records as representatives of a cluster is a simple, but often powerful, way to get an idea of the data records that a cluster contains.
  • Define an application-specific cluster relevance measure. For example, you may only be interested in segments that contain more that ten thousand customers, with each customer spending at least a hundred dollars a month. Such a criterion would help to filter most of the clusters and make the analysis much more efficient, especially if there are many segments.

You can use the visualizer that comes with InfoSphere Warehouse to look at a cluster model (see Figure 4). The visualizer allows you to analyze the size and characteristic statistics of each cluster (customer segment). While this visualization is powerful, it only allows for comparing statistics, not for drill-through or for application specific criteria. Cognos, however, is very well suited for these kind of visualization tasks.

To enable a visualization of clusters in Cognos, or in any other customized way, the information within a cluster model must first be extracted in an appropriate form. With InfoSphere Warehouse, you can achieve this by using table extractor functions. You can find more information on these functions in the second article of this series, which is linked to from the Resources section.

Cluster models are stored in the Predictive Model Markup Language (PMML) format in the database. They contain information on:

  • The distribution of values in a cluster
  • Number of records in a cluster
  • Importance of variables for each cluster
  • Homogeneity of clusters
  • etc.

You can use stored procedures that come with InfoSphere Warehouse to extract this information into result sets that can be consumed by Cognos (or any other application that is able to access data through SQL). The second article of this series described how to use the IDMMX.DM_GETCLUSTERS method to extract basic information on each cluster. In addition to basic information, you can also extract statistics for each cluster as shown in the following command:

SELECT * FROM TABLE(IDMMX.DM_GETCLUSTERSTATS((SELECT MODEL FROM IDMMX.CLUSTERMODELS 
WHERE MODELNAME='IDMMX.OUTLIERMODEL'))) AS CT

The above command returns a table that contains the local and globally expected frequency for each combination of cluster, field, and field value. The next section of this article shows you how to use this information to create interactive Cognos reports for exploring clusters.


Exploring customer segments and other cluster models in Cognos

Visualizing typical statistics of clusters

Cognos provides several powerful widgets to visualize the distribution of data in a table. Especially important are pie charts for nominal values and histograms for continuous values. Later in this article you will learn how to create visualizations based on the cluster statistics tables extracted from the underlying mining models. Figure 12 shows an example of this type of visualization.

Implementing drill-through from mining model to customer data

Data Mining models contain quantitative and statistical information about the data they describe. They do not, however, contain the original data records. Because data mining is generally used to analyze huge sets of data, maintaining the individual records would not be efficient and in some cases not even possible. Nevertheless, it is very helpful at times to be able to see the data records that are represented by a specific cluster. This type of operation is called drill-through.

To enable drill-through from a customer segment to the data records belonging to the segment, you must select only the records belonging to a specific cluster. You can do this by scoring all customer records against the clustering model and returning only the customers that belong to the cluster. A similar approach is used in the Deviation Detection example described in the second article of this series. In theory, you could do the scoring on the fly as the user selects a cluster. However, even though cluster scoring is relatively fast, this would still take a long time. Therefore, you would usually rather precompute the information regarding which customer segment a given customer belongs to before the report is presented to the user.

The basic idea is that once the user selects a customer segment, a new report is opened that uses a simple WHERE clause to return all customers from the scored customers table that belong to the segment. This approach has several advantages. The time consuming mining and scoring processes can be done together before the report is shown to the user. The XML mining model is small and doesn't contain any customer data. The big data sets containing the scoring information are stored as DB2 tables and can be queried efficiently. For really big data sets, it may be necessary to add indexes to the scoring table.

Using customized cluster relevance metrics

Determining whether a cluster or a customer segment is interesting often requires knowledge of the application domain. Cognos makes it easy to introduce customized metrics that combine diverse information on clusters into a single relevance criterion. The ability to color rows in a Cognos report based on metrics is especially interesting in this context.


A practical example: bank customers

The following example is based on the table BANK.BANKCUSTOMERS that is one of the samples shipped with InfoSphere Warehouse. Sample data from this table is shown in Figure 1.

Figure 1. Sample data from the BANK.BANKCUSTOMERS table
Sample data from the BANK.BANKCUSTOMERS table.

The example shows you how to use InfoSphere Warehouse DesignStudio to create a mining flow that clusters the data records from this table into different customer segments. You can then use the generated cluster model to score the records in the table, which creates a new table named CUSTOMERSCORED that contains the original data with additional information about which segment each customer belongs to.

Follow these steps to create a new mining flow:

  1. Drag a Table Source operator to the editor.
  2. Double click on the Table Source operator to open its properties. Specify BANK.BANKCUSTOMERS as the Source Database Table and click OK.
  3. Drag a Clusterer operator to the editor and connect the Source Table operator to it.
  4. Double click on the Clusterer operator to open its properties. Change the name of the generated model to BANK.CLUSTER on the second property page and click OK.
  5. Drag a Scorer operator to the editor.
  6. Connect the output port of the Clusterer operator to the model input port of the Scorer operator.
  7. Connect the output port of the Table Source operator to the table input port of the Scorer operator.
  8. Right-click on the output port of the Scorer operator and select Create Suitable Table...
  9. On the first wizard page, select the schema BANK, enter the table name CUSTOMERSSCORED, and click Finish. This connects a Table Target operator to the flow.
  10. If you want to run the flow multiple times, double click on the Table Target operator to open its properties, select the Delete Previous Content check box, and click OK.
Figure 2. The mining flow used for customer segmentation
The mining flow used for customer segmentation.

(See a larger version of Figure 2.)

This mining flow creates the mining data that you will learn how to display in a Cognos report later in this article. The two results of the flow are:

  • The clustering model BANK.CLUSTER, which displays information about the customer segments.
  • The BANK.CUSTOMERSCORED table, which contains the original customer records with two additional columns named CLUSTER_QUALITY and CLUSTER_ID. CLUSTER_QUALITY indicates how good the record is fitting to this cluster. CLUSTER_ID is the id of the cluster to which the record is the best fit. The drill-through example in this article uses the cluster id to select customer records during a drill-through.
Figure 3. Sample data from the result table, BANK.CUSTOMERSCORED
Sample data from the result table, BANK.CUSTOMERSCORED.

(See a larger version of Figure 3.)


Creating an interactive customer segmentation report with Cognos

This section of the article shows you how to use Cognos to build a customer segmentation report. The report consists of a main page that shows the computed customer segments and has a link to a drill-through report that shows customer records for each cluster.

The main page contains information similar to the clustering visualizer that is shipped with InfoSphere Data Mining. It contains a list of customer segments or clusters and provides additional information about the fields of the cluster, for example, average bank balance. Pie charts display the distribution of important fields such as the bank card.

As in the previous articles of this series, the information about segments, fields etc. is extracted using the InfoSphere Warehouse Data Mining SQL API. The report joins the tables returned by the API to create the customer segmentation report. The second article of this series provides details on the linking of report pages.

Figure 4. The clustering visualizer of InfoSphere Warehouse
The clustering visualizer of InfoSphere Warehouse

Importing cluster data into Cognos

After creating the clustering model and the scored customer table, you can import both into Cognos metadata with the Cognos Framework Manager. For the customer segmentation report, you need to import and join three result sets:

  • Customers. The scored customer table that you created in the previous section. This is used for the drill-through to the customer data.
  • ClusterSegments. An overview of all clusters of the created clustering model, which is returned by the IDMMX.DM_GETCLUSTERS function. This returns a table with one row for each cluster and is the basis of the main report page.
  • ClusterSegmentStatistics. A table that contains the cluster statistics or information on the field distribution for each cluster, for example, the Age distribution for each cluster. This can be fetched with the IDMMX.DM_getClusterStats function of InfoSphere Warehouse. This table is used for the display of detailed field information for the clusters.

To begin creating the customer segmentation report, create a Cognos Framework Manager project that is connected to the sample database DWESAMP of InfoSphere Warehouse and has the BANK.CUSTOMERSCORED table. Detailed instructions on how to do this can be found in the first article of this series, which is linked to from the Resources section. It is good practice to create a query subject in the PresentationView namespace that contains the information you need from the database. This gives you an abstraction layer over the query subjects that are created from SQL statements.

Follow these steps to create the customer table query subject used by the report:

  1. Create a new query subject named Customers in the PresentationView namespace from the model (Existing query subjects and query items).
  2. Add all query items from the CUSTOMERSCORED query subject and change the names to be more descriptive. For example, "Customer segment" for the CLUSTER_ID.

Follow these steps to create the CustomerSegments cluster description query subject from the DB2 table function:

  1. Select the Database DWESAMP in the Data Sources folder of the Project Viewer and change the Query Processing property in the properties view to Limited Locale. This enables query subjects from SQL that are not known by Cognos.
  2. Create a new query subject named ClusterSegments in the PresentationView namespace and choose to model the query subject from a data source.
  3. On the "select a data source" page, select DWESAMP and clear the Run database query subject wizard check box (the query subject wizard only works for standard SQL), and click Finish.
  4. After the query subject is created, the Query Subject Definition wizard opens. Enter the SQL code to return the clusters from the model, where BANK.CLUSTER is the name of the clustering model generated during the clustering run.
    SELECT * FROM TABLE(IDMMX.DM_GETCLUSTERS((SELECT MODEL FROM IDMMX.CLUSTERMODELS 
    WHERE MODELNAME='BANK.CLUSTER'))) AS CT
  5. You need to set the SQL type of the query to Native, which tells Cognos to pass the SQL to the database instead of interpreting it. To do this, go to the Query Information tab of the query subjects properties. Select Options, and change the SQL type under the SQL settings tab to Native.
  6. A Test Sample run should return a table with the clusters of the model as shown below:
    Figure 5. The test results of the CustomerSegments query subject
    The test results of the outlier cluster query subject
  7. Repeat the previous steps for the ClusterSegmentStatistics query subject using the following SQL:
    SELECT * FROM TABLE(IDMMX.DM_getClusterStats((SELECT MODEL FROM IDMMX.CLUSTERMODELS 
    WHERE MODELNAME='BANK.CLUSTER')))

The above query subject contains information on the field distribution of each cluster:

  • The ID column contains the cluster id.
  • The FIELDNAME column contains the name of the fields that are in the model and can be used to query, for example, for statistics on the Age field.
  • The NUMVALUE column is used for fields that are of numerical values. For numerical fields the clustering algorithm creates a set of bins, for example, "Age 0-10, 10-20 ...". The NUMVALUE represents the edge of each bin.
  • The CATVALUE column is used for fields that are of categorical value. That means there are a fixed number of values in the field, for example, "Customer is using a bankcard: YES/NO".
  • The FREQUENCY column describes how many customers of this cluster are in the bin
  • The EXPECTEDFREQUENCY column describes how many customers would be in this bin if they were distributed evenly over all clusters.

You can use this table to add detailed information about customer characteristics to the Cognos report. For example, this article shows you how to use pie charts to represent a categorical field and bar charts to show the distribution of a numerical field.

Figure 6. The test results of the CustomerSegmentStatistics query subject
The test results of the CustomerSegmentStatistics query subject.

To create a report that joins the information between these subjects, you have to create relationships between the tables. The cluster id is always the join key.

Follow these steps to create the relationships between the tables:

  1. Select Create Relationship from the context menu of the Customer query subject.
  2. For the left query subject, select the CLUSTER_ID of the Customer subject and set the cardinality to 1..n. This indicates there are many records belonging to the same cluster.
  3. For the right query subject add the CustomerSegments query subject, select the ID column and set the cardinality to 1..1. This indicates there is one row for each cluster.
  4. Click OK.
  5. Repeat the previous steps to create a relationship between CustomerSegments and CustomerSegmentStatistics. The left join subject is CustomerSegments with cardinality 1..1. The right join subject is CustomerSegmentStatistics with cardinality 1..n.
  6. Repeat the previous steps to create a relationship between Customers and CustomerSegmentStatistics. The left join subject is Customers with cardinality 1..n. The right join subject is CustomerSegmentStatistics with cardinality 1..n.

Now you have created the query subjects you need for the Cognos report and can deploy a CustomerSegments package containing the PresentationView of the project to the Cognos Content Store. Instructions on how to create and deploy the package are described in the first article of this series.

Figure 7. The created resources in the Framework Manager
The created resources in the Framework Manager

Creating a customer segmentation report with Cognos Report Studio

This section describes how to create a report with three report pages that are linked together. For a more detailed description on how to link reports together refer to the second article of this series.

The three report pages are:

  • The main report page shows the customer segments and some key information about the segment, such as the average age, average balance, cluster size, etc.. The user can click on a cluster for a drill-through to that cluster's customer data. The user can also see a more detailed age distribution for the cluster by selecting the age field.
  • The customer drill-through page shows the records belonging to a cluster.
  • The age distribution page shows a bar chart of the age distribution of a cluster as compared to all customers.

The following steps describe how to build the two linked pages first. This is done so that they are available to link to when you create the main report page. The customer drill-through page is similar to the customer page created in the second article of this series and shows a table of the customer records as used in the Customers query subject filtered by the cluster id. The cluster id is added as parameter to this report.

Follow these steps to create the customer drill-through page:

  1. Create a new report in Cognos Report Studio using the CustomerSegments package.
  2. Add a list object to the report.
  3. Add the Customers query subject to the list by dragging it from the Insertable Objects view.
  4. Add a filter to the list by selecting the list and clicking the Filters icon on the toolbar or selecting Data->Filters from the menu.
  5. In the wizard use the Add icon to add a Details filter.
  6. On the Detail Filter Expression page, add the code shown below to the Expression Definition. (Customer segment is the cluster id, but you may have given it a different name.)
    [PresentationView].[Customers].[Customer segment] = ?clusterid?
  7. Cognos automatically detects the clusterid keyword that is surrounded by the ? characters, and adds it to the parameter list of the report.
  8. Click OK to confirm the filter.
  9. Change the report header text and column headers to make them more meaningful.
  10. Save your report as CustomerDrillthrough.
Figure 8. The Cognos Report Studio with the CustomerDrillthrough report
The Cognos Report Studio with the CustomerDrillthrough report.

(See a larger version of Figure 8.)

The second report page shows a detailed age distribution bar chart for a given cluster. To accomplish this, you use data from the CustomerSegmentStatistics query subject and filter the column FIELDNAME by AGE and the cluster id from a parameter. You then add two series to the column chart. One is for the age distribution of this cluster, which is in the query item FREQUENCY, and the other is for distribution of all clusters, which is in the query item EXPECTEDFREQUENCY.

Follow these steps to create the age distribution report page:

  1. Create a new report for the same package. You can do this by selecting File->New in the Report Studio. From the template list select Chart and use the first column chart type.
  2. From the CustomerSegmentStatistics query subject in the Insertable Objects view, drag the query items FREQUENCY and EXPECTEDFREQUENCY to the Series: field of the column chart.
  3. From the same query subject, drag the query item FREQUENCY to the Default Measure(y-Axis): field of the column chart.
  4. From the same query subject, drag the query item NUMVALUE to the Category (x-Axis): field of the column chart.
  5. Change the Aggregation function of the NUMVALUE field to None. One way to do this is to select the NUMVALUE field in the Category (x-Axis): field. When you see the properties of this field in the property menu on the lower left of the screen, find the Data Item->Aggregate Function field and change it to None.
  6. Change the axis descriptions to make them more meaningful. You can do this by opening the Axis Titles: field in the column chart and entering text for vertical and horizontal axis.
  7. Change the description of the series. Select each series one at a time. In the Properties section change Text Source->Source Type to Text and enter a description in the Text Source->Text field.
  8. Change the size of the chart. In the Properties view of the chart, select the ... button in the Positioning->Size and Overflow menu and change the width to 600 pixels.
  9. Add a filter to the chart by selecting the chart list and clicking the Filters icon on the toolbar or Data->Filters from the menu.
  10. In the wizard use the Add icon to add a Details filter.
  11. On the Detail Filter Expression page, add the following code to the Expression Definition:
    [PresentationView].[CustomerSegmentStatistics].[ID]=?clusterid? 
    AND [PresentationView].[CustomerSegmentStatistics].[FIELDNAME]='Age'
  12. Cognos automatically detects the clusterid keyword that is surrounded by the ? characters, and adds it to the parameter list of the report.
  13. Click OK to confirm the filter.
  14. Change the report header text to make it more meaningful.
  15. Save your report as AgeDistribution.
Figure 9. The Cognos Report Studio with the AgeDistribution report.
The Cognos Report Studio with the AgeDistribution report.

Now you are ready to create the main report page that links to the other two reports. The foundation for the main report page is a list of all clusters, or customer segments. The steps below tell you how to enrich each cluster row with a field of the average age and balance in the cluster, and add columns with pie charts for the prevalence of Bankcards and Online access in this segment. The result is a view that is similar to the cluster visualizer of InfoSphere Warehouse, but by creating your own report you are able to select only fields that are important to your users and present them in a way that is most suited to them.

Follow these steps to create the main Customer Segments report page:

  1. Create a new report for the same package. You can do this by selecting File->New in the Report Studio. From the template list, select List.
  2. From the CustomerSegments query subject in the Insertable Objects view, drag the query items ID, SIZE, and Description to the list.

You now have the basic foundation of the report page. In the next steps you include the average age and account balance to the list for each cluster. You add them from the Customers table, but first you have to set the aggregation function of Cognos to average to get the expected result.

  1. From the Customers query subject in the "Insertable Objects" view, drag the query items Age and Average Balance to the list.
  2. Because you want the average age and balance, change the aggregate function of both columns to average. Select each row (not the header but the body) in the Properties view and change Data Item->Aggregate Function to Average.

In the next steps you add drill-through functionality to the CustomerDrilldown and AgeDistribution report. If an ID column of one cluster is selected, a new CustomerDrilldown page with the customer data for that cluster is opened. If an Age column of one cluster is selected, an AgeDistribution page for that cluster is opened.

  1. Select the ID column (not the header but the column below it) and choose Drill-Through Definitions from the right-click context menu.
  2. Add a new Drill-Through Definition.
  3. In the Target Report tab of the properties, select the CustomerDrilldown report as the target report.
  4. Select Run the report as the action.
  5. Select the Open in new window check box.
  6. Use the Edit button below the parameters list to add a new linked parameter.
  7. In the Parameters dialog, select Pass data item value as the method to link the parameter to a value of the row in the list.
  8. Choose the ID query item as data item source and click OK.
  9. Repeat these steps for the Age column using the AgeDistribution report as the target report.

In the final set of steps you add two columns that contain pie charts with information about how many customers in a cluster have a bankcard and online access. This information is contained in the CustomerSegmentStatistics query subject. Setting up the pie chart is similar to what you did for the AgeDistribution page. One difference, however, is that Bankcard and online access are categorical fields; they use categories instead of numeric ranges and bins. Therefore, the pie charts have to be read from the CATVALUE query item instead of the NUMVALUE query item.

It is possible to add charts to columns. To show the correct data for each row it is necessary to create a Master-Detail Relationship. Chart and table will have different underlying queries. The Master-Detail relationship tells the report how these queries are connected. In the example, the pie charts show only information from the cluster id that is situated in this row.

  1. Change the Insertable Objects view to the Toolbox. The view has three tabs at the bottom, the outer right one is the Toolbox, which contains charts, lists etc..
  2. Drag a Chart object to the List as if you were going to add a new source column to it. Select the first pie chart type.
  3. Switch the Insertable Objects view to Source.
  4. From the CustomerSegmentsStatistics query subject, drag CATVALUE to the Pie Slices: field of the pie chart.
  5. From the CustomerSegmentsStatistics query subject, drag FREQUENCY to the Default Measure: field of the pie chart.
  6. To filter the data for cluster id and fieldname, add the corresponding query items to the query of the chart. To open the query, right-click on the chart and select Go to Query.
  7. Add the ID and FIELDNAME query items from the CustomerSegmentStatistics query subject in the Insertable Objects view to the Data Items of the query.
  8. Change the aggregation function of the ID and SIZE column to None.
  9. Return to the report page by selecting View->Report Pages and select Page1. Add a new Details Filter to the pie chart with the Expression Definition shown in the following code:
    [FIELDNAME]='BANKCARD' AND ([CATVALUE]='YES' OR [CATVALUE]='NO')
  10. Add a Master Details Relationship between the pie chart and the list.
  11. Select the pie charts column (the body not the header) and select Data->Master Detail Relationships from the file menu.
  12. In the relationship wizard, click on the New Link button. Create a new Connection and connect the ID column of the List with the ID column of the pie chart.
  13. Change the size of the pie chart. In the Properties view of the chart, select the ... button in the Positioning->Size and Overflow menu and change the width to 200 pixels and the height to 100 pixels.
  14. Change the Default Legend title of the pie chart to Bankcard.
  15. Repeat the previous steps for the Online access pie chart. The only difference is that you set the filter query that filters FIELDNAME to ONLINE_ACCESS as shown below:
    [FIELDNAME]='ONLINE_ACCESS' AND ([CATVALUE]='YES' OR [CATVALUE]='NO')
  16. Change the report header text and column headers to make them more meaningful and save your report as CustomerSegments.
Figure 10. The Cognos Report Studio with the main Customer Segments report
The Cognos Report Studio with the main Customer Segments report.

(See a larger version of Figure 10.)

You can now run the finished report. It defaults to the main report page, which shows an overview over the customer segments. Users can select the ID of a segment to see the customer data sets belonging to this segment in a new report. To get a more detailed view on the age distribution, users can click on the age of a segment.

Figure 11. The customer segments overview
The customer segments overview

(See a larger version of Figure 11.)

Figure 12. The age distribution report
The age distribution report
Figure 13. The customer drill-through
The customer segments overview

Conclusion

Customer segmentations allows you to identify and characterize typical groups of customers. This task is helpful to develop products and services that suit the needs of specific customers. Customer segmentation can be achieved using the data clustering capabilities of InfoSphere Warehouse. A crucial step in this process is to explore clusters and be able to understand them. Three approaches to performing this crucial step are visualization of cluster statistics, drill-through to individual data records, and domain specific cluster relevance metrics. While InfoSphere Warehouse contains powerful visualizations, Cognos allows you to easily implement drill-through and user defined metrics, as well as more customized visualizations. This article discussed the conceptual ideas behind this type of integration and gave a step-by-step example using bank customer records.

This fourth article concludes the series about the integration of InfoSphere Mining and Cognos reporting. The goal of the series was to teach you some basic patterns of how to integrate these two products to visualize mining models in Cognos, to invoke mining dynamically from Cognos, to enable drill-through to individual records, and perform several other tasks. Delivering mining results in the right form to the right people is key to making data mining successful in a company and even outside statistics departments. An integration of InfoSphere Warehouse, as a powerful back end, and Cognos as a flexible front-end, is an important step in this direction.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Big data and analytics on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Big data and analytics, Information Management
ArticleID=367477
ArticleTitle=Integrate InfoSphere Warehouse Data Mining with IBM Cognos Reporting, Part 4: Customer segmentation with InfoSphere Warehouse and Cognos
publish-date=01292009