< Previous | Next >

Lesson 2: Creating a simple mining flow

In this lesson, you will create a simple mining flow and use the associations visualizer to view the results.

Overview

You are creating a mining model that contains information about combinations of products that your customers are likely to buy together. This analysis is performed by the Associations operator, which analyzes your transaction data and formulates association rules that describe the patterns that the operator discovers.

The Associations operator requires at least one input operator and one output operator. Therefore, a simple mining flow consists of an Associations operator that is directly connected to a Table Source operator and a visualizer operator. The mining flow that you create in this lesson also includes operators that prepare your source tables for the Associations operator, but the flow is simpler than a typical mining flow.

Tasks in this lesson

Adding an Associations operator to your mining flow

You must place an Associations operator onto your new mining flow to configure the step of your flow during which data mining procedures analyze your data.

Procedure

To add an Associations operator:

  1. Add a new Associations operator:
    • From the palette on the upper right side of the canvas, find the Associations operator in the Data Mining section, and drag the Associations operator to the canvas.
  2. Configure your new Associations operator:
    1. Right-click the Associations operator in your mining flow and click Show properties view. The Properties view opens below the canvas.
    2. On the General page of the Properties view, type Product Affinities in the Label field.
    3. On the Model Name page of the Properties view, type PROD_AFFINITIES in the Model Name field.
    4. On the Mining Settings page of the Properties view, specify the settings that are shown in the following table.
      Field Value Explanation
      Maximum rule length 3 If you increase the maximum rule length to 3, your model can include rules that involve three items, such as "Customers who purchased tents and sleeping bags also purchased backpacks."
      Minimum support (%) 0.1 Because your transaction data contains many records, a small percentage of all records will support any particular rule. You must reduce the minimum support of a rule to avoid excluding relevant rules from your model.
You placed on your mining flow the operator that analyzes your data. Next, you will add operators that provide data to the Associations operator.

Adding data source operators to your mining flow

You must configure operators to create a virtual table from the GOSALESCT.CUST_ORDER_DETAIL and GOSALES.PRODUCT tables. The virtual table will be structured to provide your transaction data to the Associations operator.

Procedure

To add data source operators and configure the preprocessing steps for your source data:

  1. Configure an operator to use the GOSALESCT.CUST_ORDER_DETAIL table as a data source for your flow:
    1. Drag a Table Source operator from the Sources and Targets section of the palette to your flow.
    2. Click the ellipsis (...) button.
    3. Select the CUST_ORDER_DETAIL table of the GOSALESCT schema from the Select Database Table window and click Finish.
    The GOSALESCT.CUST_ORDER_DETAIL table has a row for each item purchase that was recorded in your database. The values in the PRODUCT_NUMBER column identify the item that was purchased, and the values in the CUST_ORDER_NUMBER column identify the transaction during which the item was purchased. A single transaction can include purchases of several items. Your mining flow will analyze your transaction data to discover items that are frequently purchased in the same transaction.
  2. Configure an operator to use the GOSALES.PRODUCT table as a data source for your flow:
    1. Drag a Table Source operator from the Sources and Targets section of the palette to your flow.
    2. Click the ellipsis (...) button.
    3. Select the PRODUCT table of the GOSALES schema from the Select Database Table window and click Finish.
    The GOSALES.PRODUCT table contains detailed information about every product in your database, such as the date the product was introduced, and whether the product is discontinued. You will be interested in two fields: PRODUCT_NUMBER and BASE_PRODUCT_NUMBER. In the GSDB database, every variation of a product has a unique product number, but shares a base product number. For example, a pair of blue Hawk Eye sunglasses in size small has the same base product number as a pair of green Hawk Eye sunglasses in size large, but the variants have different product numbers.
  3. Configure an operator that joins columns of the GOSALESCT.CUST_ORDER_DETAIL and GOSALES.PRODUCT tables into a virtual table that contains only transaction data:
    1. Drag a Table Join operator from the Transformations section of the palette to your flow.
    2. Join the columns:
      1. Click the Output port of the CUST_ORDER_DETAIL Table Source operator and then click the in port of your new Table Join operator to specify the GOSALESCT.CUST_ORDER_DETAIL table as an input.
      2. Click the Output port of the PRODUCT Table Source operator and then click the in1 port of your Table Join operator to specify the GOSALES.PRODUCT table as an input.
    3. Double-click your Table Join operator and complete the wizard by specifying the settings that are shown in the following table.
      Page Steps
      General In the Label field, type the following name:
      Transaction Data
      Condition Provide an SQL expression that specifies which rows of your tables to join together:
      1. Click the ellipsis (...) button.
      2. Double-click the PRODUCT_NUMBER field of the top table.
      3. Double-click the equality (=) operation.
      4. Double-click the PRODUCT_NUMBER field of the second table.
      5. Click OK.
      Your join condition will look like the following condition:
      "IN_nn_0"."PRODUCT_NUMBER" = "IN1_nn_1"."PRODUCT_NUMBER"
      nn is an arbitrary two-digit sequence. This SQL expression specifies that the Table Join operator is to combine rows of the GOSALESCT.CUST_ORDER_DETAIL and GOSALES.PRODUCT tables that contain the same product number.
      Select List Specify the columns to include in the virtual table that results from the operation:
      1. Click Delete All () in the Result Columns section.
      2. Double-click the CUST_ORDER_NUMBER field of the first table.
      3. Double-click the BASE_PRODUCT_NUMBER field of the second table.
      After you complete the wizard, the Table Join operator is configured to accomplish two tasks:
      Replace the PRODUCT_NUMBER field of each transaction with the BASE_PRODUCT_NUMBER field.

      By reducing the granularity of each transaction record, you allow the Associations operator to discover association rules that are more general.

      For example, suppose you try to find association rules for the original transaction data that has a PRODUCT_NUMBER field. Your results might indicate that customers who purchased blue Hawk Eye sunglasses in size small also purchased TrailChef water bags, customers who purchased orange Hawk Eye sunglasses in size medium also purchased TrailChef water bags, and so on, but your results would not include the more useful rule that customers who purchased Hawk Eye sunglasses of any size or color also purchased TrailChef water bags. Worse still, your results might not describe any correlation between purchases of Hawk Eye sunglasses and TrailChef water bags, because the frequency of Hawk Eye sunglasses in their different variants is too low.

      Remove irrelevant fields from the transaction data.
      Only the BASE_PRODUCT_NUMBER and CUST_ORDER_NUMBER fields are necessary to identify the products that each order includes.
  4. Optional: Run your partial flow to see your intermediate virtual table:
    1. Right-click the Transaction Data Table Join operator and click Run to this step.
    2. Click Finish in the Run to this step window to accept the default options.
    After the flow runs, you can see the virtual table that the flow creates in the Results tab of the Execution Status view. You will use this virtual table as input to the Associations operator.
  5. Specify the result of the table join as input to the Associations operator:
    • Click the Inner port of the Table Join operator and then click the Input port of the Associations operator.
  6. Specify CUST_ORDER_NUMBER as the group column of your virtual source table:
    1. Right-click the Associations operator and click Show Properties View.
    2. On the Mining Settings page of the Properties view, set the Group column field to CUST_ORDER_NUMBER. The group column field specifies that rows of the table that have the same value in the CUST_ORDER_NUMBER field are members of the same group. In this scenario, a group corresponds to a customer order.
You now have steps in your mining flow that prepare your transaction data for analysis. Next you will add a step that visualizes the mining model that results from the analysis.

Adding a Visualizer operator to your mining flow

You must specify the Visualizer operator as the last step of your mining flow to look at the mining model that the flow creates.

Procedure

To add a Visualizer operator:

  1. Add a new Visualizer operator to your flow:
    • Drag a Visualizer operator from the Sources and Target section of the palette to your flow.
  2. Connect the Associations operator to the Visualizer operator:
    • Click the Model port of the Associations operator and then click the Model port of your new Visualizer operator.
Now, when you run your flow, you can visually explore the results.

Saving and running your mining flow

You must run your mining flow to generate an association model of your data and open the associations visualizer.

Procedure

To save and run your mining flow:

  1. In the menu bar, click File > Save.
  2. In the menu bar, click DB2 for LUW Mining Flow > Execute.
  3. Click Execute to accept the default selections in the Flow Execution window.
After your flow finishes running, the associations visualizer will start.

Viewing your simple association model

You can view your simple association model to understand the ways in which the model can improve.

Procedure

To view your association model and see how the model can improve:

  1. In the Associations Visualizer window, look at the Rules view. The association rules in the Rules column look like [17] ==> [11], which means that customers who purchased product number 17 also purchased product number 11. Your mining model can be improved by substituting these product numbers with the names of the products, so that the associations visualizer instead displays rules like [Hibernator Lite] ==> [Star Lite].
  2. Close the Associations Visualizer window.

What's next?

In the next lesson, you will improve your mining model by adding steps to your mining flow that use name lookup tables to replace product codes with product names.
< Previous | Next >



Feedback | Information roadmap