You must configure operators to create a virtual table
from the GOSALESCT.CUST_ORDER_DETAIL and GOSALES.PRODUCT tables.
The virtual table will be structured to provide your transaction
data to the Associations operator.
Procedure
To
add data source operators and configure the preprocessing steps for
your source data:
- Configure an operator to use the GOSALESCT.CUST_ORDER_DETAIL
table as a data source for your flow:
- Drag a Table Source operator
from the Sources and Targets section of the
palette to your flow.
- Click the ellipsis (
)
button.
- Select the CUST_ORDER_DETAIL table
of the GOSALESCT schema from the Select
Database Table window and click Finish.
The GOSALESCT.CUST_ORDER_DETAIL table has a row for each
item purchase that was recorded in your database. The values in the
PRODUCT_NUMBER column identify the item that was purchased, and the
values in the CUST_ORDER_NUMBER column identify the transaction during
which the item was purchased. A single transaction can include purchases
of several items. Your mining flow will analyze your transaction
data to discover items that are frequently purchased in the same
transaction.
- Configure an operator to use the GOSALES.PRODUCT table
as a data source for your flow:
- Drag a Table Source operator
from the Sources and Targets section of the
palette to your flow.
- Click the ellipsis (
)
button.
- Select the PRODUCT table of
the GOSALES schema from the Select
Database Table window and click Finish.
The GOSALES.PRODUCT table contains detailed information about
every product in your database, such as the date the product was
introduced, and whether the product is discontinued. You will be
interested in two fields: PRODUCT_NUMBER and BASE_PRODUCT_NUMBER.
In the GSDB database, every variation of a product has a unique product
number, but shares a base product number. For example, a pair of
blue Hawk Eye sunglasses in size small has the same base product
number as a pair of green Hawk Eye sunglasses in size large, but
the variants have different product numbers.
- Configure an operator that joins columns of the GOSALESCT.CUST_ORDER_DETAIL
and GOSALES.PRODUCT tables into a virtual table that contains only
transaction data:
- Drag a Table Join operator from
the Transformations section of the palette to
your flow.
- Join the columns:
- Click the Output port of the CUST_ORDER_DETAIL Table
Source operator and then click the in port
of your new Table Join operator to specify the GOSALESCT.CUST_ORDER_DETAIL
table as an input.
- Click the Output port of the PRODUCT Table
Source operator and then click the in1 port
of your Table Join operator to specify the
GOSALES.PRODUCT table as an input.
- Double-click your Table Join operator and complete the
wizard by specifying the settings that are shown in the following
table.
| Page |
Steps |
| General |
In the Label field, type
the following name:Transaction Data
|
| Condition |
Provide an SQL expression that specifies which
rows of your tables to join together:- Click the ellipsis (
) button.
- Double-click the PRODUCT_NUMBER field of
the top table.
- Double-click the equality (=) operation.
- Double-click the PRODUCT_NUMBER field of
the second table.
- Click OK.
Your join condition will look like the following condition: "IN_nn_0"."PRODUCT_NUMBER" = "IN1_nn_1"."PRODUCT_NUMBER"
nn is
an arbitrary two-digit sequence. This SQL expression specifies that
the Table Join operator is to combine rows of the GOSALESCT.CUST_ORDER_DETAIL
and GOSALES.PRODUCT tables that contain the same product number. |
| Select List |
Specify the columns to include in the virtual
table that results from the operation:- Click Delete All (
)
in the Result Columns section.
- Double-click the CUST_ORDER_NUMBER field
of the first table.
- Double-click the BASE_PRODUCT_NUMBER field
of the second table.
|
After you complete the wizard, the Table Join operator
is configured to accomplish two tasks:
- Replace the PRODUCT_NUMBER field of each transaction with the
BASE_PRODUCT_NUMBER field.
By reducing the granularity of each transaction record, you
allow the Associations operator to discover association rules that
are more general.
For example, suppose you try to find association
rules for the original transaction data that has a PRODUCT_NUMBER
field. Your results might indicate that customers who purchased blue
Hawk Eye sunglasses in size small also purchased TrailChef water
bags, customers who purchased orange Hawk Eye sunglasses in size
medium also purchased TrailChef water bags, and so on, but your results
would not include the more useful rule that customers who purchased
Hawk Eye sunglasses of any size or color also purchased TrailChef
water bags. Worse still, your results might not describe any correlation
between purchases of Hawk Eye sunglasses and TrailChef water bags,
because the frequency of Hawk Eye sunglasses in their different variants
is too low.
- Remove irrelevant fields from the transaction data.
- Only the BASE_PRODUCT_NUMBER and CUST_ORDER_NUMBER fields are
necessary to identify the products that each order includes.
- Optional: Run your partial flow to see your
intermediate virtual table:
- Right-click the Transaction Data Table
Join operator and click Run to this step.
- Click Finish in the Run
to this step window to accept the default options.
After the flow runs, you can see the virtual table that
the flow creates in the Results tab of the Execution
Status view. You will use this virtual table as input
to the Associations operator.
- Specify the result of the table join as input to the Associations
operator:
- Click the Inner port of the Table Join
operator and then click the Input port of
the Associations operator.
- Specify CUST_ORDER_NUMBER as the group column of your virtual
source table:
- Right-click the Associations operator and click Show
Properties View.
- On the Mining Settings page of
the Properties view, set the Group
column field to CUST_ORDER_NUMBER. The group column field specifies that rows of the table that
have the same value in the CUST_ORDER_NUMBER field are members of
the same group. In this scenario, a group corresponds to a customer
order.
You now have steps in your mining flow that prepare your transaction
data for analysis. Next you will add a step that visualizes the mining
model that results from the analysis.