You
can find records in your input table that deviate from
the majority of records.
Before you begin
- Place the Find Deviations operator in the canvas.
- Connect
its input port to the database source table in which to
find deviations.
Procedure
To define a Find Deviations operator, complete
the following steps:
- In the canvas, click the operator to select it. A black box highlights a selected operator. After you select an
operator, the property pages for that operator appear in the Properties view
beneath the canvas.
- Use the tabs on the left side of the Properties view
to navigate each operator's property pages.
- Optional: Specify the operator's general
properties.
- In the Properties view, click
the General tab.
- You
can modify the following fields:
- To rename the
operator, enter a name in the Label field.
- You can add a description for the operator in the Description field.
- Optional:
Specify the model name information.
- In
the Properties view, click the Model Name tab.
- Enter the model name. The model
name is used
as a key to lookup the model in Db2® tables
and is also displayed when browsing the model with IM Visualization.
If a model with the same name already exists in the Db2 database, the existing model is replaced.
- Optional: Specify the
mining settings.
- In the Properties view,
click the Mining
Settings tab.
- You can modify
default parameters of the Easy Mining
procedures. This is for advanced users only. For supported optional
parameters, refer to the Intelligent Miner® Easy Mining Procedures documentation.
- Optional: Specify the available
and output columns
settings.
- In the Properties view, click
the Output
Columns tab.
- In the Available
columns table,
select the columns that will appear in the output port, and use the
right-arrow button to move the selected columns to the Output columns
table.
- To remove columns from Output
columns table,
select the columns and use the left-arrow button to move them back
to the Available columns table.
Example
Note: To create the tables used in this example,
run the Intelligent
Miner Easy Mining samples Db2 script samples\EasyMining\SampleTables.db2.
Scenario: - You have a BANK.CUSTOMERS_MASTERDATA
table that contains information
about a bank's customers.
- You want to find untypical records
like a customer with AGE=15
and PROFESSION=pensioner to detect data quality problems.
Figure 1. Sample customer data
- Place a Table Source operator
for the BANK.CUSTOMER_MASTERDATA table
on the canvas.
- Place a Find Deviations operator on the canvas.
- Connect the output port of table CUSTOMER_MASTERDATA to
the Find Deviations operator input port.
- Select the Find Deviations
operator and select the General tab
of the properties view. In the Label field,
enter FindDeviatingCustomers.
- On the Model
Name tab, enter CustomerOutliers in
the Model Name field.
- Optional:
On the Output Columns tab,
remove all fields from the Output Columns table
except CLIENT_ID. You need at least the CLIENT_ID in
the output table to identify each record.
- Optional:
On the Mining Settings tab,
set the Maximum cluster size to 30% of
the records.
- Connect the output port that contains the input
rows:
- Right-click the Find Deviations operator output port
and select Create
suitable table.
- In the wizard, enter the table
name DEVIATIONS and
make CLIENT_ID a primary key. Click Finish.
- Place a Table Target operator for the new DEVIATIONS table on
the canvas.
- Connect the output port of the Find Deviations
operator with the
input port of the Table Target operator.
- Execute
your mining flow.
The resulting flow looks like this:
Figure 2. The resulting Find Deviations operator in
the mining flow
Explanation:
The
receiving table DEVIATIONS receives all records from
the input table and has two additional fields DEV_DEGREE and CLUSTER_ID.
The
higher the DEV_DEGREE value, the more untypical the
record is.
To understand, why the record is considered untypical,
open the Visualizer for the clustering model IDMMX.CustomerOutliers from
the Data Mining Models folder in the Data Source Explorer view. Here
you can see the cluster characteristics for each cluster ID assigned
by the Find Deviations operator. Records assigned to the smallest
clusters are considered untypical.