Defining a Find Deviations operator

You can find records in your input table that deviate from the majority of records.

Before you begin

  • Place the Find Deviations operator in the canvas.
  • Connect its input port to the database source table in which to find deviations.

Procedure

To define a Find Deviations operator, complete the following steps:

  1. In the canvas, click the operator to select it. A black box highlights a selected operator. After you select an operator, the property pages for that operator appear in the Properties view beneath the canvas.
  2. Use the tabs on the left side of the Properties view to navigate each operator's property pages.
  3. Optional: Specify the operator's general properties.
    1. In the Properties view, click the General tab.
    2. You can modify the following fields:
      • To rename the operator, enter a name in the Label field.
      • You can add a description for the operator in the Description field.
  4. Optional: Specify the model name information.
    1. In the Properties view, click the Model Name tab.
    2. Enter the model name. The model name is used as a key to lookup the model in Db2® tables and is also displayed when browsing the model with IM Visualization. If a model with the same name already exists in the Db2 database, the existing model is replaced.
  5. Optional: Specify the mining settings.
    1. In the Properties view, click the Mining Settings tab.
    2. You can modify default parameters of the Easy Mining procedures. This is for advanced users only. For supported optional parameters, refer to the Intelligent Miner® Easy Mining Procedures documentation.
  6. Optional: Specify the available and output columns settings.
    1. In the Properties view, click the Output Columns tab.
    2. In the Available columns table, select the columns that will appear in the output port, and use the right-arrow button to move the selected columns to the Output columns table.
    3. To remove columns from Output columns table, select the columns and use the left-arrow button to move them back to the Available columns table.

Example

Note: To create the tables used in this example, run the Intelligent Miner Easy Mining samples Db2 script samples\EasyMining\SampleTables.db2.
Scenario:
  • You have a BANK.CUSTOMERS_MASTERDATA table that contains information about a bank's customers.
  • You want to find untypical records like a customer with AGE=15 and PROFESSION=pensioner to detect data quality problems.
Figure 1. Sample customer dataSample customer data
  1. Place a Table Source operator for the BANK.CUSTOMER_MASTERDATA table on the canvas.
  2. Place a Find Deviations operator on the canvas.
  3. Connect the output port of table CUSTOMER_MASTERDATA to the Find Deviations operator input port.
  4. Select the Find Deviations operator and select the General tab of the properties view. In the Label field, enter FindDeviatingCustomers.
  5. On the Model Name tab, enter CustomerOutliers in the Model Name field.
  6. Optional: On the Output Columns tab, remove all fields from the Output Columns table except CLIENT_ID. You need at least the CLIENT_ID in the output table to identify each record.
  7. Optional: On the Mining Settings tab, set the Maximum cluster size to 30% of the records.
  8. Connect the output port that contains the input rows:
    1. Right-click the Find Deviations operator output port and select Create suitable table.
    2. In the wizard, enter the table name DEVIATIONS and make CLIENT_ID a primary key. Click Finish.
    3. Place a Table Target operator for the new DEVIATIONS table on the canvas.
    4. Connect the output port of the Find Deviations operator with the input port of the Table Target operator.
  9. Execute your mining flow.

The resulting flow looks like this:

Figure 2. The resulting Find Deviations operator in the mining flowThe resulting Find Deviations operator in the mining flow

Explanation:

The receiving table DEVIATIONS receives all records from the input table and has two additional fields DEV_DEGREE and CLUSTER_ID.

The higher the DEV_DEGREE value, the more untypical the record is.

To understand, why the record is considered untypical, open the Visualizer for the clustering model IDMMX.CustomerOutliers from the Data Mining Models folder in the Data Source Explorer view. Here you can see the cluster characteristics for each cluster ID assigned by the Find Deviations operator. Records assigned to the smallest clusters are considered untypical.



Feedback