Properties view of the Discretize operator

Use this view to define the properties for the Discretize operator.

A Discretize operator is a graphical icon that represents a mining task. You place it on the mining editor canvas to convert the values of a continuous input field into a defined set of buckets.

In the Properties view, set the properties for this operator by completing the fields in the following tabs.
  • General
  • Discretization Settings
  • Intervals
  • Carried over columns

General tab

Label
You can rename the operator by specifying a new name. This new name appears on the operator icon in the mining editor canvas.
Description
You can enter any description for the operator.

Discretization Settings tab

Column to discretize
Select the column to discretize from the continuous fields in the input table.
Name of the discretized column
The output port gets an additional column, the discretized column. Specify a name for this new column.
Type of the discretized column
Specify the SQL type for the new column that contains the discretized values.
Add two columns containing the interval boundaries to the output
Check this option to show not only the discretized value but also the interval boundaries in the output table. The output port gets two additional columns: LOWER_INTERVAL_BOUNDARY and UPPER_INTERVAL_BOUNDARY.
Add a column containing the comparison operators to the output port
Check this option to also show the comparison operator (< or <=) for the interval in the output table. The output port gets an additional column: FLAG.

Intervals tab

To discretize a range of values, you must define intervals that divide the range into smaller parts. You can define these intervals automatically, or manually by specifying the upper and lower boundaries for each interval.

The next figure shows the defined intervals and the following toolbar icons:
  • Auto-generate intervals
  • Create a new interval
  • Delete selected interval
  • Remove all intervals
Figure 1. An example discretize settings table.
Auto-generate intervals
Example: To automatically define 5 intervals in the range 0 to 100000 for the "Income" column, with VARCHAR as the type of the discretized column, complete the following steps:
  1. Click the Auto-generate intervals icon.
  2. In the Generate Intervals wizard, select User-defined.
  3. Click Next.
  4. On the User-defined intervals page, specify the following settings:
    • Lower limit: 0
    • Upper limit: 100000
    • Number of intervals: 5
    • Comparison operator: <
    • Prefix for the discretized value: INCOME_BUCKET
  5. Click Finish.
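
The arithmetic behind this example can be sketched in Python. This is only an illustration, assuming that the wizard splits the range into intervals of equal width and that the < operator makes each upper boundary exclusive; this page states neither. The function name equal_width_boundaries and the numbered label suffixes are hypothetical and are not part of the product.

```python
def equal_width_boundaries(lower, upper, count):
    """Return the count + 1 boundaries that split [lower, upper] into equal-width intervals."""
    if count < 1 or upper <= lower:
        raise ValueError("need count >= 1 and upper > lower")
    width = (upper - lower) / count
    return [lower + i * width for i in range(count + 1)]


# Settings taken from the example above: lower limit 0, upper limit 100000, 5 intervals.
boundaries = equal_width_boundaries(0, 100000, 5)

# Hypothetical labels: the prefix comes from the example, the numeric suffix is an assumption.
labels = [f"INCOME_BUCKET_{i}" for i in range(1, 6)]

for label, low, high in zip(labels, boundaries, boundaries[1:]):
    # Assumption: with the < operator, a value belongs to the interval when value < high.
    print(f"{label}: {low:g} <= value < {high:g}")
```

Under these assumptions, each of the 5 intervals is 20000 wide.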
Example: To automatically create 5 intervals based on the actual values for the "Income" column, with VARCHAR as the type of the discretized column, complete the following steps:
  1. Click the Auto-generate intervals icon.
  2. In the Generate Intervals wizard, select Statistically determined.
  3. Click Next.
  4. Select a reference column (the column that will be analyzed to determine the interval boundaries).
  5. Click Next.
  6. Specify the following settings:
    • Desired number of intervals: 5
    • Comparison operator: <
    • Prefix for the discretized value: INCOME_BUCKET
  7. Click Finish.
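
This page does not say how the wizard derives boundaries from the reference column. As a rough illustration only, the following Python sketch uses one common technique, equal-frequency (quantile) binning, which is an assumption and might differ from what the product does. The sample incomes and the function name quantile_boundaries are hypothetical.

```python
import statistics


def quantile_boundaries(values, count):
    """Return the count - 1 cut points that split values into count equal-frequency groups."""
    return statistics.quantiles(values, n=count)


# Hypothetical sample of the reference column.
incomes = [12000, 18500, 22000, 27000, 31000, 36000, 42000, 55000, 61000, 98000]

# Desired number of intervals: 5, so 4 interior boundaries are produced.
cuts = quantile_boundaries(incomes, 5)
print(cuts)
```

With this technique the intervals have unequal widths but hold roughly the same number of rows, which is one reason to derive boundaries from the data instead of from a fixed range.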
Create a new interval
Example: To manually define the following intervals (as shown in the figure above) for the "Income" column, with VARCHAR as the type of the discretized column, complete the following steps:
  1. Click the Create a new interval icon.
  2. In the Add an Interval window, specify the following settings:
    • Interval boundary: 10000
    • Comparison operator: <
    • Discretized value for interval: Low
    • Discretized value for uppermost: High
    Click OK.
  3. Click the Create a new interval icon again.
  4. In the Add an Interval window, specify the following settings:
    • Interval boundary: 20000
    • Comparison operator: <
    • Discretized value for interval: Medium
    Click OK.
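
The effect of these manually defined intervals can be sketched in Python. This is a minimal illustration, assuming that a value maps to the first interval whose boundary it satisfies under the comparison operator, and to the uppermost value otherwise. The function discretize is hypothetical, and for brevity it applies one operator to all intervals, whereas the window lets you choose an operator for each interval.

```python
from bisect import bisect_left, bisect_right

# Boundaries and discretized values from the example above.
BOUNDARIES = [10000, 20000]
LABELS = ["Low", "Medium", "High"]  # "High" is the value for the uppermost interval


def discretize(value, boundaries=BOUNDARIES, labels=LABELS, operator="<"):
    """Map a continuous value to the discretized value of its interval."""
    if operator == "<":
        # A value equal to a boundary falls into the next higher interval.
        index = bisect_right(boundaries, value)
    elif operator == "<=":
        # A value equal to a boundary stays in the lower interval.
        index = bisect_left(boundaries, value)
    else:
        raise ValueError("operator must be '<' or '<='")
    return labels[index]


for income in (9999, 10000, 15000, 20000):
    print(income, discretize(income))
```

The choice between < and <= only matters for values that lie exactly on a boundary, such as 10000 in this example.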

Carried over columns tab

You can select the columns from the input table that appear in the output table in addition to the discretized column. By default, all columns are carried over from the input port to the output port.

For example, you might want to remove the column to discretize from the output port.


