Use this view to define the properties for the Discretize operator.
A Discretize operator is a graphical icon that represents a mining task.
You place it on the mining editor canvas to convert the values of a
continuous input field into a defined set of buckets.
In the Properties view, set the properties for this operator
by completing the fields in the following tabs:
- General
- Discretization Settings
- Intervals
- Carried over columns
General tab
- Label
- You can rename the operator by specifying a new name. This new name appears
on the operator icon in the mining editor canvas.
- Description
- You can enter any description for the operator.
Discretization Settings tab
- Column to discretize
- Select the column to discretize from the continuous fields in the input
table.
- Name of the discretized column
- The output port gets an additional column, the discretized column. Specify
a name for the new column.
- Type of the discretized column
- Specify the SQL type for the new column that contains the discretized
values.
- Add two columns containing the interval boundaries to the output
- Check this option to show not only the discretized value but also the
interval boundaries in the output table. The output port gets two additional
columns: LOWER_INTERVAL_BOUNDARY and UPPER_INTERVAL_BOUNDARY.
- Add a column containing the comparison operators to the output port
- Check this option to also show the comparison operator (< or <=)
for the interval in the output table. The output port gets an additional
column: FLAG.
Intervals tab
To discretize a range of values, you
must define intervals that divide the range of values into smaller parts.
You can define these intervals automatically, or manually by specifying the
boundary and comparison operator for each interval.
The next figure shows the defined
intervals and the following toolbar icons:
- Auto-generate intervals
- Create a new interval
- Delete selected interval
- Remove all intervals
Figure 1. An example discretize settings table.
- Auto-generate intervals
- Example: To automatically define 5 intervals in the range 0 to 100000
for the "Income" column, with VARCHAR as the type of the discretized
column, complete the following steps:
- Click the Auto-generate intervals icon.
- In the Generate Intervals wizard, select User-defined.
- Click Next.
- On the User-defined intervals page, specify the following settings:
- Lower limit: 0
- Upper limit: 100000
- Number of intervals: 5
- Comparison operator: <
- Prefix for the discretized value: INCOME_BUCKET
- Click Finish.
- Example: To automatically create 5 intervals based on the actual
values of the "Income" column, with VARCHAR as the type of the
discretized column, complete the following steps:
- Click the Auto-generate intervals icon.
- In the Generate Intervals wizard, select Statistically determined.
- Click Next.
- Select a reference column (the column that will be analyzed to determine
the interval boundaries).
- Click Next.
- Specify the following settings:
- Desired number of intervals: 5
- Comparison operator: <
- Prefix for the discretized value: INCOME_BUCKET
- Click Finish.
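The user-defined example above (lower limit 0, upper limit 100000, 5 intervals) can be pictured as a split into equal-width intervals. The following Python sketch is an illustration only: it assumes that the wizard divides the range evenly, and the function names and the bucket naming scheme (prefix plus a running number) are hypothetical, not taken from the product.

```python
def equal_width_edges(lower, upper, count):
    """Return count + 1 edges that split [lower, upper] into equal-width intervals.

    Assumption: the range is divided evenly; the wizard's real logic is not
    documented here.
    """
    width = (upper - lower) / count
    return [lower + width * i for i in range(count + 1)]


def label_intervals(edges, prefix):
    """Pair each interval with a discretized value.

    The naming scheme (prefix + "_" + 1-based index) is hypothetical.
    """
    return [
        (edges[i], edges[i + 1], f"{prefix}_{i + 1}")
        for i in range(len(edges) - 1)
    ]


edges = equal_width_edges(0, 100000, 5)
intervals = label_intervals(edges, "INCOME_BUCKET")

for low, high, name in intervals:
    # With the "<" comparison operator, a value belongs to the interval
    # when it is below the upper edge.
    print(f"{name}: {low:.0f} <= value < {high:.0f}")
```

Each of the five intervals in this sketch is 20000 wide. The statistically determined variant differs in that the boundaries come from the values of the reference column rather than from fixed limits.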
- Create a new interval
- Example: To manually define the intervals shown in Figure 1
for the "Income" column, with VARCHAR as the type
of the discretized column, complete the following steps:
- Click the Create a new interval icon.
- In the Add an Interval window, specify the following settings:
- Interval boundary: 10000
- Comparison operator: <
- Discretized value for interval: Low
- Discretized value for uppermost: High
Click OK.
- Click the Create a new interval icon again.
- In the Add an Interval window, specify the following settings:
- Interval boundary: 20000
- Comparison operator: <
- Discretized value for interval: Medium
Click OK.
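The effect of the manually defined intervals (boundaries 10000 and 20000, values Low, Medium, and High) can be sketched as a simple lookup. This Python sketch is an illustration under stated assumptions: it assumes that "<" assigns a value to an interval when the value is strictly below the boundary, that "<=" also includes the boundary itself, and that all intervals share one comparison operator. The names BOUNDARIES, VALUES, and discretize are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Interval boundaries and discretized values from the manual example.
# The last entry in VALUES is the value for the uppermost interval.
BOUNDARIES = [10000, 20000]
VALUES = ["Low", "Medium", "High"]


def discretize(value, operator="<"):
    """Map a continuous value to its discretized value.

    Assumption: "<" means value < boundary, "<=" means value <= boundary.
    """
    # bisect_right puts a value equal to a boundary into the next interval
    # (strict "<"); bisect_left keeps it in the current interval ("<=").
    pick = bisect_right if operator == "<" else bisect_left
    return VALUES[pick(BOUNDARIES, value)]


for income in (9999, 10000, 19999, 20000):
    print(income, discretize(income))
```

Under these assumptions, an income of exactly 10000 is discretized as Medium with the "<" operator and as Low with the "<=" operator.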
Carried over columns tab
You can select columns
from the input table that should appear in the output table in addition to
the discretized column. By default, all columns are carried over from the input
port to the output port.
For example, you might want to remove the original column to discretize
from the output port.
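To see how the settings on the tabs combine, the following Python sketch builds one output row from one input row. It is an illustration only: the function discretize_row, the sample column names CUSTOMER_ID and INCOME_CLASS, and the use of None for an open interval boundary are assumptions, not product behavior. Only the column names LOWER_INTERVAL_BOUNDARY, UPPER_INTERVAL_BOUNDARY, and FLAG come from the description above.

```python
from bisect import bisect_right

# Hypothetical interval definition: boundaries with the "<" operator,
# and one discretized value per interval (the last is the uppermost).
BOUNDARIES = [10000, 20000]
VALUES = ["Low", "Medium", "High"]


def discretize_row(row, column, new_column, carried_over,
                   add_boundaries=False, add_operator=False):
    """Return the output row for one input row (a dict of column values)."""
    idx = bisect_right(BOUNDARIES, row[column])
    # Only the selected carried-over columns reach the output.
    out = {name: row[name] for name in carried_over}
    out[new_column] = VALUES[idx]
    if add_boundaries:
        # Assumption: an open boundary is represented as None.
        out["LOWER_INTERVAL_BOUNDARY"] = BOUNDARIES[idx - 1] if idx > 0 else None
        out["UPPER_INTERVAL_BOUNDARY"] = (
            BOUNDARIES[idx] if idx < len(BOUNDARIES) else None
        )
    if add_operator:
        out["FLAG"] = "<"
    return out


row = {"CUSTOMER_ID": 7, "INCOME": 15000}
result = discretize_row(
    row, "INCOME", "INCOME_CLASS", carried_over=["CUSTOMER_ID"],
    add_boundaries=True, add_operator=True,
)
print(result)
```

In this sketch the INCOME column is left out of the carried-over columns, so the output row contains the discretized value but not the original income.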