Some data mining algorithms require categorical input instead of
numeric input. In that case, the data must be preprocessed so that
values in certain numeric ranges are mapped to discrete values.
You can apply the same technique when small differences in numeric
values are irrelevant for a problem. For example, you might want to
use this technique to replace the exact age of a person with an age
group.
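The age group example can be sketched in a few lines of Python. This is only an illustration of the mapping itself, not of how the product implements it; the boundaries, the labels, and the `discretize` function are hypothetical and chosen for this example.

```python
from bisect import bisect_right

# Hypothetical interval boundaries and labels, chosen only for illustration.
# Each boundary is the inclusive lower bound of the next interval.
BOUNDARIES = [18, 35, 65]
LABELS = ["child", "young adult", "adult", "senior"]


def discretize(value, boundaries=BOUNDARIES, labels=LABELS):
    """Return the label of the interval that contains value."""
    # bisect_right counts how many boundaries are <= value,
    # which is the index of the matching interval.
    return labels[bisect_right(boundaries, value)]


ages = [4, 18, 34, 35, 70]
groups = [discretize(age) for age in ages]
```

With these assumed boundaries, the ages 18 and 34 fall into the same group, so the small difference between them no longer reaches the mining algorithm.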
You can define a discretization on the following pages of the Discretization
properties view:
General
- Name
- The name for a discretization must be unique.
- Description
- Optionally, you can add more details about the discretization.
Feature Selection
You must define the feature or the slice that you want to discretize.
The feature or the slice must be numeric.
Range Definitions
You can define the intervals and their discrete values manually, or
you can use the Discretization wizard:
- Manual interval definition
- Click Add Interval to add new interval
boundaries and discrete target values for the resulting intervals.
- Using the Discretization wizard
- Open the wizard by clicking Automatically create intervals.
The wizard fetches the values that are contained in the database and
generates intervals depending on the distribution of values in the
database. The wizard also lets you specify the number of intervals.
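The documentation does not state which method the wizard uses to derive intervals from the value distribution. As one possible illustration, the following Python sketch computes equal-frequency boundaries for a requested number of intervals. The function names and the sample values are hypothetical, and the quantile-based approach is an assumption, not the wizard's confirmed behavior.

```python
from bisect import bisect_right
from statistics import quantiles


def equal_frequency_boundaries(values, n_intervals):
    """Return n_intervals - 1 boundaries that split values into
    intervals holding roughly the same number of values."""
    if n_intervals < 2:
        return []
    # Assumption: quantile cut points stand in for the wizard's logic.
    return quantiles(values, n=n_intervals, method="inclusive")


def assign_interval(value, boundaries):
    """Return the zero-based index of the interval that contains value."""
    return bisect_right(boundaries, value)


# Hypothetical sample standing in for values fetched from the database.
sample = [21, 23, 25, 31, 38, 40, 44, 52, 60, 67, 71, 80]
boundaries = equal_frequency_boundaries(sample, 4)

# Count how many sample values land in each of the four intervals.
counts = [0] * 4
for value in sample:
    counts[assign_interval(value, boundaries)] += 1
```

Equal-frequency intervals adapt to skewed data, whereas equal-width intervals can leave some intervals nearly empty. Which trade-off applies to the wizard depends on its actual method.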
Result Column
You must define a name for the result column in the output table. The
name must be a valid Db2® column name. By default, the result column
name is derived from the name of the discretization. If you rename a
discretization, the result column name is also changed. If you change
the result column name in the Discretization properties, the name of
the discretization is not changed.
You can override the default data type of the result columns. The
drop-down list shows the available data types.