Discretization properties view

Some data mining algorithms require categorical input instead of numeric input. In that case, the data must be preprocessed so that values in certain numeric ranges are mapped to discrete values.

You can apply the same technique when small differences in numeric values are irrelevant for a problem. For example, you might want to use this technique to replace the exact age of a person with an age group.
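As a rough illustration of the idea, the following Python sketch maps exact ages to age groups. It is not part of the product; the boundaries, the labels, and the `discretize` helper are hypothetical and chosen only for this example. Each interval here includes its lower boundary and excludes its upper boundary.

```python
from bisect import bisect_right

# Hypothetical cut points and labels, chosen only for illustration.
BOUNDARIES = [18, 35, 65]
LABELS = ["child", "young adult", "adult", "senior"]

def discretize(value, boundaries=BOUNDARIES, labels=LABELS):
    """Map a numeric value to the label of the interval that contains it."""
    # bisect_right counts the boundaries that are <= value,
    # which is the index of the interval the value falls into.
    return labels[bisect_right(boundaries, value)]

ages = [4, 18, 34, 35, 70]
age_groups = [discretize(a) for a in ages]
```

With these assumed boundaries, an age of 34 and an age of 18 fall into the same group, so the small difference between them no longer reaches the mining algorithm.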

You can define a discretization on the following pages of the Discretization properties view:

General

Name
The name for a discretization must be unique.
Description
Optionally, you can add more details about the discretization.

Feature Selection

You must define the feature or the slice that you want to discretize. The feature or the slice must be numeric.

Range Definitions

You can define the intervals and their discrete values manually, or you can use the Discretization wizard:
Manual interval definition
Click Add Interval to add new interval boundaries and discrete target values for the resulting intervals.
Using the Discretization wizard
Open the wizard by clicking Automatically create intervals. The wizard fetches the values that are contained in the database and generates intervals depending on the distribution of values in the database. The wizard also lets you specify the number of intervals.
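The documentation does not state how the wizard derives intervals from the value distribution. One common distribution-based approach is equal-frequency binning, where the boundaries are quantiles of the observed values. The following Python sketch shows that approach under this assumption; the function name `equal_frequency_boundaries` and the sample values are hypothetical, and the wizard might work differently.

```python
from statistics import quantiles

def equal_frequency_boundaries(values, num_intervals):
    """Return num_intervals - 1 cut points that split the values
    into groups of roughly equal size (equal-frequency binning)."""
    if num_intervals < 2:
        # A single interval needs no inner boundaries.
        return []
    return quantiles(values, n=num_intervals)

# Hypothetical sample of numeric values, standing in for database content.
sample = [21, 22, 23, 25, 29, 31, 40, 44, 52, 67, 70, 88]
boundaries = equal_frequency_boundaries(sample, 4)
```

Boundaries produced this way can then be used like manually defined interval boundaries, with one discrete target value assigned to each resulting interval.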

Result Column

You must define a name for the result column in the output table. The name must be a valid Db2® column name. By default, the result column name is derived from the name of the discretization.

If you rename a discretization, the result column name also changes. If you change the result column name in the Discretization properties view, the name of the discretization does not change.

You can override the default data type of the result column. The drop-down list shows the available data types.


