Discretization wizard

With this wizard, you can define intervals for a discretization. All feature values or slice values in an interval are mapped to the same discrete target value.

To be able to read the data from the database, make sure that your input model is connected to the correct database.

The Discretization wizard guides you through the following pages:

Select Method
If you know the value range of your data and the desired number of intervals, select User-defined. Otherwise, select Statistically determined to let the system analyze the data and determine a suitable value range.

Both methods generate equidistant interval boundaries. If you need intervals of varying sizes, you must manually add the interval boundaries.
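The following Python sketch shows what equidistant boundaries are. It also shows one way that extra boundaries, chosen manually, could be merged in to produce intervals of varying sizes. The function names equidistant_boundaries and add_boundaries are hypothetical. They illustrate the idea and are not part of the wizard.

```python
def equidistant_boundaries(lower, upper, n_intervals):
    """Return n_intervals + 1 evenly spaced boundaries from lower to upper (hypothetical helper)."""
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    width = (upper - lower) / n_intervals
    # Compute each boundary from the lower limit instead of summing widths,
    # so that rounding errors do not accumulate.
    return [lower + i * width for i in range(n_intervals)] + [float(upper)]


def add_boundaries(boundaries, extra):
    """Merge manually chosen boundaries into a boundary list (hypothetical helper).

    The result is sorted and free of duplicates, which yields intervals of varying sizes.
    """
    return sorted({float(b) for b in boundaries} | {float(b) for b in extra})
```

For example, equidistant_boundaries(0, 60, 3) returns [0.0, 20.0, 40.0, 60.0]. Adding a manual boundary at 30 splits the middle interval, giving [0.0, 20.0, 30.0, 40.0, 60.0].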

User-defined intervals
You must specify a lower limit and an upper limit for the value range and the number of intervals into which that range is divided.
Statistically determined intervals
You must specify the number of intervals into which the value range is to be divided. To find efficient boundaries, the wizard might deviate slightly from this number.

The lower and upper limits of the value range are statistically determined and depend on the distribution of the data in the database.
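This page does not describe how the wizard determines the limits statistically. The following Python sketch is only one plausible approach and is not the wizard's algorithm. It makes two assumptions: the limits come from the 5th and 95th percentiles of the data, and the interval width is rounded up to a "round" value (1, 2, 2.5, or 5 times a power of ten). Rounding the width like this is one way the resulting number of intervals can differ from the requested number. The names nice_step and statistical_boundaries are hypothetical.

```python
import math
import statistics


def nice_step(raw_step):
    """Round a width up to 1, 2, 2.5, 5, or 10 times a power of ten (assumed rule)."""
    base = 10 ** math.floor(math.log10(raw_step))
    for factor in (1, 2, 2.5, 5, 10):
        if raw_step <= factor * base:
            return factor * base
    return 10 * base  # Not reached; kept as a guard.


def statistical_boundaries(values, n_intervals):
    """ASSUMPTION: limits are the 5th and 95th percentiles, widened to the step grid."""
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    cuts = statistics.quantiles(values, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    low, high = cuts[0], cuts[-1]
    if high <= low:
        raise ValueError("data range is too narrow to discretize")
    step = nice_step((high - low) / n_intervals)
    # Snap the limits outward to multiples of the rounded step.
    lower = math.floor(low / step) * step
    upper = math.ceil(high / step) * step
    count = round((upper - lower) / step)  # Can differ from n_intervals.
    return [float(lower + i * step) for i in range(count + 1)]
```

With the values 0 to 100 and 4 requested intervals, this sketch returns [0.0, 25.0, 50.0, 75.0, 100.0]. With 3 requested intervals, the width rounds up to 50, so it returns only 2 intervals: [0.0, 50.0, 100.0].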

Both methods create two additional intervals: one from −∞ to the lower limit and one from the upper limit to +∞.
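As an illustration, the following Python sketch turns a sorted list of boundaries into intervals and adds the two open-ended intervals. The function name with_open_ended is hypothetical.

```python
import math


def with_open_ended(boundaries):
    """Turn sorted boundaries b0 < ... < bn into n + 2 (low, high) pairs (hypothetical helper).

    The first pair runs from -inf to b0 and the last pair from bn to +inf.
    """
    edges = [-math.inf] + list(boundaries) + [math.inf]
    # Pair each edge with its successor.
    return list(zip(edges[:-1], edges[1:]))
```

For the boundaries [0.0, 20.0, 40.0, 60.0], this returns five intervals: the three inner ones plus (-inf, 0.0) and (60.0, inf).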

Optionally, you can change the comparison operator for all intervals. To include the boundary in the upper interval, use <. To include it in the lower interval, use <=.
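The effect of the two operators on a value that equals a boundary can be sketched in Python with the standard bisect module. Index 0 is the interval from −∞ to the lower limit. The function name interval_index and its operator parameter are hypothetical and only mirror the wizard's choice between < and <=.

```python
import bisect


def interval_index(value, boundaries, operator="<"):
    """Return the 0-based interval index of value (hypothetical helper).

    boundaries must be sorted. Index 0 is the open-ended interval below the first boundary.
    "<":  a value equal to a boundary falls into the upper interval (bisect_right).
    "<=": a value equal to a boundary falls into the lower interval (bisect_left).
    """
    if operator == "<":
        return bisect.bisect_right(boundaries, value)
    if operator == "<=":
        return bisect.bisect_left(boundaries, value)
    raise ValueError(f"unsupported operator: {operator!r}")
```

With the boundaries [0.0, 20.0, 40.0, 60.0], the value 20.0 maps to interval 2, [20.0,40.0), when you use <. It maps to interval 1, (0.0,20.0], when you use <=.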

For example, if you set the lower limit to 0, the upper limit to 60, and the number of intervals to 3, and you use the < operator, the following intervals are created:
(−∞,0.0)
[0.0,20.0)
[20.0,40.0)
[40.0,60.0)
[60.0,+∞)
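The following self-contained Python sketch reproduces this example in the same interval notation. The function name interval_labels is hypothetical. The sketch assumes that < produces intervals that are closed on the left and that <= produces intervals that are closed on the right, as described for the comparison operator.

```python
import math


def interval_labels(lower, upper, n_intervals, operator="<"):
    """Return the equidistant intervals plus both open-ended intervals as text (hypothetical helper)."""
    if operator not in ("<", "<="):
        raise ValueError(f"unsupported operator: {operator!r}")
    width = (upper - lower) / n_intervals
    bounds = [lower + i * width for i in range(n_intervals)] + [float(upper)]
    edges = [-math.inf] + bounds + [math.inf]
    # "<" keeps the boundary in the upper interval: [low,high).
    # "<=" keeps it in the lower interval: (low,high].
    left, right = ("[", ")") if operator == "<" else ("(", "]")
    labels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Infinite ends are always open.
        lb = "(" if lo == -math.inf else left
        rb = ")" if hi == math.inf else right
        lo_txt = "−∞" if lo == -math.inf else f"{lo:.1f}"
        hi_txt = "+∞" if hi == math.inf else f"{hi:.1f}"
        labels.append(f"{lb}{lo_txt},{hi_txt}{rb}")
    return labels


for label in interval_labels(0, 60, 3):
    print(label)
```

Calling interval_labels(0, 60, 3, "<=") instead gives (−∞,0.0], (0.0,20.0], (20.0,40.0], (40.0,60.0], and (60.0,+∞).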

