With this wizard, you can define intervals for a discretization.
All feature values or slice values in an interval are mapped to the
same discrete target value.
To be able to read the data from the database, make sure that your
input model is connected to the correct database.
The Discretization wizard guides you through the following pages:
- Select Method
- If you know the value range of your data and the desired number
of intervals, select User-defined. Otherwise,
select Statistically determined to let the
system browse the data and use a suitable data range.
Both methods
generate equidistant interval boundaries. If you need intervals of
varying sizes, you must manually add the interval boundaries.
- User-defined Intervals
- You must specify a lower limit and an upper limit for the value
range and the number of intervals into which that range is divided.
- Statistically determined intervals
- You must specify the number of intervals into which the value
range is to be divided. To find efficient boundaries, the wizard might
deviate slightly from this number.
The lower and upper limits
of the value range are statistically determined and depend on the
distribution of the data in the database.
Both methods create two additional intervals, one interval from −∞
to the lower limit and the other interval from the upper limit to +∞.
Optionally, you can change the comparison operator for all intervals.
To include each boundary value in the interval above it, use <. To
include it in the interval below it, use <=.
For example, if you set the lower limit to 0, the upper limit to
60, and the number of intervals to 3, the following intervals are
created:
(−∞,0.0)
[0.0,20.0)
[20.0,40.0)
[40.0,60.0)
[60.0,+∞)
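
The interval arithmetic described above can be sketched in a few lines of Python. This is only an illustration of equidistant boundaries and of the two comparison operators, not the wizard's actual implementation. The function names `equidistant_boundaries` and `interval_index` are hypothetical and are not part of the product.

```python
from bisect import bisect_left, bisect_right


def equidistant_boundaries(lower, upper, count):
    """Return count + 1 evenly spaced boundaries from lower to upper.

    Hypothetical helper: mirrors the user-defined method, where a lower
    limit, an upper limit, and a number of intervals are specified.
    """
    if count < 1 or upper <= lower:
        raise ValueError("need count >= 1 and lower < upper")
    width = (upper - lower) / count
    # Append the upper limit explicitly so it is not affected by rounding.
    return [lower + i * width for i in range(count)] + [float(upper)]


def interval_index(value, boundaries, operator="<"):
    """Return the 0-based index of the interval that contains value.

    Index 0 is the open-ended interval below the lower limit, and
    index len(boundaries) is the open-ended interval above the upper limit.
    """
    if operator == "<":
        # value < boundary: a boundary value belongs to the interval above it.
        return bisect_right(boundaries, value)
    if operator == "<=":
        # value <= boundary: a boundary value belongs to the interval below it.
        return bisect_left(boundaries, value)
    raise ValueError("operator must be '<' or '<='")


boundaries = equidistant_boundaries(0, 60, 3)
print(boundaries)                              # [0.0, 20.0, 40.0, 60.0]
print(interval_index(20.0, boundaries, "<"))   # 2, that is [20.0,40.0)
print(interval_index(20.0, boundaries, "<="))  # 1, that is (0.0,20.0]
```

With the < operator, the value 20.0 falls into the third interval in the list above. With <=, the same value falls into the second interval, because the boundary then belongs to the interval below it.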