Discretization

Show Me
Some data mining algorithms require categorical input instead of numeric input. In this case, the data must be preprocessed so that values in certain numeric ranges are mapped to discrete values.

You can apply the same technique when small differences in numeric values are irrelevant for a problem. For example, you might want to use this technique to replace the exact age of a person with an age group.

In the sales environment, you might want to use the sales performance class (SALES_PERF) to classify a sales amount as poor, mediocre, good, or outstanding according to predefined sales limits. A performance might be poor if total sales is less than 10,000, mediocre if it is between 10,000 and 20,000, good if it is between 20,000 and 50,000, and outstanding if it is greater than 50,000.

To specify this transformation, you can use the Discretization feature to discretize the total sales amount by setting the boundary values.



Feedback | Information roadmap