What is Optimal Binning and how is it done?

Problem

SPSS and Clementine have a new procedure called Optimal Binning. Please explain the concept of optimal binning and the analysis situations in which it is useful.

Resolving The Problem

Optimal Binning was introduced in SPSS 15 and Clementine 11. It is a supervised method for discretizing a scale variable (a numeric variable treated as continuous), that is, grouping its values into a relatively small set of discrete values (bins), each of which represents a range of values on the original variable. Discretization may be performed to allow analyses that are restricted to categorical variables.
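As a minimal illustration of what discretization produces, the Python sketch below maps each value to a bin index once the cut-points are known. The function name `discretize` and the convention that a bin includes its lower cut-point but not its upper one are assumptions for this example, not necessarily the rule SPSS or Clementine applies.

```python
from bisect import bisect_right

def discretize(values, cut_points):
    """Map each numeric value to a bin index given cut-points.

    Assumed boundary convention (hypothetical): bin i holds values v with
    cut_points[i-1] <= v < cut_points[i], so k cut-points give k + 1 bins.
    """
    cuts = sorted(cut_points)
    # bisect_right counts how many cut-points are <= v, i.e. the bin index
    return [bisect_right(cuts, v) for v in values]

ages = [18, 25, 33, 41, 47, 52, 60, 71]
print(discretize(ages, [30, 50]))  # -> [0, 0, 1, 1, 1, 2, 2, 2]
```

Supervised and unsupervised methods differ only in how the cut-points are chosen; applying them is the same lookup.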

In unsupervised binning, the numeric variable is discretized without regard to any other variable. The goal is generally to provide a set of ordered categories that have specific distributional characteristics (equal intervals on the scale variable, equal sample sizes across bins, an approximately normal distribution across bins, etc.). SPSS procedures that perform unsupervised binning are Visual Binning (formerly Visual Bander) and Rank (with Ntile() ranking). Procedures in the Categories module can also discretize scale variables.
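The sketch below contrasts two of the unsupervised schemes mentioned above: equal-width intervals and equal-sized groups (similar in spirit to Ntile-style ranking). The function names are hypothetical and tie handling is simplified; this is not a reproduction of the Visual Binning or Rank procedures.

```python
def equal_width_bins(values, k):
    """Split [min, max] into k intervals of equal width (unsupervised)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:  # all values identical: a single bin
        return [0] * len(values)
    # min(..., k - 1) keeps the maximum value in the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign roughly len(values) / k cases to each bin by rank (unsupervised).

    Ties are broken by original position, a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one extreme value
print(equal_width_bins(data, 3))      # -> [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(data, 3))  # -> [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With one extreme value, equal-width binning leaves the middle bin empty and crowds nine cases into the first bin, while equal-frequency binning keeps the bin sizes close to equal. Neither looks at any other variable.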

In supervised binning, the cut-points are chosen to optimize the relationship between the scale variable and a nominal variable. Cases are sorted internally on the scale variable. Forward binning algorithms begin with all cases in a single bin and insert cut-points to divide the sample into progressively smaller bins until a stopping criterion is reached. Backward binning algorithms begin with each unique value of the scale variable defined as its own bin and merge bins until a stopping criterion is reached.
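To make the backward (merging) strategy concrete, here is a minimal Python sketch. It starts with one bin per unique value and merges neighbours. The merge rule (combine two adjacent bins only when both contain a single, identical category of the nominal variable) is a hypothetical stopping criterion chosen for illustration, not the one used by any SPSS procedure.

```python
from collections import Counter

def backward_bins(values, labels):
    """Backward binning sketch; returns (low, high, class_counts) tuples."""
    # One initial bin per unique scale value, holding nominal-category counts.
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    bins = [(v, v, counts[v]) for v in sorted(counts)]

    merged = True
    while merged:  # stop when no adjacent pair qualifies for merging
        merged = False
        for i in range(len(bins) - 1):
            (lo1, _, c1), (_, hi2, c2) = bins[i], bins[i + 1]
            # Hypothetical criterion: both bins pure and of the same category.
            if len(c1) == 1 and len(c2) == 1 and set(c1) == set(c2):
                bins[i:i + 2] = [(lo1, hi2, c1 + c2)]
                merged = True
                break
    return bins

result = backward_bins([1, 2, 3, 4, 5, 6], ["a", "a", "b", "b", "b", "a"])
print([(lo, hi) for lo, hi, _ in result])  # -> [(1, 2), (3, 5), (6, 6)]
```

Real backward methods typically use a statistical merge test and allow mixed bins to merge; the loop structure (start fine, merge until the criterion fails) is the part this sketch is meant to show.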

MDLP (Minimum Description Length Principle) is a forward binning method and is the only supervised method available in the first release of the Optimal Binning procedure. MDLP chooses cut-points to minimize entropy in the resulting bins (Fayyad & Irani, 1993). Entropy measures how diverse the cases in a bin are with respect to the nominal variable. It reaches its minimum of 0 when all cases in the bin belong to a single category of the nominal variable, and it is larger when several categories are represented in nearly equal numbers.
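The following Python sketch illustrates forward MDLP binning as published by Fayyad and Irani (1993). At each step it picks the cut-point that minimizes the weighted entropy of the two resulting bins, and it keeps the cut only if the information gain passes their MDL acceptance test; each accepted bin is then split recursively. It is a simplified illustration under those assumptions: the function names are hypothetical, and the Optimal Binning procedure may differ in details such as preprocessing and tie handling.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: 0 for a pure bin, larger for evenly mixed bins."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def mdlp_cuts(values, labels):
    """Forward MDLP discretization sketch after Fayyad & Irani (1993)."""
    pairs = sorted(zip(values, labels))  # sort cases on the scale variable
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    cuts = []

    def split(lo, hi):
        # Find the best single cut among cases xs[lo:hi].
        n = hi - lo
        seg = ys[lo:hi]
        best = None
        for i in range(lo + 1, hi):
            if xs[i] == xs[i - 1]:
                continue  # a cut cannot separate equal values
            left, right = ys[lo:i], ys[i:hi]
            e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or e < best[0]:
                best = (e, i, left, right)
        if best is None:
            return
        e, i, left, right = best
        ent_s = entropy(seg)
        gain = ent_s - e
        # MDL acceptance test: gain > (log2(N - 1) + delta) / N
        k, k1, k2 = len(set(seg)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (
            k * ent_s - k1 * entropy(left) - k2 * entropy(right))
        if gain > (math.log2(n - 1) + delta) / n:
            cuts.append((xs[i - 1] + xs[i]) / 2)  # midpoint cut
            split(lo, i)  # recurse into each new bin
            split(i, hi)

    split(0, len(xs))
    return sorted(cuts)

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(mdlp_cuts(values, labels))  # -> [3.5, 6.5]
```

The acceptance test is what stops the forward splitting: a pure bin has zero entropy, so no further cut can produce a gain, and with small or noisy samples a cut may also be rejected because its gain does not pay for the extra description length.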

Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (pp. 1022-1027). San Mateo, CA: Morgan Kaufmann.
Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and unsupervised discretization of continuous features. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 194-202). Los Altos, CA: Morgan Kaufmann.
Liu, H., Hussain, F., Tan, C. L., & Dash, M. (2002). Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6, 393-423.

Optimal Binning is part of the Data Validation module, so users must be licensed for that module.
In SPSS versions 15.0 and above, Optimal Binning is available from the Transform menu. In Clementine versions 11 and above, the Binning node in the 'Field Ops' group offers 'Optimal' as one of its binning methods.