# Optimal Binning Options

Preprocessing. "Pre-binning" input variables that have many distinct values can improve processing time without greatly sacrificing the quality of the final bins. The maximum number of bins is an upper bound on the number of preprocessed bins created. Thus, if you specify 1000 as the maximum but a binning input variable has fewer than 1000 distinct values, the number of preprocessed bins created for that variable equals its number of distinct values.
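The bound described above can be illustrated with a minimal sketch. The function name `prebin_upper_bound` and its parameters are hypothetical and are not part of the procedure's interface; the sketch only computes the cap on the number of preprocessed bins, not the bins themselves.

```python
def prebin_upper_bound(values, max_bins=1000):
    """Upper bound on the number of preprocessed bins for one variable.

    Hypothetical helper: the bound is the specified maximum, unless the
    variable has fewer distinct values than that maximum.
    """
    distinct = len(set(values))
    return min(max_bins, distinct)


# A variable with only 3 distinct values gets 3 preprocessed bins.
print(prebin_upper_bound([1, 2, 2, 3], max_bins=1000))
# A variable with 5000 distinct values is capped at the maximum.
print(prebin_upper_bound(range(5000), max_bins=1000))
```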

Sparsely Populated Bins. Occasionally, the procedure may produce bins with very few cases. The following strategy deletes these pseudo cutpoints:

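For a given variable, suppose that the algorithm found $n_{\text{final}}$ cutpoints and thus $n_{\text{final}} + 1$ bins. For bins $i = 2, \ldots, n_{\text{final}}$ (the second lowest-valued bin through the second highest-valued bin), compute

$$\frac{\operatorname{sizeof}(b_i)}{\min\bigl(\operatorname{sizeof}(b_{i-1}),\ \operatorname{sizeof}(b_{i+1})\bigr)}$$

where $\operatorname{sizeof}(b)$ is the number of cases in the bin.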

When this value is less than the specified merging threshold, $b_i$ is considered sparsely populated and is merged with $b_{i-1}$ or $b_{i+1}$, whichever has the lower class information entropy.

The procedure makes a single pass through the bins.
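The merging strategy can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure's actual implementation: it assumes the computed value is the size of a bin relative to the smaller of its two neighbors, represents each bin as a mapping from class label to case count, and skips over a freshly merged bin so that the walk remains a single pass. The names `entropy` and `merge_sparse_bins` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Class information entropy (base 2) of one bin's class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def merge_sparse_bins(bins, threshold):
    """Single pass over the interior bins, merging sparsely populated ones.

    bins: list of {class_label: case_count} mappings, ordered by value.
    threshold: merging threshold on the relative bin size (assumed ratio).
    """
    bins = [Counter(b) for b in bins]
    i = 1  # the first and last bins are never tested themselves
    while i < len(bins) - 1:
        left, cur, right = bins[i - 1], bins[i], bins[i + 1]
        smaller_neighbor = min(sum(left.values()), sum(right.values()))
        size = sum(cur.values())
        if smaller_neighbor > 0 and size / smaller_neighbor < threshold:
            if entropy(left) <= entropy(right):
                bins[i - 1] = left + cur   # absorb into the lower neighbor
                del bins[i]                # index i now holds the next bin
            else:
                bins[i + 1] = cur + right  # absorb into the upper neighbor
                del bins[i]                # index i now holds the merged bin
                i += 1                     # assumption: do not re-test it
        else:
            i += 1
    return [dict(b) for b in bins]


example = [{"a": 50, "b": 50}, {"a": 2, "b": 1}, {"a": 90, "b": 10}]
print(merge_sparse_bins(example, threshold=0.1))
```

In this example the middle bin holds 3 cases against a smaller neighbor of 100, so its relative size falls below the threshold and it is merged into the upper neighbor, whose class distribution has the lower entropy.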

Bin Endpoints. This option specifies how the lower limit of an interval is defined. Since the procedure automatically determines the values of the cutpoints, this is largely a matter of preference.
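As a small illustration of why the choice is mostly a matter of preference, the sketch below assumes the two alternatives are whether a value exactly equal to a cutpoint belongs to the bin above it (lower limit included) or the bin below it (lower limit excluded). The function `assign_bin` and the flag `lower_inclusive` are hypothetical names. Only values that coincide with a cutpoint are affected.

```python
from bisect import bisect_left, bisect_right


def assign_bin(x, cutpoints, lower_inclusive=True):
    """Return the 0-based bin index of x given sorted cutpoints.

    lower_inclusive=True  -> bins look like [c_k, c_{k+1})
    lower_inclusive=False -> bins look like (c_k, c_{k+1}]
    """
    if lower_inclusive:
        return bisect_right(cutpoints, x)
    return bisect_left(cutpoints, x)


cutpoints = [10, 20]
print(assign_bin(10, cutpoints, lower_inclusive=True))   # value on a cutpoint
print(assign_bin(10, cutpoints, lower_inclusive=False))  # same value, other rule
print(assign_bin(15, cutpoints, lower_inclusive=True))   # interior value
```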

First (Lowest) / Last (Highest) Bin. These options specify how the minimum and maximum cutpoints for each binning input variable are defined. Generally, the procedure assumes that the binning input variables can take any value on the real number line, but if you have some theoretical or practical reason for limiting the range, you can bound it by the lowest / highest values.
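The difference between unbounded and bounded extreme bins can be pictured with a short sketch. The helper `bin_edges` and its `bounded` flag are hypothetical, and the sketch assumes that bounding means using the lowest and highest observed values as the outer edges.

```python
import math


def bin_edges(cutpoints, values, bounded=False):
    """Full list of bin edges, including the outer limits.

    bounded=False: first and last bins extend to -inf / +inf.
    bounded=True:  first and last bins stop at the observed min / max.
    """
    if bounded:
        lo, hi = min(values), max(values)
    else:
        lo, hi = -math.inf, math.inf
    return [lo, *cutpoints, hi]


values = [3, 8, 12, 19, 25]
print(bin_edges([10, 20], values))                # unbounded outer bins
print(bin_edges([10, 20], values, bounded=True))  # bounded by observed range
```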
