# Optimal Binning Options

**Preprocessing.** "Pre-binning" input variables that have many distinct
values can improve processing time without greatly sacrificing the quality
of the final bins. The maximum number of bins gives an upper bound on the
number of preprocessed bins created. Thus, if you specify 1000 as the
maximum but a binning input variable has fewer than 1000 distinct values,
the number of preprocessed bins created for that variable equals its
number of distinct values.
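
The upper-bound rule can be illustrated with a minimal Python sketch. The function name `max_preprocessed_bins` and its parameters are hypothetical and are not part of the procedure; the sketch only mirrors the rule stated above.

```python
def max_preprocessed_bins(values, max_bins=1000):
    """Upper bound on the number of preprocessed bins for one input variable.

    Hypothetical helper: the bound is the smaller of the specified maximum
    and the number of distinct values in the variable.
    """
    distinct = len(set(values))
    return min(max_bins, distinct)


# A variable with 3 distinct values gets 3 preprocessed bins.
print(max_preprocessed_bins([1.0, 2.5, 2.5, 4.0]))
# A variable with 5000 distinct values is capped at the specified maximum.
print(max_preprocessed_bins(range(5000), max_bins=1000))
```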

**Sparsely Populated Bins.** Occasionally, the procedure may produce bins with very few cases.
The following strategy deletes these pseudo cutpoints:

For a given variable, suppose that the algorithm found $n_{\text{final}}$
cutpoints and thus $n_{\text{final}}+1$ bins. For bins
$i = 2, \ldots, n_{\text{final}}$ (the second lowest-valued
bin through the second highest-valued bin), compute

where *sizeof(b)* is the number of cases in the bin.

When this value is less than the specified merging
threshold, $b_i$ is considered sparsely populated and
is merged with $b_{i-1}$ or $b_{i+1}$, whichever has the lower
class information entropy.

The procedure makes a single pass through the bins.
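
The merging strategy can be sketched roughly in Python as follows. Several details are assumptions rather than the procedure's documented behavior: the size measure is taken to be the ratio of a bin's case count to the smaller of its two neighbors' case counts, "lower class information entropy" is read as the entropy of the neighboring bin itself, ties go to the lower neighbor, and every bin is assumed to be non-empty. The names `entropy` and `merge_sparse_bins` are hypothetical.

```python
from math import log2


def entropy(class_counts):
    """Class information entropy (base 2) of one bin, from per-class case counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


def merge_sparse_bins(bins, threshold):
    """Single pass over the interior bins, merging sparsely populated ones.

    bins: per-class case counts for each bin, ordered lowest to highest.
    The size ratio used here is an ASSUMPTION, not a documented formula.
    """
    bins = [list(b) for b in bins]
    i = 1  # 0-based index of the second lowest-valued bin
    while i < len(bins) - 1:  # stop before the highest-valued bin
        size = sum(bins[i])
        ratio = size / min(sum(bins[i - 1]), sum(bins[i + 1]))
        if ratio >= threshold:
            i += 1
            continue
        # Merge with the neighbor that has the lower entropy (ties go left).
        merge_left = entropy(bins[i - 1]) <= entropy(bins[i + 1])
        j = i - 1 if merge_left else i + 1
        merged = [a + b for a, b in zip(bins[i], bins[j])]
        lo = min(i, j)
        bins[lo:lo + 2] = [merged]
        # After a left merge the next unvisited bin slides into index i;
        # after a right merge it sits at index i + 1.
        if not merge_left:
            i += 1
    return bins


# Two classes; the second bin holds only 2 cases next to neighbors of 50 each.
print(merge_sparse_bins([[40, 10], [1, 1], [25, 25], [5, 45]], threshold=0.1))
```

In this example the sparse bin is absorbed by its lower neighbor, whose class distribution (40 vs. 10) has lower entropy than the evenly split upper neighbor.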

**Bin Endpoints.** This option
specifies how the lower limit of an interval is defined. Since the
procedure automatically determines the values of the cutpoints, this
is largely a matter of preference.
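
To make the effect of the endpoint choice concrete, here is a small Python sketch using the standard `bisect` module. The cutpoint values and the `bin_index` helper are hypothetical; the only difference between the two settings is which bin receives a value that falls exactly on a cutpoint.

```python
from bisect import bisect_left, bisect_right

cutpoints = [10.0, 20.0]  # hypothetical cutpoints, sorted ascending


def bin_index(x, cutpoints, lower_inclusive=True):
    """Return the 0-based index of the bin that contains x.

    lower_inclusive=True : bins are [a, b); a cutpoint value goes to the upper bin.
    lower_inclusive=False: bins are (a, b]; a cutpoint value goes to the lower bin.
    """
    if lower_inclusive:
        return bisect_right(cutpoints, x)
    return bisect_left(cutpoints, x)


# A value strictly between cutpoints lands in the same bin either way.
print(bin_index(15.0, cutpoints, True), bin_index(15.0, cutpoints, False))
# A value equal to a cutpoint lands in different bins.
print(bin_index(10.0, cutpoints, True), bin_index(10.0, cutpoints, False))
```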

**First (Lowest) / Last (Highest) Bin.** These options specify how the minimum and maximum cutpoints for
each binning input variable are defined. Generally, the procedure
assumes that the binning input variables can take any value on the
real number line, but if you have some theoretical or practical reason
for limiting the range, you can bound it by the lowest / highest values.
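
The difference between unbounded and bounded end bins can be sketched as below. The function `bin_intervals` and its flags are hypothetical names chosen for illustration, and the sketch assumes the cutpoints are sorted and lie within the observed range of the data.

```python
import math


def bin_intervals(cutpoints, values, unbounded_first=True, unbounded_last=True):
    """Return (lower, upper) limits for each bin.

    With unbounded ends the first and last bins extend to -inf / +inf;
    otherwise they are bounded by the lowest / highest observed value.
    """
    low = -math.inf if unbounded_first else min(values)
    high = math.inf if unbounded_last else max(values)
    edges = [low, *cutpoints, high]
    return list(zip(edges[:-1], edges[1:]))


values = [3, 12, 18, 27]  # hypothetical data
print(bin_intervals([10, 20], values))
print(bin_intervals([10, 20], values, unbounded_first=False, unbounded_last=False))
```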

## How to specify options for optimal binning

- From the menus choose:
- In the Optimal Binning dialog box, click Options.