Optimal Binning

If the field you want to bin is strongly associated with another categorical field, you can select that categorical field as a "supervisor" field. The bins are then created so that they preserve the strength of the original association between the two fields.

For example, suppose you have used cluster analysis to group states based on delinquency rates for home loans, with the highest rates in the first cluster. In this case, you might choose Percent past due and Percent of foreclosures as the Bin fields and the cluster membership field generated by the model as the supervisor field.

Name extension. Specify an extension to use for the generated field(s) and whether to add it at the start (Prefix) or end (Suffix) of the field name. For example, with the suffix _OPTIMAL, you could generate new fields called pastdue_OPTIMAL and inforeclosure_OPTIMAL.
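The prefix/suffix rule can be sketched as a small helper. The function name extended_name and its position parameter are hypothetical, and the sketch assumes the extension string is attached verbatim, including any underscore.

```python
def extended_name(field: str, extension: str, position: str = "suffix") -> str:
    """Attach `extension` to `field` as a prefix or suffix (hypothetical helper)."""
    if position == "prefix":
        return extension + field
    if position == "suffix":
        return field + extension
    raise ValueError(f"position must be 'prefix' or 'suffix', got {position!r}")


# Suffix example matching the generated fields described above.
for name in ["pastdue", "inforeclosure"]:
    print(extended_name(name, "_OPTIMAL"))  # pastdue_OPTIMAL, then inforeclosure_OPTIMAL
```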

Supervisor field. A categorical field used to construct the bins.
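This page does not say how the supervisor field is used internally. As an illustration only, the sketch below implements one well-known supervised technique, the entropy-based MDLP splitting of Fayyad and Irani: it recursively places a cut point where it most reduces the entropy of the supervisor labels, and stops when the minimum description length criterion rejects a split. The function names and the tiny data set are hypothetical, and this is not presented as the node's exact algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of supervisor labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cuts(values, labels):
    """Return sorted cut points for `values`, guided by the supervisor `labels`."""
    pairs = sorted(zip(values, labels))
    cuts = []

    def split(lo, hi):
        segment = pairs[lo:hi]  # the interval currently being considered
        n = len(segment)
        ys = [y for _, y in segment]
        base = entropy(ys)
        best = None  # (weighted entropy after split, split index)
        for i in range(1, n):
            if segment[i - 1][0] == segment[i][0]:
                continue  # only cut between distinct values
            left, right = ys[:i], ys[i:]
            e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or e < best[0]:
                best = (e, i)
        if best is None:
            return
        e, i = best
        left, right = ys[:i], ys[i:]
        gain = base - e
        k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (k * base - k1 * entropy(left) - k2 * entropy(right))
        if gain <= (math.log2(n - 1) + delta) / n:
            return  # MDL criterion: the split does not pay for itself
        cuts.append((segment[i - 1][0] + segment[i][0]) / 2)
        split(lo, lo + i)
        split(lo + i, hi)

    split(0, len(pairs))
    return sorted(cuts)


# Synthetic example: the supervisor label changes between 5 and 6.
print(mdlp_cuts(list(range(1, 11)), ["low"] * 5 + ["high"] * 5))  # [5.5]
```

Because the stopping rule depends on the supervisor labels, a bin field whose values carry no information about the supervisor ends up with no cuts at all.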

Pre-bin fields to improve performance with large datasets. Indicates whether preprocessing should be used to streamline optimal binning. Preprocessing groups scale values into a large number of bins using a simple unsupervised binning method, represents the values within each bin by the bin mean, and adjusts the case weight accordingly before proceeding with supervised binning. In practical terms, this method trades a degree of precision for speed and is recommended for large datasets. When this option is used, you can also specify the maximum number of bins that any variable should end up in after preprocessing.
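The preprocessing step can be sketched as follows. The page only says "a simple unsupervised binning method", so equal-width binning here is an assumption, and the function name prebin is hypothetical. Each non-empty pre-bin is reduced to its mean, weighted by the number of cases it absorbed; n_bins plays the role of the maximum number of bins.

```python
def prebin(values, n_bins):
    """Collapse values into at most n_bins equal-width bins (assumed method).

    Returns (mean, weight) pairs, one per non-empty bin, where weight is the
    number of original cases that fell into that bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all values being equal
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        sums[idx] += v
        counts[idx] += 1
    return [(s / c, c) for s, c in zip(sums, counts) if c > 0]


print(prebin([1, 2, 3, 10], 3))  # [(2.0, 3), (10.0, 1)]
```

The supervised step would then treat each mean as a single case carrying its weight, which is where the speed-for-precision trade-off comes from.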

Merge bins that have relatively small case counts with a larger neighbor. If enabled, a bin is merged with a neighbor when the ratio of its size (number of cases) to that of the neighboring bin is smaller than the specified threshold; larger thresholds may therefore result in more merging.
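A minimal sketch of the merge rule, assuming each bin is compared with the larger of its adjacent bins and absorbed into it, repeating until no bin qualifies. The function name merge_small_bins and the (lower, upper, count) representation are hypothetical; the comparison count < threshold * neighbor is the size-ratio test rewritten to avoid dividing by an empty neighbor.

```python
def merge_small_bins(bins, threshold):
    """Merge any bin whose count is below `threshold` times its larger neighbor's.

    `bins` is an ordered list of adjacent (lower, upper, count) tuples.
    Returns a new list; the input is not modified.
    """
    bins = [list(b) for b in bins]
    merged = True
    while merged and len(bins) > 1:
        merged = False
        for i, (lower, upper, count) in enumerate(bins):
            # Neighbors are the bins immediately to the left and right.
            neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(bins)]
            j = max(neighbors, key=lambda k: bins[k][2])  # the larger neighbor
            if count < threshold * bins[j][2]:
                # Absorb bin i into neighbor j: widen its range and add the cases.
                bins[j][0] = min(bins[j][0], lower)
                bins[j][1] = max(bins[j][1], upper)
                bins[j][2] += count
                del bins[i]
                merged = True
                break  # indices changed, so rescan from the start
    return [tuple(b) for b in bins]


bins = [(0, 10, 50), (10, 20, 2), (20, 30, 40)]
print(merge_small_bins(bins, 0.1))   # [(0, 20, 52), (20, 30, 40)]
print(merge_small_bins(bins, 0.01))  # unchanged: 2 / 50 is not below 0.01
```

The two calls illustrate the point made above: raising the threshold makes more bins fall below it, so more merging occurs.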