Optimal Binning

The Optimal Binning procedure discretizes one or more scale variables (referred to henceforth as binning input variables) by distributing the values of each variable into bins. Bin formation is optimal with respect to a categorical guide variable that "supervises" the binning process. Bins can then be used instead of the original data values for further analysis.
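As an illustration of what "supervised" binning means, the following minimal sketch picks a single cutpoint for one input variable so that the guide variable's categories are as pure as possible within the two resulting bins. It assumes class entropy as the criterion and searches for only one cutpoint; this section does not describe the criterion the procedure itself uses, so treat this as a sketch of the idea rather than the procedure's algorithm. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of category labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def best_cutpoint(values, guide):
    """Return (cutpoint, weighted_entropy) for the best two-bin split.

    Candidate cutpoints are midpoints between adjacent distinct values.
    Returns (None, entropy of the unsplit guide) if no split improves on it.
    """
    pairs = sorted(zip(values, guide))
    xs = [v for v, _ in pairs]
    ys = [g for _, g in pairs]
    n = len(pairs)
    best = (None, entropy(ys))
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no cutpoint can fall between equal values
        left, right = ys[:i], ys[i:]
        # Entropy of each bin, weighted by the share of cases it holds.
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, score)
    return best


# Made-up toy data: an input variable and a two-category guide variable.
ages = [22, 25, 31, 38, 45, 52, 60, 67]
default = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

cut, score = best_cutpoint(ages, default)  # cut is 41.5 for this toy data
```

A practical supervised binning method would apply such a search repeatedly and use a stopping rule to decide how many bins to keep; both are omitted here.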

Examples. Reducing the number of distinct values a variable takes has a number of uses, including:

  • Data requirements of other procedures. Discretized variables can be treated as categorical for use in procedures that require categorical variables. For example, the Crosstabs procedure requires that all variables be categorical.
  • Data privacy. Reporting binned values instead of actual values can help safeguard the privacy of your data sources. The Optimal Binning procedure can guide the choice of bins.
  • Performance. Some procedures run faster when working with fewer distinct values. For example, Multinomial Logistic Regression can run faster when it uses discretized variables.
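  • Uncovering complete or quasi-complete separation of data. Bins in which only one category of the guide variable occurs can reveal that an input variable perfectly, or almost perfectly, predicts the guide variable.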
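The last use in the list above can be sketched as a cross-tabulation of bin numbers against the guide variable, flagging every bin in which only one guide category occurs. This is a simple screening check under assumed inputs, not a formal test for separation; the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab(bins, guide):
    """Count how often each guide category occurs within each bin."""
    table = defaultdict(Counter)
    for b, g in zip(bins, guide):
        table[b][g] += 1
    return table


def single_category_bins(table):
    """Return {bin: category} for bins in which only one guide category occurs."""
    return {b: next(iter(counts)) for b, counts in table.items() if len(counts) == 1}


# Made-up toy data: bin numbers for six cases and their guide categories.
bins = [1, 1, 1, 2, 2, 2]
guide = ["no", "no", "yes", "yes", "yes", "yes"]

table = crosstab(bins, guide)
flagged = single_category_bins(table)  # {2: "yes"}
```

If every bin is flagged, the binned variable separates the guide categories completely in this sample; if most are, the separation is close to complete and worth investigating before fitting a model.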

Optimal versus Visual Binning. The Visual Binning dialog boxes offer several automatic methods for creating bins without the use of a guide variable. These "unsupervised" rules are useful for producing descriptive statistics, such as frequency tables. When your end goal is a predictive model, Optimal Binning is generally the better choice, because its bins are formed with respect to the guide variable.
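For contrast, an unsupervised rule computes cutpoints from the input variable alone. The sketch below uses equal-width intervals as an assumed example of such a rule; the function name is hypothetical, and no guide variable appears anywhere in it.

```python
def equal_width_cutpoints(values, n_bins):
    """Return the n_bins - 1 interior cutpoints that split the observed
    range of values into n_bins intervals of equal width."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(1, n_bins)]


cutpoints = equal_width_cutpoints([0, 3, 4, 10], 4)  # [2.5, 5.0, 7.5]
```

Because the cutpoints depend only on the minimum and maximum, two datasets with the same range get the same bins regardless of how the guide categories are distributed within them.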

Output. The procedure produces tables of cutpoints for the bins and descriptive statistics for each binning input variable. Additionally, you can save new variables to the active dataset containing the binned values of the binning input variables and save the binning rules as command syntax for use in discretizing new data.
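Applying saved binning rules to new data amounts to looking up each new value in the table of cutpoints. The sketch below is a Python illustration of that lookup, not the command syntax the procedure saves. It assumes that bins are numbered from 1, that each bin includes its lower limit, and that the lowest and highest bins are unbounded; the function name and cutpoint values are hypothetical.

```python
from bisect import bisect_right


def apply_cutpoints(values, cutpoints):
    """Map each value to a 1-based bin number.

    Bin 1 holds values below the first cutpoint; each later bin includes
    its lower cutpoint. Missing values (None) stay missing.
    """
    cuts = sorted(cutpoints)
    return [None if v is None else bisect_right(cuts, v) + 1 for v in values]


# Hypothetical cutpoints applied to new, made-up values.
new_bins = apply_cutpoints([18, 41.5, 70, None], [30, 41.5, 55])
# new_bins is [1, 3, 4, None]
```

Whether a bin includes its lower or its upper limit changes where values that fall exactly on a cutpoint land, so the same convention must be used when the rules are created and when they are applied.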

Optimal Binning Data Considerations

Data. The binning input variables should be numeric variables with a scale measurement level. The guide variable should be categorical and can be string or numeric.

To Obtain Optimal Binning

This feature requires the Data Preparation option.

  1. From the menus choose:

    Transform > Optimal Binning...

  2. Select one or more binning input variables.
  3. Select a guide variable.

Variables containing the binned data values are not generated by default. Use the Save tab to save these variables.

This procedure pastes OPTIMAL BINNING command syntax.