Overview (OPTIMAL BINNING command)

The OPTIMAL BINNING procedure discretizes one or more scale variables (hereafter called binning input variables) by distributing the values of each variable into bins. The bins can then be used in place of the original data values in further analysis. OPTIMAL BINNING is useful for reducing the number of distinct values in the binning input variables.

Options

Methods. The OPTIMAL BINNING procedure offers the following methods of discretizing binning input variables.

  • Unsupervised binning via the equal frequency algorithm discretizes the binning input variables. A guide variable is not required.
  • Supervised binning via the MDLP (Minimal Description Length Principle) algorithm discretizes the binning input variables without any preprocessing. It is suitable for datasets with a small number of cases. A guide variable is required.
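To make the equal frequency idea concrete, the following is a minimal Python sketch, not the procedure's actual implementation. The function name `equal_frequency_cut_points` and the tie-handling rule (a cut point is dropped if it would split tied values or repeat an earlier cut) are assumptions made for illustration only.

```python
def equal_frequency_cut_points(values, n_bins):
    """Return up to n_bins - 1 cut points giving bins of roughly equal count.

    Illustrative only: the real procedure's tie handling and end point
    conventions may differ from the assumptions made here.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        # Position of the first value that would belong to bin k + 1.
        idx = (k * n) // n_bins
        if idx <= 0 or idx >= n:
            continue
        cut = ordered[idx]
        # Keep a cut only if it is above the minimum and above the
        # previous cut, so tied values never straddle two bins.
        if cut > ordered[0] and (not cuts or cut > cuts[-1]):
            cuts.append(cut)
    return cuts


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(equal_frequency_cut_points(data, 4))
```

With twelve distinct values and four bins, each bin receives three values. When many values are tied, fewer cut points than requested may be returned.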

Output. The OPTIMAL BINNING procedure displays the set of end points for each binning input variable in pivot table output and offers an option for suppressing this output. In addition, the procedure can save new binned variables corresponding to the binning input variables and can save a command syntax file containing commands that correspond to the binning rules.
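As a rough illustration of how a set of end points translates into a binned variable, the Python sketch below assigns each value a 1-based bin number. The function name `assign_bins` is hypothetical, and the convention that a value equal to an end point falls into the higher bin (lower limit inclusive) is an assumption, not a statement about the procedure's behavior.

```python
from bisect import bisect_right


def assign_bins(values, end_points):
    """Map each value to a 1-based bin number using interior end points.

    Assumption for illustration: each bin includes its lower end point,
    and the first and last bins are unbounded.
    """
    cuts = sorted(end_points)
    # bisect_right counts the end points that are <= value, so a value
    # equal to an end point is placed in the higher bin.
    return [bisect_right(cuts, v) + 1 for v in values]


ages = [18, 25, 30, 42, 65]
print(assign_bins(ages, [25, 40]))
```

Two end points define three bins here: values below 25, values from 25 up to but not including 40, and values of 40 or more.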

Basic Specification

The basic specification is the OPTIMAL BINNING command and a VARIABLES subcommand. VARIABLES provides the binning input variables and, if applicable, the guide variable.

  • For unsupervised binning via the equal frequency algorithm, a guide variable is not required.
  • For supervised binning via the MDLP algorithm and hybrid binning, a guide variable must be specified.

Syntax Rules

  • When a supervised binning method is used, a guide variable must be specified on the VARIABLES subcommand.
  • Subcommands may be specified only once.
  • An error occurs if a variable or keyword is specified more than once within a subcommand.
  • Parentheses, slashes, and equals signs shown in the syntax chart are required.
  • Empty subcommands are not honored.
  • The command name, subcommand names, and keywords must be spelled in full.

Case Frequency

  • If a WEIGHT variable is specified, then its values are used as frequency weights by the OPTIMAL BINNING procedure.
  • Weight values are rounded to the nearest whole number before use. For example, 0.5 is rounded to 1, and 2.4 is rounded to 2.
  • The WEIGHT variable may not be specified on any subcommand in the OPTIMAL BINNING procedure.
  • Cases with missing weights or weights less than 0.5 are not used in the analyses.
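The weighting rules above can be sketched in Python as follows. This is an illustration of the stated rules only; the function name `effective_weight` and the use of `None` or NaN to represent a missing weight are assumptions. Python's built-in `round` is deliberately avoided because it rounds halves to the nearest even integer (so `round(0.5)` is 0), which does not match the example of 0.5 rounding to 1.

```python
import math


def effective_weight(weight):
    """Return the integer frequency weight, or None if the case is excluded.

    Missing weights are represented here as None or NaN (an assumption).
    """
    if weight is None or (isinstance(weight, float) and math.isnan(weight)):
        return None  # missing weight: case is not used
    if weight < 0.5:
        return None  # weight below 0.5: case is not used
    # Round half up, so 0.5 becomes 1 and 2.4 becomes 2.
    return math.floor(weight + 0.5)


for w in [0.5, 2.4, 0.49, None]:
    print(w, "->", effective_weight(w))
```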

Limitations

The number of distinct values in a guide variable must not exceed 256, irrespective of the platform on which IBM® SPSS® Statistics is running. A guide variable with more than 256 distinct values results in an error.
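A pre-check along these lines can be run on exported data before invoking the procedure. The Python sketch below is hypothetical: the name `check_guide_variable` is invented for illustration, and ignoring `None` entries when counting distinct values is an assumption about how missing values should be treated.

```python
MAX_GUIDE_VALUES = 256  # limit stated in the text above


def check_guide_variable(values):
    """Return the number of distinct guide values, or raise if over the limit.

    None entries are treated as missing and ignored (an assumption).
    """
    distinct = {v for v in values if v is not None}
    if len(distinct) > MAX_GUIDE_VALUES:
        raise ValueError(
            f"guide variable has {len(distinct)} distinct values; "
            f"the limit is {MAX_GUIDE_VALUES}"
        )
    return len(distinct)


print(check_guide_variable(["yes", "no", "yes", None]))
```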