Overview (OPTIMAL BINNING command)
The OPTIMAL BINNING procedure discretizes one or more scale variables (referred to henceforth as binning input variables) by distributing the values of each variable into bins. Bins can then be used instead of the original data values of the binning input variables for further analysis. OPTIMAL BINNING is useful for reducing the number of distinct values in the given binning input variables.
Options
Methods. The OPTIMAL BINNING procedure offers the following methods of discretizing binning input variables.
- Unsupervised binning via the equal frequency algorithm discretizes the binning input variables. A guide variable is not required.
- Supervised binning via the MDLP (Minimal Description Length Principle) algorithm discretizes the binning input variables without any preprocessing. It is suitable for datasets with a small number of cases. A guide variable is required.
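The equal frequency idea in the first method can be illustrated with a minimal Python sketch. This is not the procedure's own implementation: the function names `equal_frequency_cut_points` and `assign_bin` are hypothetical, and the choice to treat each cut point as an inclusive upper bound is an assumption made for the example.

```python
import bisect


def equal_frequency_cut_points(values, n_bins):
    """Return cut points that split the sorted values into bins of roughly equal size."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        # Index of the last value that falls into the k-th bin.
        idx = (k * n) // n_bins - 1
        if idx < 0:
            continue
        cut = ordered[idx]
        # Ties can produce the same cut point twice; keep each one only once.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def assign_bin(value, cuts):
    """Return a 1-based bin index; each cut point is an inclusive upper bound (assumption)."""
    return bisect.bisect_left(cuts, value) + 1


cuts = equal_frequency_cut_points(range(1, 13), 4)
binned = [assign_bin(v, cuts) for v in range(1, 13)]
```

With twelve distinct values and four bins, each bin receives three values. When the data contain many ties, fewer cut points than requested may be returned.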
Output. The OPTIMAL BINNING procedure displays every binning input variable’s end point set in pivot table output and offers an option for suppressing this output. In addition, the procedure can save new binned variables corresponding to the binning input variables and can save a command syntax file with commands corresponding to the binning rules.
Basic Specification
The basic specification is the OPTIMAL BINNING command and a VARIABLES subcommand. VARIABLES provides the binning input variables and, if applicable, the guide variable.
- For unsupervised binning via the equal frequency algorithm, a guide variable is not required.
- For supervised binning via the MDLP algorithm and hybrid binning, a guide variable must be specified.
Syntax Rules
- When a supervised binning method is used, a guide variable must be specified on the VARIABLES subcommand.
- Subcommands may be specified only once.
- An error occurs if a variable or keyword is specified more than once within a subcommand.
- Parentheses, slashes, and equals signs shown in the syntax chart are required.
- Empty subcommands are not honored.
- The command name, subcommand names, and keywords must be spelled in full.
Case Frequency
- If a WEIGHT variable is specified, then its values are used as frequency weights by the OPTIMAL BINNING procedure.
- Weight values are rounded to the nearest whole numbers before use. For example, 0.5 is rounded to 1, and 2.4 is rounded to 2.
- The WEIGHT variable may not be specified on any subcommand in the OPTIMAL BINNING procedure.
- Cases with missing weights or weights less than 0.5 are not used in the analyses.
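The weight handling described in the list above can be sketched in Python as follows. The function name `effective_weight` is hypothetical, missing weights are represented here by `None` or NaN, and rounding every half upward is an assumption extrapolated from the single 0.5 example. Python's built-in `round` is avoided because it rounds halves to the nearest even number, which would turn 0.5 into 0.

```python
import math


def effective_weight(weight):
    """Return the integer frequency weight for a case, or None if the case is excluded."""
    # Missing weights (None or NaN in this sketch) exclude the case.
    if weight is None or (isinstance(weight, float) and math.isnan(weight)):
        return None
    # Weights below 0.5, including zero and negative weights, exclude the case.
    if weight < 0.5:
        return None
    # Round to the nearest whole number, with halves rounded up (assumption).
    return math.floor(weight + 0.5)


weights = [0.5, 2.4, 0.49, None, 3.0]
used = [effective_weight(w) for w in weights]
```

Here `used` holds the integer frequency for each retained case and `None` for each excluded case.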
Limitations
The number of distinct values in a guide variable must be less than or equal to 256, irrespective of the platform on which IBM® SPSS® Statistics is running. If the number is greater than 256, an error occurs.
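A pre-check for this limit, run on the data before the command is submitted, might look like the following Python sketch. The helper `check_guide_variable` and the constant name are hypothetical, and the sketch simply counts every distinct value it is given; how the procedure itself treats missing values in this count is not stated here.

```python
MAX_GUIDE_DISTINCT_VALUES = 256  # limit stated in the Limitations section


def check_guide_variable(guide_values):
    """Return the number of distinct guide values, raising if it exceeds the limit."""
    distinct = len(set(guide_values))
    if distinct > MAX_GUIDE_DISTINCT_VALUES:
        raise ValueError(
            f"guide variable has {distinct} distinct values; "
            f"the limit is {MAX_GUIDE_DISTINCT_VALUES}"
        )
    return distinct


n_distinct = check_guide_variable(["yes", "no", "yes", "no", "maybe"])
```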