CRITERIA Subcommand (OPTIMAL BINNING command)

The CRITERIA subcommand specifies bin creation options.

PREPROCESS = EQUALFREQ (BINS=n) | NONE. Preprocessing method when MDLP binning is used. PREPROCESS = EQUALFREQ creates preliminary bins using the equal frequency method before performing MDLP binning. These preliminary bins—rather than the original data values of the binning input variables—are input to the MDLP binning method.

  • EQUALFREQ may be followed by parentheses containing the BINS keyword, an equals sign, and an integer greater than 1. The BINS value serves as a preprocessing threshold and specifies the maximum number of preliminary bins to create. The default value is EQUALFREQ (BINS = 1000).
  • If the number of distinct values in a binning input variable is greater than the BINS value, then the number of bins created is no more than the BINS value. Otherwise, no preprocessing is done for the input variable.
  • NONE requests no preprocessing.
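
The threshold rule above can be sketched in a few lines of Python. This is only an illustration of the documented rule, not the procedure's implementation; the function name needs_preprocessing and its arguments are hypothetical.

```python
def needs_preprocessing(values, bins=1000):
    """Return True when equal-frequency preprocessing would apply.

    Hypothetical helper: mirrors the documented rule that preprocessing
    happens only when the number of distinct values exceeds BINS.
    """
    if not isinstance(bins, int) or bins <= 1:
        raise ValueError("BINS must be an integer greater than 1")
    distinct = len(set(values))
    return distinct > bins
```

With the default threshold of 1000, a variable with 1500 distinct values would be preprocessed, while a variable with exactly 1000 distinct values would be passed to MDLP binning unchanged.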

METHOD = MDLP | EQUALFREQ (BINS=n). Binning method. The MDLP option performs supervised binning via the MDLP algorithm. If METHOD = MDLP is specified, then a guide variable must be specified on the VARIABLES subcommand.

  • Alternatively, METHOD = EQUALFREQ performs unsupervised binning via the equal frequency algorithm. EQUALFREQ may be followed by parentheses containing the BINS keyword, an equals sign, and an integer greater than 1. The BINS value specifies the number of bins to create. The default value of the BINS argument is 10.
  • BINS is an upper bound on the number of bins created. If the number of distinct values in a binning input variable is greater than the BINS value, then the number of bins created is no more than the BINS value. Otherwise, the number of bins created equals the number of distinct values. Thus, for example, if BINS = 10 is specified but a binning input variable has 10 or fewer distinct values, then the number of bins created will equal the number of distinct values in the input variable.
  • If EQUALFREQ is specified, then the VARIABLES subcommand GUIDE keyword and the CRITERIA subcommand PREPROCESS keyword are silently ignored.
  • The default METHOD option depends on the presence of a GUIDE specification on the VARIABLES subcommand. If GUIDE is specified, then METHOD = MDLP is the default. If GUIDE is not specified, then METHOD = EQUALFREQ is the default.
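
The METHOD rules above (the default that depends on GUIDE, the guide requirement for MDLP, the keywords ignored under EQUALFREQ, and the bound on the bin count) can be summarized in a short Python sketch. The function names resolve_method and bin_count_upper_bound, and the dictionary keys, are hypothetical and chosen only for illustration.

```python
def resolve_method(guide_specified, method=None):
    """Resolve the effective binning method from the documented rules."""
    if method is None:
        # Default depends on whether GUIDE is given on VARIABLES.
        method = "MDLP" if guide_specified else "EQUALFREQ"
    if method not in ("MDLP", "EQUALFREQ"):
        raise ValueError("METHOD must be MDLP or EQUALFREQ")
    if method == "MDLP" and not guide_specified:
        raise ValueError("METHOD = MDLP requires a guide variable")
    # Under EQUALFREQ, GUIDE and PREPROCESS are silently ignored.
    honored = method == "MDLP"
    return {"method": method, "guide_used": honored, "preprocess_used": honored}


def bin_count_upper_bound(n_distinct, bins=10):
    """Upper bound on bins created by METHOD = EQUALFREQ (BINS = bins)."""
    return min(n_distinct, bins)
```

For instance, a variable with 7 distinct values and the default BINS = 10 yields a bound of 7, whereas a variable with 250 distinct values yields a bound of 10.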

LOWEREND = UNBOUNDED | OBSERVED. Specifies how the minimum end point for each binning input variable is defined. If UNBOUNDED, then the minimum end point extends to negative infinity. If OBSERVED, then the minimum observed data value is used.

UPPEREND = UNBOUNDED | OBSERVED. Specifies how the maximum end point for each binning input variable is defined. If UNBOUNDED, then the maximum end point extends to positive infinity. If OBSERVED, then the maximum observed data value is used.
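
As a rough illustration of how LOWEREND and UPPEREND determine the outermost end points, consider the following Python sketch. The function end_points is hypothetical, and both options must be passed explicitly because this section does not state their defaults.

```python
import math


def end_points(values, lowerend, upperend):
    """Return the (minimum, maximum) end points for one input variable."""
    for option in (lowerend, upperend):
        if option not in ("UNBOUNDED", "OBSERVED"):
            raise ValueError("option must be UNBOUNDED or OBSERVED")
    # UNBOUNDED extends to infinity; OBSERVED uses the data extremes.
    low = -math.inf if lowerend == "UNBOUNDED" else min(values)
    high = math.inf if upperend == "UNBOUNDED" else max(values)
    return low, high
```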

LOWERLIMIT = INCLUSIVE | EXCLUSIVE. Specifies how the lower limit of an interval is defined. Valid option values are INCLUSIVE or EXCLUSIVE. Suppose the start and end points of an interval are p and q, respectively. If LOWERLIMIT = INCLUSIVE, then the interval contains values greater than or equal to p but less than q. If LOWERLIMIT = EXCLUSIVE, then the interval contains values greater than p and less than or equal to q.
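
The two interval definitions can be expressed as a membership test. The following Python sketch uses a hypothetical function, in_interval, with p and q as the start and end points described above.

```python
def in_interval(x, p, q, lowerlimit):
    """Test whether x falls in the interval with end points p and q."""
    if lowerlimit == "INCLUSIVE":
        return p <= x < q   # [p, q)
    if lowerlimit == "EXCLUSIVE":
        return p < x <= q   # (p, q]
    raise ValueError("LOWERLIMIT must be INCLUSIVE or EXCLUSIVE")
```

A value equal to p therefore belongs to the interval only under INCLUSIVE, and a value equal to q belongs to it only under EXCLUSIVE.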

FORCEMERGE = value. Small bins threshold. Occasionally, the procedure may produce bins with very few cases. A bin is merged if the ratio of its size (number of cases) to that of a neighboring bin is smaller than the specified threshold. Larger thresholds tend to result in more merging. The default value of FORCEMERGE is 0; by default, forced merging of very small bins is not performed.
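
The size-ratio test can be illustrated with a small Python sketch. It covers only the comparison stated above; which neighboring bin is compared and how the merge itself is carried out are not described in this section, so they are left out. The function name is_small_bin is hypothetical.

```python
def is_small_bin(bin_size, neighbor_size, threshold=0.0):
    """Return True when a bin qualifies for forced merging.

    Compares the ratio of the bin's case count to a neighboring bin's
    case count against the FORCEMERGE threshold.
    """
    if neighbor_size <= 0:
        raise ValueError("neighbor_size must be positive")
    return bin_size / neighbor_size < threshold
```

Because a ratio of case counts is never negative, the default threshold of 0 never triggers a merge, which matches the documented default behavior. With a threshold of 0.1, a bin of 5 cases next to a bin of 100 cases (ratio 0.05) would qualify.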