SUBSET Subcommand (NAIVEBAYES command)

The SUBSET subcommand controls how the final subset of predictors is selected.

  • There are three mutually exclusive settings: (1) specify a maximum subset size and a method of selecting the best subset, (2) specify an exact subset size, or (3) perform no predictor selection.
  • Only one of the keywords MAXSIZE, EXACTSIZE, or NOSELECTION may be specified. The BESTSUBSET option is available only if MAXSIZE is specified.
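
The mutual-exclusivity rules above can be sketched in Python. This is an illustration only, not part of the procedure: the function name resolve_subset_mode and the dictionary used to represent the subcommand are hypothetical.

```python
def resolve_subset_mode(spec):
    """Return the selection mode implied by a SUBSET specification.

    `spec` is a hypothetical dict of the keywords that were given,
    e.g. {"MAXSIZE": 5, "BESTSUBSET": "PSEUDOBIC"}.
    """
    modes = [k for k in ("MAXSIZE", "EXACTSIZE", "NOSELECTION") if k in spec]
    if len(modes) > 1:
        raise ValueError(
            "only one of MAXSIZE, EXACTSIZE, or NOSELECTION may be specified"
        )
    # BESTSUBSET is tied to an explicit MAXSIZE specification.
    if "BESTSUBSET" in spec and "MAXSIZE" not in spec:
        raise ValueError("BESTSUBSET is available only if MAXSIZE is specified")
    # MAXSIZE is the documented default when no mode keyword is given.
    return modes[0] if modes else "MAXSIZE"
```

For example, an empty specification resolves to MAXSIZE, while a specification that names both MAXSIZE and NOSELECTION is rejected.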

MAXSIZE Keyword

The MAXSIZE keyword specifies the maximum subset size to use when creating the sequence of predictor subsets. The MAXSIZE value is the size of the largest subset beyond any predictors that were forced via the FORCE subcommand. If no predictors are forced, the MAXSIZE value is simply the size of the largest subset.

  • The value AUTO indicates that the maximum subset size is computed automatically. Alternatively, a positive integer may be specified. The integer must be less than or equal to the number of unique predictors on the NAIVEBAYES command.
  • MAXSIZE is in effect by default, and AUTO is its default value.
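
As a rough illustration of how an integer MAXSIZE value relates to forced predictors, the following Python sketch applies the stated bound and adds the forced count. The function and parameter names are hypothetical, and AUTO is not modelled because this section does not describe how that value is computed.

```python
def largest_subset_size(maxsize, n_forced, n_predictors):
    """Total size of the largest candidate subset, counting forced predictors.

    maxsize      -- positive integer given on MAXSIZE (AUTO is not modelled)
    n_forced     -- number of predictors forced via FORCE (0 if none)
    n_predictors -- number of unique predictors on the command
    """
    if isinstance(maxsize, bool) or not isinstance(maxsize, int) or maxsize < 1:
        raise ValueError("MAXSIZE must be a positive integer")
    if maxsize > n_predictors:
        raise ValueError("MAXSIZE may not exceed the number of unique predictors")
    # MAXSIZE counts predictors beyond those that were forced.
    return n_forced + maxsize
```

With two forced predictors and MAXSIZE=5, the largest subset in the sequence would hold seven predictors. With no forced predictors, it holds exactly five.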

BESTSUBSET Keyword

The BESTSUBSET keyword indicates the criterion for finding the best subset when a maximum subset size is used.

  • This keyword is honored only if the MAXSIZE keyword is in effect and must be given in parentheses immediately following the MAXSIZE specification.

PSEUDOBIC. Use the pseudo-BIC criterion. The pseudo-BIC criterion is based on the training sample. If the active dataset is not partitioned into training and test samples, PSEUDOBIC is the default. If the active dataset is partitioned, PSEUDOBIC is available but is not the default.

TESTDATA. Use the test data criterion. The test data criterion is based on the test sample. If the active dataset is partitioned into training and test samples, TESTDATA is the default. If the active dataset is not partitioned, TESTDATA may not be specified.
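
The defaulting rules for the two criteria can be summarized in a short Python sketch. The function name best_subset_criterion and its arguments are hypothetical. The sketch only mirrors the rules stated above and does not compute either criterion.

```python
def best_subset_criterion(requested, partitioned):
    """Return the criterion in effect for BESTSUBSET.

    requested   -- "PSEUDOBIC", "TESTDATA", or None if BESTSUBSET is omitted
    partitioned -- True if the active dataset is split into training and
                   test samples
    """
    if requested is None:
        # The default depends on whether a test sample exists.
        return "TESTDATA" if partitioned else "PSEUDOBIC"
    if requested not in ("PSEUDOBIC", "TESTDATA"):
        raise ValueError("BESTSUBSET must be PSEUDOBIC or TESTDATA")
    if requested == "TESTDATA" and not partitioned:
        raise ValueError("TESTDATA requires a partitioned active dataset")
    return requested
```

PSEUDOBIC is accepted in both cases, whereas TESTDATA is accepted only when a test sample exists.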

EXACTSIZE Keyword

The EXACTSIZE keyword specifies a particular subset size to use. The EXACTSIZE value is the size of the subset beyond any predictors forced via the FORCE subcommand. If no predictors are forced, then the EXACTSIZE value is simply the size of the subset.

  • Specify a positive integer. The integer must be less than the number of unique predictors on the NAIVEBAYES command.
  • There is no default value.
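
The EXACTSIZE bound is strict: the value must be less than the number of unique predictors. A minimal Python check, with hypothetical function and parameter names, might look like this:

```python
def exact_subset_size(exactsize, n_forced, n_predictors):
    """Total subset size implied by EXACTSIZE, counting forced predictors.

    exactsize    -- positive integer given on EXACTSIZE (there is no default)
    n_forced     -- number of predictors forced via FORCE (0 if none)
    n_predictors -- number of unique predictors on the command
    """
    if isinstance(exactsize, bool) or not isinstance(exactsize, int) or exactsize < 1:
        raise ValueError("EXACTSIZE must be a positive integer")
    # Strict inequality, as stated for EXACTSIZE.
    if exactsize >= n_predictors:
        raise ValueError(
            "EXACTSIZE must be less than the number of unique predictors"
        )
    # EXACTSIZE counts predictors beyond those that were forced.
    return n_forced + exactsize
```

With 10 unique predictors, EXACTSIZE=9 passes this check and EXACTSIZE=10 is rejected.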

NOSELECTION Keyword

The NOSELECTION keyword indicates that all predictors that are specified on the NAIVEBAYES command (excluding any predictors that are also specified on the EXCEPT subcommand) are included in the final subset. This specification is useful if the NAIVEBAYES procedure is used for model building but not predictor selection.
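
The effect of NOSELECTION on the final subset can be illustrated as a simple list difference. In this Python sketch the function name and the variable names in the example are hypothetical. Names are compared exactly as written, which is a simplifying assumption.

```python
def final_subset_noselection(predictors, excepted):
    """Return the final subset under NOSELECTION.

    predictors -- predictor names as listed on the command
    excepted   -- names listed on the EXCEPT subcommand
    """
    excluded = set(excepted)
    seen = set()
    subset = []
    for name in predictors:
        # Skip excepted predictors and repeated names, keeping command order.
        if name in excluded or name in seen:
            continue
        seen.add(name)
        subset.append(name)
    return subset
```

For instance, with predictors age, income, and ed, and ed listed on EXCEPT, the final subset contains age and income.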