PRIORS Subcommand (TREE command)
For CRT and QUEST trees with categorical dependent variables, the PRIORS subcommand allows you to specify prior probabilities of group membership. Prior probabilities are estimates of the overall relative frequency of each target category of the dependent variable, made before anything is known about the values of the independent (predictor) variables. Using prior probabilities helps correct tree growth that is distorted by sample data that is not representative of the entire population.
- If the growing method is CHAID or Exhaustive CHAID, this subcommand is ignored and a warning is issued.
- If the dependent variable is scale, this subcommand is ignored and a warning is issued.
FROMDATA. Obtain priors from the training sample. Use this setting if the distribution of groups in the training sample is representative of the population distribution. This is the default. If you have not specified a training sample using the VALIDATION subcommand, then the distribution of values in the entire data file is used.
EQUAL. Equal priors across categories. Use this if categories of the dependent variable are represented equally in the population. For example, if there are four categories, approximately 25% of the cases are in each category.
CUSTOM. User-specified prior probabilities. For each category of the dependent variable, specify the category followed by a non-negative prior probability value enclosed in square brackets.
- Prior probabilities must be specified for all values of the dependent variable included in the analysis (either all non-missing values found in the data or all values defined on the DEPCATEGORIES subcommand).
- The values must be non-negative.
- The specified category values must be consistent with the data type of the dependent variable. String and date values must be quoted. Date values must be consistent with the variable’s print format.
- If you specify the same category more than once, the last one is used.
- A warning is issued if you specify a prior probability for a category that does not exist in the data or, if split-sample validation is in effect, in the training sample. See the topic VALIDATION Subcommand (TREE command) for more information.
- Prior probability values are "normalized" to relative proportions before the tree is grown.
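To make the FROMDATA and EQUAL settings concrete, the following Python sketch computes both kinds of priors outside of SPSS. It is an illustration only, not the procedure's implementation: the function names and the sample values are hypothetical, and missing values are assumed to be represented as None.

```python
from collections import Counter


def priors_from_data(training_values):
    """FROMDATA-style priors: relative frequency of each category
    among the non-missing cases (None is assumed to mean missing)."""
    counts = Counter(v for v in training_values if v is not None)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def equal_priors(categories):
    """EQUAL-style priors: 1 / (number of distinct categories)."""
    distinct = list(dict.fromkeys(categories))  # de-duplicate, keep order
    return {category: 1 / len(distinct) for category in distinct}


# Hypothetical training sample for a three-category dependent variable.
sample = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3]
fromdata = priors_from_data(sample)

# Four categories give a prior of 0.25 each, as in the EQUAL description.
equal = equal_priors([1, 2, 3, 4])
```

If the sample over-represents a category relative to the population, the FROMDATA priors inherit that distortion, which is the situation in which EQUAL or CUSTOM priors are the better choice.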
Example
TREE risk [o] BY age income employment
/METHOD TYPE=CRT
/PRIORS CUSTOM= 1 [30] 2 [75] 3 [45].
The prior probabilities of 30, 75, and 45 are normalized to proportions of 0.2, 0.5, and 0.3, respectively.
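The normalization in this example can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions, not the TREE procedure's own code: the function name is hypothetical, the CUSTOM specification is modeled as a list of (category, value) pairs, and rejecting an all-zero specification is an assumption of the sketch rather than documented behavior.

```python
def normalize_priors(pairs):
    """Turn CUSTOM-style (category, value) pairs into proportions.

    pairs are given in the order they would appear on the subcommand.
    """
    raw = {}
    for category, value in pairs:
        if value < 0:
            raise ValueError(f"prior for {category!r} must be non-negative")
        # If a category is repeated, the last value replaces the earlier one.
        raw[category] = value
    total = sum(raw.values())
    if total <= 0:
        # Assumption of this sketch: at least one prior must be positive.
        raise ValueError("at least one prior must be positive")
    return {category: value / total for category, value in raw.items()}


# The values from the example: 30, 75, and 45 sum to 150.
normalized = normalize_priors([(1, 30), (2, 75), (3, 45)])
```

Because only the ratios matter, specifying 2, 5, and 3 (or 0.2, 0.5, and 0.3) would describe the same priors as 30, 75, and 45.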
ADJUST Keyword
The ADJUST keyword specifies whether prior probabilities are adjusted to take into account misclassification costs.
NO. Priors are unadjusted. This is the default.
YES. Priors are adjusted to take into account misclassification costs.
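This section does not state how the adjustment is computed. The Python sketch below shows one common approach from the classification-tree literature, often called altered priors: each prior is weighted by the total cost of misclassifying that category, and the results are renormalized. Treat the formula as an assumption rather than the documented behavior of ADJUST=YES. The function name, the cost layout, and the cost values are hypothetical.

```python
def adjust_priors(priors, costs):
    """Weight each prior by the summed cost of misclassifying that category.

    priors: {category: prior probability}
    costs:  costs[actual][predicted] = misclassification cost
    Assumed formula: adjusted[j] is proportional to priors[j] * sum of
    costs[j][i] over all predicted categories i other than j.
    """
    weighted = {}
    for actual, prior in priors.items():
        total_cost = sum(
            cost for predicted, cost in costs[actual].items()
            if predicted != actual
        )
        weighted[actual] = prior * total_cost
    total = sum(weighted.values())
    return {category: w / total for category, w in weighted.items()}


priors = {1: 0.2, 2: 0.5, 3: 0.3}

# Hypothetical costs: misclassifying a category 2 case costs twice as much.
costs = {
    1: {1: 0, 2: 1, 3: 1},
    2: {1: 2, 2: 0, 3: 2},
    3: {1: 1, 2: 1, 3: 0},
}
adjusted = adjust_priors(priors, costs)
```

Under this assumed formula, equal misclassification costs leave the priors unchanged, while a category that is more expensive to misclassify receives a larger share.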