PRIORS Subcommand (TREE command)

For CRT and QUEST trees with categorical dependent variables, the PRIORS subcommand allows you to specify prior probabilities of group membership. Prior probabilities are estimates of the overall relative frequency of each target category of the dependent variable, made before anything is known about the values of the independent (predictor) variables. Specifying prior probabilities helps correct tree growth that is distorted by a sample that is not representative of the entire population.

  • If the growing method is CHAID or Exhaustive CHAID, this subcommand is ignored and a warning is issued.
  • If the dependent variable is scale, this subcommand is ignored and a warning is issued.

FROMDATA. Obtain priors from the training sample. Use this setting if the distribution of groups in the training sample is representative of the population distribution. This is the default. If you have not specified a training sample using the VALIDATION subcommand, then the distribution of values in the entire data file is used.

EQUAL. Equal priors across categories. Use this if categories of the dependent variable are represented equally in the population. For example, if there are four categories, approximately 25% of the cases are in each category.

CUSTOM. User-specified prior probabilities. For each category of the dependent variable, specify the category followed by a non-negative prior probability value enclosed in square brackets.

  • Prior probabilities must be specified for all values of the dependent variable included in the analysis (either all non-missing values found in the data or all values defined on the DEPCATEGORIES subcommand).
  • The values must be non-negative.
  • The specified category values must be consistent with the data type of the dependent variable. String and date values must be quoted. Date values must be consistent with the variable’s print format.
  • If you specify the same category more than once, the last one is used.
  • A warning is issued if you specify a prior probability for a category that does not exist in the data or, if split-sample validation is in effect, in the training sample. See the topic VALIDATION Subcommand (TREE command) for more information.
  • Prior probability values are "normalized" to relative proportions before the tree is grown.

Example

TREE risk [o] BY age income employment
 /METHOD TYPE=CRT
 /PRIORS CUSTOM= 1 [30] 2 [75] 3 [45].

The prior probability values 30, 75, and 45 sum to 150, so they are normalized to proportions of 0.2 (30/150), 0.5 (75/150), and 0.3 (45/150), respectively.
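
The normalization arithmetic can be checked outside of TREE. The following Python fragment is only an illustrative sketch, not part of the TREE command; the function name normalize_priors is hypothetical, and the sketch assumes that normalization means dividing each value by the sum of all values.

```python
def normalize_priors(raw):
    """Scale non-negative prior values so that they sum to 1.

    raw: {category: prior value}, as listed on PRIORS CUSTOM.
    Hypothetical helper for illustration; not an SPSS function.
    """
    if any(value < 0 for value in raw.values()):
        raise ValueError("prior values must be non-negative")
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one prior value must be positive")
    # Divide each value by the total to obtain relative proportions.
    return {category: value / total for category, value in raw.items()}


priors = normalize_priors({1: 30, 2: 75, 3: 45})
print(priors)  # {1: 0.2, 2: 0.5, 3: 0.3}
```

Because only the relative sizes matter, specifying 2, 5, and 3 for the same three categories would yield the same proportions.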

ADJUST Keyword

The ADJUST keyword specifies whether prior probabilities are adjusted to take into account misclassification costs.

NO. Priors are unadjusted. This is the default.

YES. Priors are adjusted to take into account misclassification costs.
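
This topic does not state how the adjustment is computed. As a rough illustration of the idea only, the Python sketch below uses the "altered priors" scheme from the classification tree literature, in which each prior is weighted by the total cost of misclassifying that category and the results are rescaled to sum to 1. Whether TREE uses exactly this formula is an assumption; the function name adjust_priors, the layout of the costs dictionary, and the default cost of 1 are all hypothetical.

```python
def adjust_priors(priors, costs):
    """Reweight priors by each category's total misclassification cost.

    priors: {category: prior probability}
    costs:  {(actual, predicted): cost}; hypothetical layout. Any
            off-diagonal pair that is not listed defaults to a cost of 1.
    Assumption: this "altered priors" scheme is illustrative and is not
    confirmed to be the computation that TREE performs.
    """
    cats = list(priors)
    # Total cost of misclassifying a case whose actual category is j.
    weight = {
        j: sum(costs.get((j, i), 1.0) for i in cats if i != j)
        for j in cats
    }
    total = sum(weight[j] * priors[j] for j in cats)
    if total <= 0:
        raise ValueError("weighted priors must have a positive sum")
    # Rescale so that the adjusted priors sum to 1.
    return {j: weight[j] * priors[j] / total for j in cats}


priors = {1: 0.2, 2: 0.5, 3: 0.3}
# Misclassifying category 1 costs three times as much as other errors.
costs = {(1, 2): 3.0, (1, 3): 3.0}
adjusted = adjust_priors(priors, costs)
print(adjusted)
```

In this sketch, the category that is most costly to misclassify receives a larger adjusted prior, and equal costs leave the priors unchanged.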