For CRT and QUEST trees with categorical dependent variables, you can specify prior probabilities of group membership. Prior probabilities are estimates of the overall relative frequency of each category of the dependent variable, made before anything is known about the values of the independent (predictor) variables. Specifying prior probabilities helps correct tree growth that is distorted by a sample that is not representative of the entire population.
Obtain from training sample (empirical priors). Use this setting if the distribution of dependent variable values in the data file is representative of the population distribution. If you are using split-sample validation, the distribution of cases in the training sample is used.
Note: Since cases are randomly assigned to the training sample in split-sample validation, you won't know the actual distribution of cases in the training sample in advance. See the topic Validation for more information.
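As an illustration of what empirical priors mean, the following minimal Python sketch computes the relative frequency of each category in a set of training labels. It is not the product's implementation; the function name empirical_priors and the sample labels are hypothetical and exist only for this example.

```python
from collections import Counter


def empirical_priors(labels):
    """Return the relative frequency of each category in `labels`.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {category: n / total for category, n in counts.items()}


# Hypothetical training sample: three "good" cases and one "bad" case.
training_labels = ["good", "good", "good", "bad"]
priors = empirical_priors(training_labels)
```

With split-sample validation, the same calculation would apply only to the cases that end up in the training sample, which is why the resulting priors cannot be known in advance.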
Equal across categories. Use this setting if categories of the dependent variable are represented equally in the population. For example, if there are four categories, approximately 25% of the cases are in each category.
Custom. Enter a non-negative value for each category of the dependent variable listed in the grid. The values can be proportions, percentages, frequency counts, or any other values that represent the distribution of values across categories.
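Because custom values only need to represent the relative distribution across categories, proportions, percentages, and counts that describe the same distribution should lead to the same priors. The Python sketch below illustrates this by rescaling the entered values so that they sum to 1. That the values are rescaled in exactly this way is an assumption, and the function name normalize_priors and the category names are hypothetical.

```python
def normalize_priors(values):
    """Rescale non-negative category values so that they sum to 1.

    Hypothetical helper for illustration only.
    """
    if any(v < 0 for v in values.values()):
        raise ValueError("values must be non-negative")
    total = sum(values.values())
    if total <= 0:
        raise ValueError("at least one value must be positive")
    return {category: v / total for category, v in values.items()}


# The same distribution entered as frequency counts and as proportions.
from_counts = normalize_priors({"low": 500, "medium": 300, "high": 200})
from_proportions = normalize_priors({"low": 0.5, "medium": 0.3, "high": 0.2})
```

Under this assumption, both inputs describe priors of 0.5, 0.3, and 0.2 for the three categories.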
Adjust priors using misclassification costs. If you define custom misclassification costs, you can adjust prior probabilities based on those costs. See the topic Misclassification Costs for more information.
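One common way to fold costs into priors, described in the classification tree literature as altered priors, weights each category's prior by the total cost of misclassifying that category and then rescales the results to sum to 1. The Python sketch below shows that formulation. Whether this option uses exactly this formula is an assumption, and the function name, cost layout (costs[actual][predicted]), and sample values are hypothetical.

```python
def cost_adjusted_priors(priors, costs):
    """Weight each prior by the total cost of misclassifying that category.

    `costs[actual][predicted]` is the cost of predicting `predicted` when
    the true category is `actual`. Hypothetical helper for illustration;
    the formula is an assumed, commonly cited formulation.
    """
    weighted = {}
    for actual, prior in priors.items():
        # Sum the costs of all wrong predictions for this actual category.
        total_cost = sum(
            cost for predicted, cost in costs[actual].items()
            if predicted != actual
        )
        weighted[actual] = total_cost * prior
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("cost-weighted priors must have a positive sum")
    return {category: w / total for category, w in weighted.items()}


# Hypothetical example: misclassifying a "bad" case costs three times as much.
priors = {"bad": 0.25, "good": 0.75}
costs = {
    "bad": {"bad": 0, "good": 3},
    "good": {"bad": 1, "good": 0},
}
adjusted = cost_adjusted_priors(priors, costs)
```

In this example the higher cost of misclassifying the rarer category offsets its lower prior, so both adjusted priors come out at 0.5.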
Prior Probabilities and Value Labels
This dialog box requires defined value labels for the dependent variable. It is not available unless at least two values of the categorical dependent variable have defined value labels. See the topic To specify value labels for more information.
To Specify Prior Probabilities
This feature requires the Decision Trees option.
- From the
- In the main Decision Tree dialog box, select a categorical (nominal or ordinal) dependent variable with two or more defined value labels.
- For the growing method, select CRT or QUEST.
- Click Options.
- Click the Prior Probabilities tab.