Misclassification Costs

For categorical (nominal, ordinal) dependent variables, misclassification costs allow you to include information about the relative penalty associated with incorrect classification. For example:

  • The cost of denying credit to a creditworthy customer is likely to be different from the cost of extending credit to a customer who then defaults on the loan.
  • The cost of misclassifying an individual with a high risk of heart disease as low risk is probably much higher than the cost of misclassifying a low-risk individual as high risk.
  • The cost of sending a mass mailing to someone who isn't likely to respond is probably fairly low, while the cost of not sending the mailing to someone who is likely to respond is relatively high (in terms of lost revenue).
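To illustrate why unequal costs matter, the following minimal sketch picks the category with the lowest expected cost instead of the most probable one. It is a general illustration, not a description of how the Decision Trees procedure applies costs internally. The function name, the probabilities, and the cost values are all hypothetical, and the layout costs[actual][predicted] is an assumption made for this example.

```python
def lowest_expected_cost(probabilities, costs):
    """Return the category with the lowest expected misclassification cost.

    probabilities: dict mapping each category to its estimated probability.
    costs: nested dict, costs[actual][predicted] (assumed layout).
    """
    expected = {}
    for predicted in probabilities:
        # Weight the cost of predicting this category by the probability
        # of each category that might actually be true.
        expected[predicted] = sum(
            probabilities[actual] * costs[actual][predicted]
            for actual in probabilities
        )
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical heart disease example: missing a high-risk individual
# is assumed to cost five times as much as a false alarm.
probabilities = {"high": 0.3, "low": 0.7}
costs = {
    "high": {"high": 0, "low": 5},
    "low": {"high": 1, "low": 0},
}

best, expected = lowest_expected_cost(probabilities, costs)
print(best, expected)
```

With these assumed numbers, "low" is the more probable category, but predicting "high" carries the lower expected cost, so the cost-based choice differs from the most likely one.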

Misclassification Costs and Value Labels

This dialog box is not available unless at least two values of the categorical dependent variable have defined value labels. See the topic To specify value labels for more information.

To Specify Misclassification Costs

This feature requires the Decision Trees option.

  1. From the menus choose:

    Analyze > Classify > Tree...

  2. In the main Decision Tree dialog box, select a categorical (nominal, ordinal) dependent variable with two or more defined value labels.
  3. Click Options.
  4. Click the Misclassification Costs tab.
  5. Click Custom.
  6. Enter one or more misclassification costs in the grid. Values must be non-negative. (The cost of a correct classification, represented on the diagonal, is always 0.)

Fill Matrix. In many instances, you may want costs to be symmetric—that is, the cost of misclassifying A as B is the same as the cost of misclassifying B as A. The following controls can make it easier to specify a symmetric cost matrix:

  • Duplicate Lower Triangle. Copies values in the lower triangle of the matrix (below the diagonal) into the corresponding upper-triangular cells.
  • Duplicate Upper Triangle. Copies values in the upper triangle of the matrix (above the diagonal) into the corresponding lower-triangular cells.
  • Use Average Cell Values. For each pair of mirrored cells (one in the upper triangle, one in the lower triangle), the two values are averaged and the average replaces both values. For example, if the cost of misclassifying A as B is 1 and the cost of misclassifying B as A is 3, then this control replaces both of those values with the average (1+3)/2 = 2.
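The effect of the three Fill Matrix controls can be sketched on a cost matrix stored as a list of rows. The function names below are hypothetical stand-ins for the dialog controls, and the example assumes that the cell in row A, column B holds the cost of misclassifying A as B; the dialog's actual row and column orientation is not stated here.

```python
def duplicate_lower_triangle(costs):
    """Copy each below-diagonal value into its mirror cell above the diagonal."""
    out = [row[:] for row in costs]
    for i in range(len(costs)):
        for j in range(i):
            out[j][i] = costs[i][j]
    return out


def duplicate_upper_triangle(costs):
    """Copy each above-diagonal value into its mirror cell below the diagonal."""
    out = [row[:] for row in costs]
    for i in range(len(costs)):
        for j in range(i):
            out[i][j] = costs[j][i]
    return out


def use_average_cell_values(costs):
    """Replace each pair of mirrored cells with the average of the two values."""
    out = [row[:] for row in costs]
    for i in range(len(costs)):
        for j in range(i):
            average = (costs[i][j] + costs[j][i]) / 2
            out[i][j] = out[j][i] = average
    return out


# Categories A and B: A as B costs 1, B as A costs 3.
costs = [[0, 1],
         [3, 0]]

print(duplicate_lower_triangle(costs))  # [[0, 3], [3, 0]]
print(duplicate_upper_triangle(costs))  # [[0, 1], [1, 0]]
print(use_average_cell_values(costs))   # [[0, 2.0], [2.0, 0]]
```

Each function returns a new symmetric matrix and leaves the input unchanged, so the three results can be compared side by side.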