Effects of value labels on tree models

Figure 1. Tree with both variables treated as nominal

The Decision Tree dialog box assumes that either all nonmissing values of a categorical (nominal or ordinal) dependent variable have defined value labels or none of them do. Some features are not available unless at least two nonmissing values of the categorical dependent variable have value labels. If at least two nonmissing values have defined value labels, all cases with other, unlabeled values are excluded from the analysis.
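The rule above can be sketched in a few lines of Python. This is only an illustration of the behavior as described here, not the procedure's actual implementation; the function name `values_used` and the dictionary representation of value labels are assumptions made for the example.

```python
def values_used(observed, labels):
    """Return the dependent-variable values that would enter the analysis.

    observed: iterable of nonmissing values found in the data
    labels:   dict mapping value -> value label (hypothetical representation)
    """
    observed = set(observed)
    labeled = observed & set(labels)
    # Unlabeled values are dropped only when at least two observed
    # values have defined value labels; otherwise everything is kept.
    if len(labeled) >= 2:
        return labeled
    return observed


# No value labels defined: all nonmissing values are used.
print(sorted(values_used([1, 2, 3], {})))

# Labels defined for 1 and 2 only: value 3 is dropped.
print(sorted(values_used([1, 2, 3], {1: "Yes", 2: "No"})))
```

With no labels the sketch keeps 1, 2, and 3; with labels for 1 and 2 it keeps only those two values.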

The original data file in this example contains no defined value labels, and when the dependent variable is treated as nominal, the tree model uses all nonmissing values in the analysis. In this example, those values are 1, 2, and 3.

But what happens when we define value labels for some, but not all, values of the dependent variable?

  1. In the Data Editor window, click the Variable View tab.
  2. Click the Values cell for the variable dependent.
    Figure 2. Defining value labels for dependent variable
  3. First, enter 1 for Value and Yes for Value Label, and then click Add.
  4. Next, enter 2 for Value and No for Value Label, and then click Add again.
  5. Then click OK.
  6. Open the Decision Tree dialog box again. The dialog box should still have dependent selected as the dependent variable, with a nominal measurement level.
  7. Click OK to run the procedure again.
Figure 3. Tree for nominal dependent variable with partial value labels

Now only the two dependent variable values with defined value labels are included in the tree model. All cases with a value of 3 for the dependent variable have been excluded, which might not be readily apparent if you aren't familiar with the data.
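Because the exclusion is easy to miss, it can help to count how many cases carry an unlabeled value before running the procedure. The following Python sketch shows one way to do that outside the product; the function name `excluded_case_counts`, the sample data, and the use of `None` for missing values are all assumptions made for illustration.

```python
from collections import Counter


def excluded_case_counts(cases, labels):
    """Count cases per unlabeled value that would be left out of the tree.

    cases:  list of dependent-variable values, with None marking missing
    labels: dict mapping value -> value label (hypothetical representation)
    """
    counts = Counter(v for v in cases if v is not None)
    labeled_present = [v for v in counts if v in labels]
    # Exclusion applies only when at least two observed values are labeled.
    if len(labeled_present) < 2:
        return {}
    return {v: n for v, n in counts.items() if v not in labels}


# Hypothetical data: three cases have the unlabeled value 3.
sample = [1, 2, 3, 3, 1, None, 2, 3]
print(excluded_case_counts(sample, {1: "Yes", 2: "No"}))  # prints {3: 3}
```

An empty result means that no cases would be dropped under the rule described above.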
