Selection of Transformation Type

Each variable can be analyzed at one of several different levels. However, because prediction of the response is the goal, you should scale the response “as is” by employing the numerical optimal scaling level. Consequently, the order and the differences between categories will be preserved in the transformed variable.

  1. To run a Categorical Regression analysis, from the menus choose:

    Analyze > Regression > Optimal Scaling (CATREG)...

    Figure 1. Categorical Regression dialog
  2. Select Daily ozone level as the dependent variable.
  3. Select Inversion base height through Day of the year (when variables are listed in file order) as independent variables.
  4. Select Daily ozone level and click Define Scale.
    Figure 2. Define Scale dialog
    The Categorical Regression Define Scale dialog.
  5. In the Define Scale dialog, select Numeric as the optimal scaling level.
  6. Click Continue.
  7. Select Inversion base height through Day of the year, and click Define Scale in the Categorical Regression dialog.
  8. In the Define Scale dialog, select Nominal as the optimal scaling level.
  9. Click Continue.
  10. Click Discretize in the Categorical Regression dialog.
    Figure 3. Discretization dialog
    The Categorical Regression Discretization dialog.
  11. Select ibh.
  12. Select Equal intervals and type 100 as the interval length.
  13. Click Change.
  14. Select dpg, vis, and doy.
  15. Type 10 as the interval length.
  16. Click Change.
  17. Select temp.
  18. Type 1.8 as the interval length.
  19. Click Change.
  20. Click Continue.
  21. Click Plots in the Categorical Regression dialog.
    Figure 4. Plots dialog
    The Categorical Regression Plots dialog.
  22. In the Plots dialog, select transformation plots for Inversion base height through Day of the year.
  23. Click Continue.
  24. Click OK in the Categorical Regression dialog.
    Figure 5. Model summary
    Table showing multiple R, R-square, adjusted R-square, apparent prediction error

    Treating all predictors as nominal yields an R² of 0.880. This large amount of variance accounted for is not surprising because nominal treatment imposes no restrictions on the quantifications. However, interpreting the results can be quite difficult.
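To see why unrestricted quantifications tend to produce a high fit, consider a minimal single-predictor sketch. It is not the CATREG algorithm. It only assumes that, with one predictor, the best nominal quantification of a category is the mean response in that category, and it compares that fit with a straight line (the analogue of a numeric scaling level). The data and function names are hypothetical.

```python
from statistics import mean

def r_squared(y, fitted):
    """Proportion of the variance in y accounted for by the fitted values."""
    y_bar = mean(y)
    ss_total = sum((v - y_bar) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1 - ss_resid / ss_total

def nominal_fit(categories, y):
    """Fit each category with its own mean response (no order restriction)."""
    groups = {}
    for c, v in zip(categories, y):
        groups.setdefault(c, []).append(v)
    means = {c: mean(vs) for c, vs in groups.items()}
    return [means[c] for c in categories]

def linear_fit(x, y):
    """Ordinary least-squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return [y_bar + slope * (a - x_bar) for a in x]

# Hypothetical data: category codes 1..4 with a non-monotone response.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [2.0, 3.0, 6.0, 7.0, 1.0, 2.0, 5.0, 6.0]

r2_nominal = r_squared(y, nominal_fit(x, y))
r2_linear = r_squared(y, linear_fit(x, y))
print(round(r2_nominal, 3), round(r2_linear, 3))
```

Because the nominal fit is free to give each category any value, it can never account for less variance than the straight line on the same data. The price is that the fitted values no longer follow the category order.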

    Figure 6. Regression coefficients (all predictors nominal)
    Table showing predictors in the rows and standardized coefficients, degrees of freedom, F, and significance in the columns

    This table shows the standardized regression coefficients of the predictors. A common mistake made when interpreting these values involves focusing on the coefficients while neglecting the quantifications. You cannot simply assert that a positive coefficient for Inversion base height, for example, implies that as the predictor increases, predicted ozone increases. All interpretations must be relative to the transformed variables, so that as the quantifications for Inversion base height increase, predicted ozone increases. To examine the effects of the original variables, you must relate the categories to the quantifications.
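The point can be illustrated with a small sketch. The coefficient and quantifications below are made-up numbers, not values from the ozone analysis. A predictor's contribution to the prediction is taken to be its standardized coefficient times the quantification of the observed category.

```python
# Hypothetical standardized coefficient for one nominally scaled predictor.
beta = 0.5

# Hypothetical quantifications: category code -> transformed value.
# With nominal scaling these need not follow the order of the codes.
quantification = {1: 0.8, 2: -1.1, 3: 0.3}

def contribution(category):
    """Contribution of this predictor to the (standardized) prediction."""
    return beta * quantification[category]

for category in sorted(quantification):
    print(category, contribution(category))
```

Although the coefficient is positive, moving from category 1 to category 2 lowers the prediction, because the quantification drops. Only the quantifications, not the original category codes, can be read directly against the sign of the coefficient.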

    Figure 7. Transformation plot of Inversion base height (nominal)

    The transformation plot of Inversion base height shows no apparent pattern. As evidenced by the jagged nature of the plot, moving from low categories to high categories yields fluctuations in the quantifications in both directions. Thus, describing the effects of this variable requires focusing on the individual categories. Imposing ordinal or linear restrictions on the quantifications for this variable might significantly reduce the fit.

    Figure 8. Transformation plot of Pressure gradient (nominal)

    This figure displays the transformation plot of Pressure gradient. The initial discretized categories (1 through 6) receive small quantifications and thus have minimal contributions to the predicted response. The next three categories receive somewhat higher, positive values, resulting in a moderate increase in predicted ozone.

    The quantifications decrease up to category 16, where Pressure gradient has its greatest decreasing effect on predicted ozone. Although the line increases after this category, using an ordinal scaling level for Pressure gradient may not significantly reduce the fit, while simplifying the interpretations of the effects. However, the importance measure of 0.04 and the regression coefficient for Pressure gradient indicate that this variable is not very useful in the regression.
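The category numbers on the horizontal axis of these plots come from the Equal intervals discretization requested earlier (interval length 10 for Pressure gradient). As a rough sketch of what such a recoding does, the function below numbers intervals from 1 starting at the smallest observed value. That starting point is an assumption made for illustration; the exact interval boundaries that the procedure uses may differ, and the input values are hypothetical.

```python
import math

def equal_interval_categories(values, length):
    """Recode values into consecutive integer categories of equal width.

    Assumption: the first interval starts at the smallest observed value.
    """
    lowest = min(values)
    return [math.floor((v - lowest) / length) + 1 for v in values]

# Hypothetical pressure gradient readings, grouped with interval length 10.
readings = [-69, -60, -59, 0, 5, 107]
categories = equal_interval_categories(readings, 10)
print(categories)
```

A shorter interval length yields more categories and therefore more quantifications to estimate, which is one reason a nominal transformation plot can look jagged.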

    Figure 9. Transformation plot of Visibility (nominal)

    The transformation plot of Visibility, like that for Inversion base height, shows no apparent pattern. Imposing ordinal or linear restrictions on the quantifications for this variable might significantly reduce the fit.

    Figure 10. Transformation plot of Temperature (nominal)

    The transformation plot of Temperature displays a different pattern. As the categories increase, the quantifications tend to increase. As a result, as Temperature increases, predicted ozone tends to increase. This pattern suggests scaling Temperature at the ordinal level.

    Figure 11. Transformation plot of Day of the year (nominal)

    This figure shows the transformation plot of Day of the year. The quantifications tend to increase up to the midpoint of the graph, at which point they tend to decrease, yielding an inverted U-shape. Considering the sign of the regression coefficient for Day of the year, the initial categories receive quantifications that have a decreasing effect on predicted ozone. For the middle categories, the effect of the quantifications on predicted ozone increases, reaching a maximum around the midpoint of the graph.

    Beyond that point, the quantifications tend to decrease the predicted ozone. Although the line is quite jagged, the general shape is still identifiable. Thus, the transformation plots suggest scaling Temperature at the ordinal level while keeping all other predictors nominally scaled.

    To recompute the regression, scaling Temperature at the ordinal level, recall the Categorical Regression dialog.

  25. Select Temperature and click Define Scale.
  26. In the Define Scale dialog, select Ordinal as the optimal scaling level.
  27. Click Continue.
  28. Click Save in the Categorical Regression dialog.
  29. In the Save dialog, select Save transformed variables to the active dataset in the Transformed Variables group.
  30. Click Continue.
  31. Click OK in the Categorical Regression dialog.
Figure 12. Model summary for regression with Temperature (ordinal)
Table showing multiple R, R-square, adjusted R-square, apparent prediction error

This model results in an R² of 0.872, so the variance accounted for decreases negligibly when the quantifications for Temperature are restricted to be ordered.

Figure 13. Regression coefficients with Temperature (ordinal)
Table showing predictors in the rows and standardized coefficients, degrees of freedom, F, and significance in the columns

This table displays the coefficients for the model in which Temperature is scaled as ordinal. Comparing the coefficients to those for the model in which Temperature is scaled as nominal, no large changes occur.

Figure 14. Correlations, importance, and tolerance
Table showing predictors in the rows and zero-order, part, and partial correlations, importance and tolerance in the columns

Moreover, the importance measures suggest that Temperature is still much more important to the regression than the other variables. Now, however, as a result of the ordinal scaling level of Temperature and the positive regression coefficient, you can assert that as Temperature increases, predicted ozone increases.
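For readers who want to see how an importance column of this kind can be derived, here is a minimal sketch. It assumes that the importance reported is Pratt's measure, that is, the standardized coefficient times the zero-order correlation, divided by R² so that the values sum to 1. The coefficients and correlations are invented for illustration and are not the values from Figure 14.

```python
def pratt_importance(betas, zero_order):
    """Relative importance as beta * zero-order correlation, scaled to sum to 1.

    Assumption: for standardized variables the products sum to R-square.
    """
    products = {name: betas[name] * zero_order[name] for name in betas}
    r_square = sum(products.values())
    return {name: p / r_square for name, p in products.items()}

# Hypothetical standardized coefficients and zero-order correlations.
betas = {"ibh": 0.30, "dpg": 0.10, "temp": 0.60}
zero_order = {"ibh": 0.40, "dpg": 0.20, "temp": 0.75}

importance = pratt_importance(betas, zero_order)
for name, value in importance.items():
    print(name, round(value, 3))
```

Under this definition a predictor with a small coefficient and a small zero-order correlation receives an importance close to zero, which is the situation described earlier for Pressure gradient.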

Figure 15. Transformation plot of Temperature (ordinal)

The transformation plot illustrates the ordinal restriction on the quantifications for Temperature. The jagged line from the nominal transformation is replaced here by a smooth ascending line. Moreover, no long plateaus are present, indicating that collapsing categories is not needed.
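An ordinal restriction of this kind can be sketched with monotone regression, using the pool-adjacent-violators approach: whenever a quantification is lower than the one before it, the two are replaced by their weighted mean. This is a simplified, stand-alone illustration and not the full CATREG estimation, which re-estimates the quantifications together with the regression coefficients. The quantifications and category counts below are hypothetical.

```python
def monotone_fit(values, weights):
    """Weighted pool-adjacent-violators: closest non-decreasing sequence."""
    blocks = []  # each block: [weighted mean, total weight, number of categories]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the order restriction is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)  # merged categories share one value
    return fitted

# Hypothetical nominal quantifications and category frequencies.
nominal_q = [-1.2, -0.4, -0.9, 0.1, 0.6, 0.3, 1.5]
counts = [10, 10, 10, 10, 10, 10, 10]

ordinal_q = monotone_fit(nominal_q, counts)
print(ordinal_q)
```

Categories that are merged end up with identical values, which appear as plateaus in a transformation plot. Long plateaus would suggest that the corresponding categories could be collapsed.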
