Optimality of the Quantifications

The transformed variables from a categorical regression can be used in a standard linear regression, yielding identical results. However, the quantifications are optimal only for the model that produced them. Using a subset of the predictors in linear regression does not correspond to an optimal scaling regression on the same subset.
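One way to state this property is as an inequality. The notation here is introduced only for illustration and does not come from the procedure's output: let $\varphi$ stand for the set of transformations (quantifications) of the response and the predictors, let $S$ be a subset of the predictors, and let $R^2_S(\varphi)$ be the fit of a linear regression of the transformed response on the transformed predictors in $S$. Assuming that the optimal scaling algorithm reaches the maximum in each case, the quantifications saved from the full model cannot do better on the subset than the quantifications computed for that subset:

```latex
R^2_S\!\left(\hat{\varphi}^{\,\mathrm{full}}\right)
  \;\le\; \max_{\varphi} R^2_S(\varphi)
  \;=\; R^2_S\!\left(\hat{\varphi}^{\,S}\right)
```

Equality holds only when the full-model quantifications also happen to be optimal for the subset, which is generally not the case when the predictors are related to one another.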

For example, the categorical regression that you have computed has an R² of 0.875. You have saved the transformed variables, so to fit a linear regression using only Temperature, Pressure gradient, and Inversion base height as predictors, from the menus choose:

Analyze > Regression > Linear...

Figure 1. Linear Regression dialog
  1. In the Linear Regression dialog, select Daily ozone level Quantification as the dependent variable.
  2. Select Inversion base height Quantification, Pressure gradient (mm Hg) Quantification, and Temperature (degrees F) Quantification as independent variables.
  3. Click OK.
    Figure 2. Model summary for regression with subset of optimally scaled predictors
    Table showing R, R-square, adjusted R-square, and standard error of the estimate

    Using the quantifications for the response, Temperature, Pressure gradient, and Inversion base height in a standard linear regression results in a fit of 0.732. To compare this to the fit of a categorical regression using just those three predictors, recall the Categorical Regression dialog.

    Figure 3. Categorical Regression dialog
  4. In the Categorical Regression dialog, deselect Visibility (miles) and Day of the year as independent variables.
  5. Click OK.
    Figure 4. Model summary for categorical regression on three predictors
    Table showing multiple R, R-square, adjusted R-square, apparent prediction error

The categorical regression on the three predictors has a fit of 0.796, which is better than the fit of 0.732 obtained from the linear regression on the saved quantifications. This demonstrates that the quantifications obtained in the original regression are optimal only when all five predictors are included in the model.
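The same behavior can be reproduced outside the dialogs with a small simulation. The Python sketch below is a simplified illustration under stated assumptions, not the Categorical Regression algorithm and not the ozone data: the response is kept numeric, both predictors are scaled at the nominal level (in which case the optimal quantifications are the dummy-variable effects of an additive regression), and the data, effect sizes, and function names are all hypothetical.

```python
import numpy as np


def r_squared(y, X):
    """R-square of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def dummies(codes, n_categories):
    """Indicator columns for an integer-coded variable (first category dropped)."""
    return np.eye(n_categories)[codes][:, 1:]


# Hypothetical data: two related categorical predictors and a numeric response.
rng = np.random.default_rng(0)
n, k = 500, 4
x1 = rng.integers(0, k, size=n)
x2 = (x1 + rng.integers(0, 2, size=n)) % k      # x2 depends on x1
effects1 = np.array([0.0, 2.0, 1.0, 3.0])       # assumed category effects
effects2 = np.array([0.0, -1.0, 2.0, 0.5])
y = effects1[x1] + effects2[x2] + rng.normal(size=n)

D1, D2 = dummies(x1, k), dummies(x2, k)

# Full model: additive regression on both predictors. The fitted effects of
# x1 play the role of the saved full-model quantification of x1.
A_full = np.column_stack([np.ones(n), D1, D2])
coef_full, *_ = np.linalg.lstsq(A_full, y, rcond=None)
q1_full = D1 @ coef_full[1:k]

r2_full = r_squared(y, np.column_stack([D1, D2]))  # both predictors
r2_fixed = r_squared(y, q1_full)   # x1 only, full-model quantification reused
r2_reopt = r_squared(y, D1)        # x1 only, quantification recomputed

print(f"full model:                 {r2_full:.3f}")
print(f"subset, saved quantif.:     {r2_fixed:.3f}")
print(f"subset, re-optimized:       {r2_reopt:.3f}")
```

Because the saved quantification of x1 is one particular linear combination of the x1 indicator columns, the regression that reuses it is nested in the regression that recomputes it, so the re-optimized subset fit can never be lower. This mirrors the comparison of 0.732 and 0.796 above, although the numbers printed by the sketch are unrelated to those values.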
