Building a Model on the Prepared Data

- To build the model on the prepared data, recall the GLM Univariate dialog.
- Deselect Sales in thousands [sales] and select sales_transformed as the dependent variable.
- Deselect 4-year resale value [resale] through Fuel efficiency [mpg] and select resale_transformed through mpg_transformed as covariates.
- Click OK.

There are a few interesting differences between the Tests of Between-Subjects Effects for the model built on the unprepared data and for the model built on the prepared data. First, the total degrees of freedom has increased. This is because automated data preparation replaced missing values with imputed values, so records that were removed listwise from the first model are available to the second. Perhaps more notably, the significance of certain predictors has changed. Both models agree that engine size [engine_s] and vehicle type [type] are useful predictors, but wheelbase [wheelbas] and curb weight [curb_wgt] are no longer significant, while vehicle price [price_transformed] and fuel efficiency [mpg_transformed] now are.
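The effect on the degrees of freedom can be sketched with a few hypothetical records. The sketch assumes simple mean imputation for continuous fields as a stand-in for whatever replacement automated data preparation actually performs. The field names, values, and the helpers `listwise`, `mean_impute`, and `corrected_total_df` are illustrative only.

```python
# Hypothetical records; None marks a missing value (illustrative data only).
records = [
    {"sales": 16.9, "price": 21.5, "mpg": 28.0},
    {"sales": 39.4, "price": None, "mpg": 25.0},
    {"sales": 14.1, "price": 28.4, "mpg": None},
    {"sales": 8.6, "price": 42.0, "mpg": 22.0},
    {"sales": 20.4, "price": 23.9, "mpg": 27.0},
]
fields = ["sales", "price", "mpg"]


def listwise(rows):
    """Keep only rows with no missing value in any analysis field."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def mean_impute(rows):
    """Replace each missing value with the mean of that field's observed values.

    Assumption: mean imputation stands in for the actual replacement method.
    """
    means = {}
    for f in fields:
        observed = [r[f] for r in rows if r[f] is not None]
        means[f] = sum(observed) / len(observed)
    return [{f: (r[f] if r[f] is not None else means[f]) for f in fields} for r in rows]


def corrected_total_df(rows):
    """Corrected total degrees of freedom for a linear model: N - 1."""
    return len(rows) - 1


print(corrected_total_df(listwise(records)))     # unprepared data: 2
print(corrected_total_df(mean_impute(records)))  # prepared data: 4
```

Listwise deletion drops the two incomplete records, leaving three cases; after imputation all five records remain, so the model has more degrees of freedom to work with.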
Why did this change occur? Sales has a skewed distribution, so wheelbase and curb weight may have had a few influential records that were no longer influential once sales was transformed. Alternatively, the extra cases made available by missing-value replacement may have changed the statistical significance of these variables. Distinguishing between these explanations would require further investigation, which we will not pursue here.

Note that R Squared is higher for the model built on the prepared data. However, because the second model predicts transformed sales rather than sales itself, the two R Squared values are measured on different scales and are not directly comparable. Instead, you can compute nonparametric correlations between the observed values and each of the two sets of predicted values.
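Spearman's rho, one of the nonparametric correlations, can be sketched in a few lines: rank each variable (giving tied values the average of their ranks) and take the Pearson correlation of the ranks. Because only the ordering of values matters, the result is the same whether predictions are on the original or a transformed scale, provided the transformation is strictly increasing; the sketch assumes that holds for the transformation applied to sales. The data and helper names below are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks start at 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observed sales and predictions from the two models.
observed = [16.9, 39.4, 14.1, 8.6, 20.4, 18.2]
predicted_unprepared = [18.0, 31.5, 15.2, 11.0, 17.9, 21.3]  # original scale
predicted_prepared = [2.9, 3.5, 2.7, 2.2, 3.0, 2.8]  # transformed scale

print(spearman(observed, predicted_unprepared))
print(spearman(observed, predicted_prepared))

# A strictly increasing transform such as log leaves the ranks, and rho, unchanged.
print(spearman(observed, [math.log(p) for p in predicted_unprepared]))
```

The higher correlation indicates the model whose predictions better preserve the ordering of the observed values, regardless of the scale each model was fit on.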