Discretizing Variables

If a variable has more categories than can be practically interpreted, use the Discretization dialog to reduce the number of categories to a more manageable one.

The variable Day of the year has a minimum value of 3 and a maximum value of 365. Using this variable in a categorical regression corresponds to using a variable with 365 categories. Similarly, Visibility (miles) ranges from 0 to 350. To simplify interpretation of analyses, discretize these variables into equal intervals of length 10.

The variable Inversion base height ranges from 111 to 5000. A variable with this many categories results in very complex relationships. However, discretizing this variable into equal intervals of length 100 yields roughly 50 categories. Using a 50-category variable rather than a 5000-category variable simplifies interpretations significantly.

Pressure gradient (mm Hg) ranges from –69 to 107. The procedure omits any categories coded with negative numbers from the analysis. Discretizing this variable into equal intervals of length 10, however, yields roughly 19 categories and keeps the observations with negative values in the analysis.
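The equal-interval recoding described above can be sketched in a few lines of Python. This is only an illustration, not the procedure's actual algorithm: the function names are hypothetical, and the sketch assumes that intervals start at the observed minimum and are numbered from 1, so the exact category counts may differ slightly from the rough figures quoted in the text.

```python
import math

def equal_interval_codes(values, length, start=None):
    """Recode values into 1-based interval numbers of the given length.

    Assumption: intervals are anchored at the smallest value unless
    `start` is supplied. The real procedure may anchor them differently.
    """
    lo = min(values) if start is None else start
    return [math.floor((v - lo) / length) + 1 for v in values]

def category_count(lo, hi, length):
    """Number of equal intervals needed to cover the range lo..hi."""
    return math.floor((hi - lo) / length) + 1

# Ranges quoted in the text, with the interval lengths suggested there.
day_categories = category_count(3, 365, 10)          # Day of the year
height_categories = category_count(111, 5000, 100)   # Inversion base height
gradient_categories = category_count(-69, 107, 10)   # Pressure gradient

# Negative pressure gradients end up with positive category codes.
gradient_codes = equal_interval_codes([-69, -60, -59, 107], 10)
```

Because every code is a positive integer, the recoded Pressure gradient no longer contains negatively coded categories.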

Temperature (degrees F) ranges from 25 to 93 on the Fahrenheit scale. To analyze the data as if it were on the Celsius scale, discretize this variable into equal intervals of length 1.8, because one Celsius degree spans 1.8 Fahrenheit degrees.
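The arithmetic behind the 1.8 interval length can be checked with a short Python sketch. The helper name is hypothetical, and the interval count assumes intervals anchored at the minimum temperature, which may not match how the procedure places them.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) / 1.8

low_f, high_f = 25, 93   # temperature range quoted in the text
interval_f = 1.8         # interval length in Fahrenheit degrees

# An interval of 1.8 Fahrenheit degrees is exactly 1 Celsius degree wide.
interval_c = interval_f / 1.8

# Width of the observed range, expressed in Celsius degrees.
span_c = fahrenheit_to_celsius(high_f) - fahrenheit_to_celsius(low_f)

# Approximate number of one-Celsius-degree categories covering the range
# (assumes the first interval starts at the minimum).
temperature_categories = math.floor((high_f - low_f) / interval_f) + 1
```

Under these assumptions, each discretized category corresponds to a one-degree step on the Celsius scale.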

Other discretizations may be preferable; the choices used here are purely subjective. If you want fewer categories, choose larger intervals. For example, Day of the year could have been divided into months of the year or seasons.
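As an illustration of the coarser alternative mentioned above, the following Python sketch groups Day of the year into months and seasons. The function names are hypothetical, the calendar year is an assumption (the text does not state it, so a non-leap year is used), and the seasons follow the three-month meteorological convention for the Northern Hemisphere.

```python
from datetime import date, timedelta

ASSUMED_YEAR = 2023  # assumption: a non-leap year; the text gives no year
SEASONS = ["winter", "spring", "summer", "autumn"]

def month_of_day(day_of_year, year=ASSUMED_YEAR):
    """Return the month number (1-12) for a 1-based day of the year."""
    return (date(year, 1, 1) + timedelta(days=day_of_year - 1)).month

def season_of_day(day_of_year, year=ASSUMED_YEAR):
    """Return the season name, treating December-February as winter."""
    month = month_of_day(day_of_year, year)
    return SEASONS[(month % 12) // 3]
```

Grouping by month yields 12 categories and grouping by season yields 4, compared with the roughly 37 categories produced by intervals of length 10.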
