Creating a Categorical Variable from a Scale Variable

Several categorical variables in the data file demo.sav are, in fact, derived from scale variables in that data file. For example, the variable inccat is simply income grouped into four categories. This categorical variable uses the integer values 1–4 to represent the following income categories (in thousands): less than $25, $25–$49, $50–$74, and $75 or higher.
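
As a rough illustration of the coding described above, the following Python sketch maps an income value (in thousands) to the integer codes 1–4. The function name income_category and the LABELS dictionary are hypothetical names introduced only for this example; they are not part of demo.sav or of the product.

```python
# Illustrative sketch only: reproduces the four income categories described
# in the text. The names below are assumptions made for this example.

def income_category(income_thousands):
    """Return the category code (1-4) for an income given in thousands."""
    if income_thousands < 25:
        return 1  # less than $25
    if income_thousands < 50:
        return 2  # $25-$49
    if income_thousands < 75:
        return 3  # $50-$74
    return 4      # $75 or higher

# Descriptive labels for the integer codes, worded as in the text above.
LABELS = {
    1: "less than $25",
    2: "$25-$49",
    3: "$50-$74",
    4: "$75 or higher",
}

if __name__ == "__main__":
    for income in (12, 25, 49, 50, 74, 75, 120):
        code = income_category(income)
        print(income, code, LABELS[code])
```

Note that a value of exactly 25 falls into category 2, not category 1, because the first category covers only incomes below 25.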

To create the categorical variable inccat:

  1. From the menus in the Data Editor window choose:

    Transform > Visual Binning...

    In the initial Visual Binning dialog box, you select the scale and/or ordinal variables for which you want to create new, binned variables. Binning means taking two or more contiguous values and grouping them into the same category.

    Because Visual Binning relies on actual values in the data file to help you make good binning choices, it needs to read the data file first. This can take some time if your data file contains a large number of cases, so this initial dialog box also allows you to limit the number of cases to read ("scan"). This is not necessary for our sample data file: although it contains more than 6,000 cases, scanning that many cases does not take long.

  2. Drag and drop Household income in thousands [income] from the Variables list into the Variables to Bin list, and then click Continue.
    Figure 1. Main Visual Binning dialog box
  3. In the main Visual Binning dialog box, select Household income in thousands [income] in the Scanned Variable List.

    A histogram displays the distribution of the selected variable (which in this case is highly skewed).

  4. Enter inccat2 for the new binned variable name (the name inccat is already used by an existing variable in the data file) and Income category [in thousands] for the variable label.
  5. Click Make Cutpoints.
  6. Select Equal Width Intervals.
  7. Enter 25 for the first cutpoint location, 3 for the number of cutpoints, and 25 for the width.

    The number of binned categories is one greater than the number of cutpoints. So in this example, the new binned variable will have four categories, with the first three categories each containing ranges of 25 (thousand) and the last one containing all values above the highest cutpoint value of 75 (thousand).

  8. Click Apply.

    The values now displayed in the grid represent the defined cutpoints, which are the upper endpoints of each category. Vertical lines in the histogram also indicate the locations of the cutpoints.

    By default, these cutpoint values are included in the corresponding categories. For example, the first category, with an upper endpoint of 25, would include all values less than or equal to 25. But in this example, we want categories that correspond to less than 25, 25–49, 50–74, and 75 or higher.

  9. In the Upper Endpoints group, select Excluded (<).
  10. Then click Make Labels.
    Figure 2. Automatically generated value labels

    This automatically generates descriptive value labels for each category. Since the actual values assigned to the new binned variable are simply sequential integers starting with 1, the value labels can be very useful.

    You can also manually enter or change cutpoints and labels in the grid, change cutpoint locations by dragging and dropping the cutpoint lines in the histogram, and delete cutpoints by dragging cutpoint lines off the histogram.

  11. Click OK to create the new, binned variable.

The new variable is displayed in the Data Editor. Since the variable is added to the end of the file, it is displayed in the far right column in Data View and in the last row in Variable View.
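
The cutpoint settings used in the steps above can be sketched in Python to show how they determine the categories. This is only an illustration of the logic, not how Visual Binning is implemented; the function names equal_width_cutpoints and bin_value are hypothetical and were chosen for this example.

```python
# Illustrative sketch only: mimics the effect of the Equal Width Intervals
# settings and the Upper Endpoints option. Function names are assumptions.
from bisect import bisect_left, bisect_right


def equal_width_cutpoints(first, count, width):
    """Build `count` cutpoints starting at `first`, spaced `width` apart."""
    return [first + i * width for i in range(count)]


def bin_value(value, cutpoints, upper_included=True):
    """Return the 1-based category for `value`.

    upper_included=True  -> a value equal to a cutpoint stays in the
                            category that cutpoint closes (<=).
    upper_included=False -> a value equal to a cutpoint moves to the
                            next category (<), as with Excluded (<).
    """
    locate = bisect_left if upper_included else bisect_right
    return locate(cutpoints, value) + 1


if __name__ == "__main__":
    # First cutpoint 25, 3 cutpoints, width 25, as entered in step 7.
    cutpoints = equal_width_cutpoints(25, 3, 25)
    print(cutpoints)  # [25, 50, 75]

    # Three cutpoints give four categories (one more than the cutpoints).
    for income in (24, 25, 50, 75, 200):
        print(
            income,
            bin_value(income, cutpoints, upper_included=True),
            bin_value(income, cutpoints, upper_included=False),
        )
```

With upper endpoints included, an income of exactly 25 lands in category 1; with upper endpoints excluded, it lands in category 2, which matches the desired categories of less than 25, 25–49, 50–74, and 75 or higher.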