Automatically Generating Binned Categories
The Make Cutpoints dialog box allows you to auto-generate binned categories based on selected criteria.
To Use the Make Cutpoints Dialog Box
- From the menus in the Data Editor window choose:
- Select the numeric scale and/or ordinal variables for which you want to create new categorical (binned) variables.
- Click Continue.
- Select (click) a variable in the Scanned Variable List.
- Click Make Cutpoints.
- Select the criteria for generating cutpoints that will define the binned categories.
- Click Apply.
Note: The Make Cutpoints dialog box is not available if you scanned zero cases.
Equal Width Intervals. Generates binned categories of equal width (for example, 1–10, 11–20, and 21–30) based on any two of the following three criteria:
- First Cutpoint Location. The value that defines the upper end of the lowest binned category (for example, a value of 10 indicates a range that includes all values up to 10).
- Number of Cutpoints. The number of binned categories is the number of cutpoints plus one. For example, 9 cutpoints generate 10 binned categories.
- Width. The width of each interval. For example, a value of 10 would bin age in years into 10-year intervals.
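The relationship between these three criteria can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function names `equal_width_cutpoints` and `bin_index` and the `data_max` parameter are hypothetical, and the rule used to derive the number of cutpoints when it is omitted is an assumption of the sketch.

```python
import bisect
import math


def equal_width_cutpoints(first, width, count=None, data_max=None):
    """Return equally spaced cutpoints: first, first + width, ...

    Each cutpoint is the upper end of a bin. If count is omitted, it is
    derived from data_max (hypothetical rule for this sketch): just enough
    cutpoints that the final bin is no wider than the others.
    """
    if count is None:
        if data_max is None:
            raise ValueError("give either count or data_max")
        count = max(1, math.ceil((data_max - first) / width))
    return [first + i * width for i in range(count)]


def bin_index(value, cutpoints):
    """Return the 1-based bin number; a cutpoint belongs to the bin below it."""
    return bisect.bisect_left(cutpoints, value) + 1


# Nine cutpoints define ten binned categories.
cuts = equal_width_cutpoints(first=10, width=10, count=9)
n_bins = len(cuts) + 1
```

With a first cutpoint of 10 and a width of 10, a value of exactly 10 lands in the first bin and 11 in the second, which matches the "all values up to 10" reading of First Cutpoint Location.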
Equal Percentiles Based on Scanned Cases. Generates binned categories with an equal number of cases in each bin (using the aempirical algorithm for percentiles), based on either of the following criteria:
- Number of Cutpoints. The number of binned categories is the number of cutpoints plus one. For example, three cutpoints generate four percentile bins (quartiles), each containing 25% of the cases.
- Width (%). Width of each interval, expressed as a percentage of the total number of cases. For example, a value of 33.3 would produce three binned categories (two cutpoints), each containing 33.3% of the cases.
If the source variable contains a relatively small number of distinct values or a large number of cases with the same value, you may get fewer bins than requested. If several cases share the value at a cutpoint, they all go into the same interval, so the actual percentages may not always be exactly equal.
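The effect of tied values can be seen in a small Python sketch. It uses a simple rank-based rule for percentiles, which is an assumption made for illustration and not necessarily the aempirical algorithm named above; `percentile_cutpoints` is a hypothetical helper.

```python
from bisect import bisect_left
from collections import Counter


def percentile_cutpoints(values, n_cutpoints):
    """Return distinct cutpoints that split the cases into roughly equal shares.

    Assumed rule: the k-th cutpoint is the smallest value with at least
    k / (n_cutpoints + 1) of the cases at or below it. Duplicate cutpoints
    are dropped, which is why ties can yield fewer bins than requested.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_cutpoints + 1):
        rank = -(-k * n // (n_cutpoints + 1))  # ceiling division
        cut = ordered[rank - 1]
        if cut not in cuts:
            cuts.append(cut)
    return cuts


def bin_counts(values, cuts):
    """Count cases per bin; a cutpoint value belongs to the bin below it."""
    return Counter(bisect_left(cuts, v) + 1 for v in values)


distinct = [1, 2, 3, 4, 5, 6, 7, 8]
tied = [1, 1, 1, 1, 1, 2, 3, 4]

distinct_cuts = percentile_cutpoints(distinct, 3)  # three cutpoints requested
tied_cuts = percentile_cutpoints(tied, 3)          # ties collapse two of them
tied_counts = bin_counts(tied, tied_cuts)
```

In the tied example, five of the eight cases share the value 1, so two of the three requested cutpoints coincide and only three bins result, with unequal counts.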
Cutpoints at Mean and Selected Standard Deviations Based on Scanned Cases. Generates binned categories based on the values of the mean and standard deviation of the distribution of the variable.
- If you don't select any of the standard deviation intervals, two binned categories will be created, with the mean as the cutpoint dividing the bins.
- You can select any combination of standard deviation intervals based on one, two, and/or three standard deviations. For example, selecting all three would result in eight binned categories: six bins that are each one standard deviation wide, and two bins for cases more than three standard deviations above or below the mean.
In a normal distribution, about 68% of the cases fall within one standard deviation of the mean; about 95%, within two standard deviations; and about 99.7%, within three standard deviations. Creating binned categories based on standard deviations may result in some defined bins outside of the actual data range and even outside of the range of possible data values (for example, a negative salary range).
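A minimal Python sketch of cutpoints placed at the mean and at selected standard deviations follows. The helper `mean_sd_cutpoints`, the sample salary figures, and the choice of the sample standard deviation are assumptions made for illustration.

```python
import statistics


def mean_sd_cutpoints(values, sd_multiples=()):
    """Return sorted cutpoints at the mean and at mean +/- m * SD.

    sd_multiples is any combination of 1, 2, and 3. With no multiples
    selected, the mean is the only cutpoint (two bins).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption of this sketch
    offsets = {0.0}
    for m in sd_multiples:
        offsets.update((-m * sd, m * sd))
    return sorted(mean + off for off in offsets)


# Hypothetical salaries in thousands, with one large outlier.
salaries = [20, 25, 30, 40, 250]

mean_only = mean_sd_cutpoints(salaries)              # one cutpoint, two bins
all_three = mean_sd_cutpoints(salaries, (1, 2, 3))   # seven cutpoints, eight bins
lowest_cutpoint = all_three[0]                       # falls below zero here
```

Because the outlier inflates the standard deviation, the lowest cutpoints fall below zero, which illustrates how bins can be defined outside the range of possible values.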
Note: Calculations of percentiles and standard deviations are based on the scanned cases. If you limit the number of cases scanned, the resulting bins may not contain the proportion of cases that you wanted in those bins, particularly if the data file is sorted by the source variable. For example, if you limit the scan to the first 100 cases of a data file with 1000 cases and the data file is sorted in ascending order of age of respondent, instead of four percentile age bins each containing 25% of the cases, you may find that the first three bins each contain only about 2.5% of the cases, and the last bin contains about 92.5% of the cases.
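The sorted-file effect described in this note can be checked numerically. The sketch below assumes a hypothetical file of 1000 distinct values sorted in ascending order and a simple rank-based quartile rule, which is not necessarily the algorithm the product uses.

```python
from bisect import bisect_left

# Hypothetical stand-in for a sorted source variable: 1000 distinct values.
values = list(range(1, 1001))

# Only the first 100 cases are scanned.
scanned = values[:100]

# Quartile cutpoints computed from the scanned cases alone
# (assumed rule: the value at rank n * k / 4 of the scanned cases).
cuts = [scanned[len(scanned) * k // 4 - 1] for k in (1, 2, 3)]

# Apply those cutpoints to the whole file and count cases per bin.
counts = [0, 0, 0, 0]
for v in values:
    counts[bisect_left(cuts, v)] += 1

# Share of the full file that ends up in each bin.
shares = [c / len(values) for c in counts]
print(cuts, counts, shares)
```

The first three bins each hold only a small fraction of the full file, and nearly all cases fall into the last bin, because every unscanned case lies above the highest cutpoint.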