GROUPED subcommand (FREQUENCIES command)
When the values of a variable represent grouped or collapsed data, it is possible to estimate percentiles for the original, ungrouped data from the grouped data. The GROUPED subcommand specifies which variables have been grouped. It affects only the output from the PERCENTILES and NTILES subcommands and the MEDIAN statistic from the STATISTICS subcommand.
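A common way to make such an estimate is to assume that the cases in each grouped interval are spread evenly across it, so that a percentile can be read off by linear interpolation within the interval that contains it. The following Python sketch illustrates that textbook rule. The function `grouped_percentile` and its arguments are hypothetical, and the exact formula SPSS uses (for example, how it defines the target rank) may differ.

```python
from bisect import bisect_left
from itertools import accumulate

def grouped_percentile(boundaries, counts, p):
    """Estimate the p-th percentile (0 < p <= 100) of grouped data.

    boundaries: ascending interval edges, one more than len(counts).
    counts: number of cases in each interval.
    Assumes cases are spread evenly within each interval (textbook linear
    interpolation; not necessarily SPSS's exact rule).
    """
    if len(boundaries) != len(counts) + 1:
        raise ValueError("need exactly one more boundary than counts")
    n = sum(counts)
    target = p / 100 * n              # cumulative count the percentile must reach
    cum = list(accumulate(counts))    # cumulative count at each upper edge
    i = bisect_left(cum, target)      # first interval whose cumulative count reaches target
    below = cum[i - 1] if i > 0 else 0
    lower, upper = boundaries[i], boundaries[i + 1]
    # Move into the interval in proportion to the cases still needed.
    return lower + (target - below) / counts[i] * (upper - lower)
```

For example, `grouped_percentile([0, 10, 20, 30], [5, 10, 5], 50)` returns 15.0: the 10th of 20 cases lies halfway through the middle interval, which runs from 10 to 20.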
- Multiple GROUPED subcommands can be used on a single FREQUENCIES command. Multiple variable lists, separated by slashes, can appear on a single GROUPED subcommand.
- The variables named on GROUPED must have been named on the VARIABLES subcommand.
- The value or value list in the parentheses is optional. When it is omitted, the program treats the values of the variables listed on GROUPED as midpoints. If the values are not midpoints, they must first be recoded with the RECODE command.
- A single value in parentheses specifies the width of each grouped interval. The data values must be group midpoints, but there can be empty categories. For example, if you have data values of 10, 20, and 30 and specify an interval width of 5, the categories are 10 ± 2.5, 20 ± 2.5, and 30 ± 2.5. The categories 15 ± 2.5 and 25 ± 2.5 are empty.
- A value list in the parentheses specifies interval boundaries. The data values do not have to represent midpoints, but the lowest boundary must be lower than any value in the data. Any data values that exceed the highest boundary specified (the last value within the parentheses) are assigned to an open-ended interval. In this case, some percentiles (those that fall in the open-ended interval) cannot be calculated.
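The following Python sketch illustrates the single-width form with a hypothetical helper, `width_intervals`. It assumes, as the rule above requires, that every value is a group midpoint, and further assumes that the midpoints lie on a regular grid spaced by the interval width; grid positions with no data become empty intervals.

```python
def width_intervals(values, width):
    """Lay out grouped intervals for midpoint data with a single interval width.

    Assumes every value is a group midpoint on a regular grid spaced by
    `width`, starting at the smallest value. Returns (lower, upper, count)
    tuples, including empty intervals between observed midpoints.
    """
    lo, hi = min(values), max(values)
    n_intervals = round((hi - lo) / width) + 1
    intervals = []
    for j in range(n_intervals):
        mid = lo + j * width
        # Count the cases whose value is this interval's midpoint.
        count = sum(1 for v in values if abs(v - mid) < width / 2)
        intervals.append((mid - width / 2, mid + width / 2, count))
    return intervals

# Hypothetical midpoint data with an interval width of 5.
for lower, upper, count in width_intervals([10, 20, 20, 30], 5):
    print(f"{lower:5.1f} to {upper:5.1f}: {count} case(s)")
```

With the values 10, 20, 20, and 30 and a width of 5, the intervals centered on 15 and 25 contain no cases, matching the empty categories described in the width rule.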
Basic Example
RECODE AGE (1=15) (2=25) (3=35) (4=45) (5=55)
(6=65) (7=75) (8=85) (9=95)
/INCOME (1=5) (2=15) (3=25) (4=35) (5=45)
(6=55) (7=65) (8=75) (9=100).
FREQUENCIES VARIABLES=AGE, SEX, RACE, INCOME
/GROUPED=AGE, INCOME
/PERCENTILES=5,25,50,75,95.
- The AGE and INCOME categories of 1, 2, 3, and so forth are recoded to category midpoints. Note that data can be recoded to category midpoints on any scale; here AGE is recoded in years, but INCOME is recoded in thousands of dollars.
- The GROUPED subcommand on FREQUENCIES allows more accurate estimates of the requested percentiles.
Specifying the Width of Each Grouped Interval
FREQUENCIES VARIABLES=TEMP
/GROUPED=TEMP (0.5)
/NTILES=10.
- The values of TEMP (temperature) in this example were recorded using an inexpensive thermometer whose readings are precise only to the nearest half degree.
- The observed values of 97.5, 98, 98.5, 99, and so on, are treated as group midpoints, smoothing out the discrete distribution. This yields more accurate estimates of the deciles.
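To see how treating readings as midpoints smooths the estimates, the Python sketch below computes deciles from half-degree readings: it builds an interval of width 0.5 around each reading and interpolates linearly within the intervals. The function name `grouped_ntiles`, the sample readings, and the interpolation rule are assumptions for illustration, not SPSS's documented algorithm.

```python
from bisect import bisect_left
from itertools import accumulate

def grouped_ntiles(readings, width, k):
    """Return the k - 1 cut points that split midpoint data into k equal groups.

    Each reading is treated as the midpoint of an interval of the given width,
    laid out on a regular grid from the smallest to the largest reading.
    Cut points are interpolated linearly within intervals (an assumed
    textbook rule; SPSS's exact algorithm may differ).
    """
    lo, hi = min(readings), max(readings)
    n_intervals = round((hi - lo) / width) + 1
    counts = [0] * n_intervals
    for r in readings:
        counts[round((r - lo) / width)] += 1       # interval whose midpoint is r
    lower_edges = [lo - width / 2 + j * width for j in range(n_intervals)]
    cum = list(accumulate(counts))
    n = len(readings)
    cuts = []
    for q in range(1, k):
        target = q * n / k                          # cumulative count at the q-th cut
        i = bisect_left(cum, target)                # interval containing that count
        below = cum[i - 1] if i > 0 else 0
        cuts.append(lower_edges[i] + (target - below) / counts[i] * width)
    return cuts

# Hypothetical thermometer readings, precise to the nearest half degree.
temps = [97.5, 98.0, 98.0, 98.5, 98.5, 98.5, 98.5, 99.0, 99.0, 99.5]
print(grouped_ntiles(temps, 0.5, 10))
# → [97.75, 98.0, 98.25, 98.375, 98.5, 98.625, 98.75, 99.0, 99.25]
```

The ten hypothetical readings take only five distinct values, yet the nine estimated deciles are all different, because each cut point is placed inside its interval in proportion to the cases it must include.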
Specifying Interval Boundaries
FREQUENCIES VARIABLES=AGE
/GROUPED=AGE (17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5,
52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5)
/PERCENTILES=5, 10, 25, 50, 75, 90, 95.
- The values of AGE in this example have been estimated to the nearest five years. The first category is 17.5 to 22.5, the second is 22.5 to 27.5, and so forth. The artificial clustering of age estimates at multiples of five years is smoothed out by treating AGE as grouped data.
- It is not necessary to recode the ages to category midpoints, since the interval boundaries are explicitly given.
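The Python sketch below mirrors this example with explicit boundaries. Each age is counted in the interval that contains it, and any value above the last boundary goes into an open-ended interval. The function `boundary_percentile`, the sample ages, the linear interpolation rule, and the convention that a value equal to a boundary belongs to the lower interval are all assumptions for illustration.

```python
from bisect import bisect_left
from itertools import accumulate

def boundary_percentile(values, boundaries, p):
    """Estimate the p-th percentile (0 < p <= 100) from explicit interval boundaries.

    Intervals are taken as (b[j], b[j+1]] (an assumption about values that
    sit exactly on a boundary). Values above the last boundary fall into an
    open-ended top interval; a percentile landing there cannot be
    interpolated, so None is returned.
    """
    if min(values) <= boundaries[0]:
        raise ValueError("the lowest boundary must be lower than every data value")
    counts = [0] * len(boundaries)                 # last slot is the open-ended interval
    for v in values:
        counts[bisect_left(boundaries, v) - 1] += 1
    cum = list(accumulate(counts))
    target = p / 100 * len(values)
    i = bisect_left(cum, target)
    if i == len(boundaries) - 1:                   # falls in the open-ended interval
        return None
    below = cum[i - 1] if i > 0 else 0
    lower, upper = boundaries[i], boundaries[i + 1]
    return lower + (target - below) / counts[i] * (upper - lower)

# Boundaries from the example above; the ages are hypothetical, rounded to five years.
AGE_BOUNDS = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5,
              52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]
ages = [20, 25, 25, 30, 30, 30, 35, 40, 85, 90]
print(boundary_percentile(ages, AGE_BOUNDS, 50))   # about 30.83
print(boundary_percentile(ages, AGE_BOUNDS, 95))   # None: open-ended interval
```

With two of the ten hypothetical ages above 82.5, the 95th percentile falls in the open-ended interval and cannot be estimated, which is the situation the GROUPED rules warn about.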