GROUPED subcommand (FREQUENCIES command)
When the values of a variable represent grouped or collapsed data, it is possible to estimate percentiles for the original, ungrouped data from the grouped data. The GROUPED subcommand specifies which variables have been grouped. It affects only the output from the PERCENTILES and NTILES subcommands and the MEDIAN statistic from the STATISTICS subcommand.
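For orientation, the usual textbook way to estimate a percentile from grouped data is linear interpolation inside the interval that contains it. The formula below is that standard estimate, shown as an assumption about the general idea rather than the exact algorithm FREQUENCIES uses. All symbols are introduced here for illustration: $p$ is the requested percentile (50 for the median), $N$ the total number of cases, $L$ the lower boundary of the interval containing the target count $pN/100$, $F$ the cumulative count of all intervals below $L$, $f$ the count inside that interval, and $w$ its width.

```latex
P_p = L + \frac{\dfrac{p}{100}\,N - F}{f}\, w
```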
- Multiple GROUPED subcommands can be used on a single FREQUENCIES command. Multiple variable lists, separated by slashes, can appear on a single GROUPED subcommand.
- The variables named on GROUPED must have been named on the VARIABLES subcommand.
- The value or value list in parentheses after a variable list is optional. When it is omitted, the program treats the values of the variables listed on GROUPED as midpoints. If the values are not midpoints, they must first be recoded with the RECODE command.
- A single value in parentheses specifies the width of each grouped interval. The data values must be group midpoints, but there can be empty categories. For example, if you have data values of 10, 20, and 30 and specify an interval width of 5, the categories are 10 ± 2.5, 20 ± 2.5, and 30 ± 2.5. The categories 15 ± 2.5 and 25 ± 2.5 are empty.
- A value list in parentheses specifies interval boundaries. The data values do not have to represent midpoints, but the lowest boundary must be lower than any value in the data. Any data values that exceed the highest boundary specified (the last value in the parentheses) are assigned to an open-ended interval. In that case, percentiles that fall in the open-ended interval cannot be calculated.
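To make the two parenthesized forms concrete, here is a minimal Python sketch of how interval bounds could be derived from each. The function names are hypothetical. Treating intervals as upper-inclusive, (lower, upper], for values that fall exactly on a boundary is also an assumption, because this page does not say how FREQUENCIES handles that case.

```python
import bisect
import math


def intervals_from_width(midpoints, width):
    """Equal-width intervals centred on a regular grid of midpoints.

    The grid runs from the smallest to the largest observed midpoint in steps
    of `width`, so grid points that never occur in the data become empty
    categories.
    """
    lo, hi = min(midpoints), max(midpoints)
    n = round((hi - lo) / width) + 1
    half = width / 2
    return [(lo + i * width - half, lo + i * width + half) for i in range(n)]


def intervals_from_boundaries(bounds):
    """Intervals between consecutive boundaries, plus an open-ended top interval."""
    return list(zip(bounds, bounds[1:])) + [(bounds[-1], math.inf)]


def interval_index(value, bounds):
    """Position of `value` in intervals_from_boundaries(bounds).

    Intervals are treated as (lower, upper] (an assumption); a value above the
    last boundary lands in the open-ended interval, which is the final index.
    """
    if value <= bounds[0]:
        raise ValueError("the lowest boundary must be below every data value")
    return bisect.bisect_left(bounds, value) - 1


# Width 5 around the data values 10, 20, 30: five intervals, two of them empty.
print(intervals_from_width([10, 20, 30], 5))
# [(7.5, 12.5), (12.5, 17.5), (17.5, 22.5), (22.5, 27.5), (27.5, 32.5)]
print(intervals_from_boundaries([17.5, 22.5, 27.5]))
# [(17.5, 22.5), (22.5, 27.5), (27.5, inf)]
print(interval_index(20, [17.5, 22.5, 27.5]), interval_index(90, [17.5, 22.5, 27.5]))
# 0 2
```

With explicit boundaries, the value 90 lies above the last boundary and therefore falls into the open-ended interval at index 2.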
Basic Example
RECODE AGE (1=15) (2=25) (3=35) (4=45) (5=55)
    (6=65) (7=75) (8=85) (9=95)
  /INCOME (1=5) (2=15) (3=25) (4=35) (5=45)
    (6=55) (7=65) (8=75) (9=100).
FREQUENCIES VARIABLES=AGE, SEX, RACE, INCOME
  /GROUPED=AGE, INCOME
  /PERCENTILES=5,25,50,75,95.
- The AGE and INCOME categories of 1, 2, 3, and so forth are recoded to category midpoints. Note that data can be recoded to category midpoints on any scale; here AGE is recoded in years, but INCOME is recoded in thousands of dollars.
- The GROUPED subcommand on FREQUENCIES allows more accurate estimates of the requested percentiles than would be obtained by treating the recoded midpoints as exact values.
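A minimal Python sketch of the idea, using the AGE midpoints produced by the RECODE above. The counts are hypothetical, and since this page does not say how boundaries are derived when GROUPED has no parenthesized value, the sketch assumes each interval extends 5 years on either side of its midpoint. The interpolation is the standard grouped-data formula, not necessarily the exact FREQUENCIES algorithm.

```python
import statistics


def grouped_percentile(intervals, counts, p):
    """Estimate the p-th percentile (0 < p <= 100) of grouped data by linear
    interpolation inside the interval containing the target cumulative count."""
    n = sum(counts)
    target = p * n / 100
    cum = 0
    for (lower, upper), f in zip(intervals, counts):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("p must not exceed 100")


# Hypothetical counts for the recoded AGE midpoints 15, 25, ..., 95.
midpoints = [15, 25, 35, 45, 55, 65, 75, 85, 95]
counts = [4, 10, 12, 9, 7, 5, 2, 1, 0]
# Assumption: each interval spans 5 years either side of its midpoint.
intervals = [(m - 5, m + 5) for m in midpoints]

# Median when every case sits exactly on its midpoint.
ungrouped = [m for m, c in zip(midpoints, counts) for _ in range(c)]
print(statistics.median(ungrouped))               # 35.0
# Median interpolated within the 30-40 interval.
print(grouped_percentile(intervals, counts, 50))  # about 39.17
```

The midpoint-only median can only land on a category midpoint. The interpolated estimate instead uses where the middle case falls within its category.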
Specifying the Width of Each Grouped Interval
FREQUENCIES VARIABLES=TEMP
  /GROUPED=TEMP (0.5)
  /NTILES=10.
- The values of TEMP (temperature) in this example were recorded using an inexpensive thermometer whose readings are precise only to the nearest half degree.
- The observed values of 97.5, 98, 98.5, 99, and so on, are treated as group midpoints (98, for example, stands for the interval 97.75 to 98.25), smoothing out the discrete distribution. This yields more accurate estimates of the deciles.
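The smoothing effect can be sketched in Python with hypothetical readings. Each reading is treated as the midpoint of an interval 0.5 degrees wide, matching GROUPED=TEMP (0.5). For contrast, the sketch uses a simple nearest-rank percentile of the raw readings. That comparison method is a swapped-in illustration, not the method FREQUENCIES uses for ungrouped data, and both function names are hypothetical.

```python
import math


def percentile_from_midpoints(midpoints, counts, width, p):
    """p-th percentile, treating each reading as the midpoint of an interval
    `width` wide and interpolating linearly inside that interval."""
    n = sum(counts)
    target = p * n / 100
    cum = 0
    for m, f in zip(midpoints, counts):
        if f > 0 and cum + f >= target:
            lower = m - width / 2
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must not exceed 100")


def nearest_rank_percentile(values, p):
    """Percentile of the raw readings: the value at rank ceil(p*n/100)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


# Hypothetical half-degree thermometer readings and how often each occurred.
readings = [97.5, 98.0, 98.5, 99.0]
counts = [3, 10, 14, 3]
raw = [r for r, c in zip(readings, counts) for _ in range(c)]

deciles = range(10, 100, 10)
naive = [nearest_rank_percentile(raw, p) for p in deciles]
grouped = [round(percentile_from_midpoints(readings, counts, 0.5, p), 2) for p in deciles]
print(naive)    # only three distinct values; 98.0 and 98.5 repeat
print(grouped)  # nine distinct values between 97.75 and 98.75
```

The raw readings pile the nine deciles onto three thermometer values. Treating the readings as midpoints spreads the deciles across each half-degree interval.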
Specifying Interval Boundaries
FREQUENCIES VARIABLES=AGE
  /GROUPED=AGE (17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5,
    52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5)
  /PERCENTILES=5, 10, 25, 50, 75, 90, 95.
- The values of AGE in this example have been estimated to the nearest five years. The first category is 17.5 to 22.5, the second is 22.5 to 27.5, and so forth. The artificial clustering of age estimates at multiples of five years is smoothed out by treating AGE as grouped data.
- It is not necessary to recode the ages to category midpoints, since the interval boundaries are explicitly given.
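A minimal Python sketch of percentiles computed from the explicit boundaries in this example. The age estimates are hypothetical, and two of them lie above 82.5. Two further assumptions apply: cases on a boundary are placed in the interval below it, and a percentile that falls in the open-ended top interval is reported as None because there is no upper limit to interpolate toward. The function name is hypothetical.

```python
import bisect


def percentile_with_boundaries(values, bounds, p):
    """Estimate the p-th percentile from explicit interval boundaries.

    Values above bounds[-1] go into an open-ended top interval; a percentile
    that falls there cannot be interpolated, so None is returned.
    """
    counts = [0] * len(bounds)  # counts[-1] is the open-ended interval
    for v in values:
        if v <= bounds[0]:
            raise ValueError("the lowest boundary must be below every data value")
        counts[bisect.bisect_left(bounds, v) - 1] += 1  # intervals are (lower, upper]

    target = p * len(values) / 100
    cum = 0
    for i, f in enumerate(counts):
        if f > 0 and cum + f >= target:
            if i == len(bounds) - 1:
                return None  # open-ended interval: no upper boundary
            lower, upper = bounds[i], bounds[i + 1]
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("p must not exceed 100")


# Boundaries from the example (17.5, 22.5, ..., 82.5) and hypothetical
# age estimates rounded to the nearest five years.
bounds = [17.5 + 5 * k for k in range(14)]
ages = [20, 25, 25, 30, 30, 30, 35, 35, 40, 40,
        45, 50, 55, 60, 65, 70, 75, 80, 85, 90]

for p in (5, 10, 25, 50, 75, 90, 95):
    print(p, percentile_with_boundaries(ages, bounds, p))
```

With these data the 95th percentile lands among the two cases above 82.5, so it is the one percentile the sketch cannot estimate.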