GROUPED subcommand (FREQUENCIES command)

When the values of a variable represent grouped or collapsed data, it is possible to estimate percentiles for the original, ungrouped data from the grouped data. The GROUPED subcommand specifies which variables have been grouped. It affects only the output from the PERCENTILES and NTILES subcommands and the MEDIAN statistic from the STATISTICS subcommand.

  • Multiple GROUPED subcommands can be used on a single FREQUENCIES command. Multiple variable lists, separated by slashes, can appear on a single GROUPED subcommand.
  • The variables named on GROUPED must have been named on the VARIABLES subcommand.
  • The value or value list in the parentheses is optional. When it is omitted, the program treats the values of the variables listed on GROUPED as midpoints. If the values are not midpoints, they must first be recoded with the RECODE command.
  • A single value in parentheses specifies the width of each grouped interval. The data values must be group midpoints, but there can be empty categories. For example, if you have data values of 10, 20, and 30 and specify an interval width of 5, the categories are 10 ± 2.5, 20 ± 2.5, and 30 ± 2.5. The categories 15 ± 2.5 and 25 ± 2.5 are empty.
  • A value list in the parentheses specifies interval boundaries. The data values do not have to represent midpoints, but the lowest boundary must be lower than any value in the data. If any data values exceed the highest boundary specified (the last value within the parentheses), they will be assigned to an open-ended interval. In this case, some percentiles cannot be calculated.
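
Combining Grouping Specifications

The rules above can be combined on one command. The following is a minimal sketch rather than an example from a real data file. It assumes that AGE already holds the midpoints of ten-year intervals (15, 25, 35, and so on) and that INCOME holds positive values below 50, in thousands of dollars. The width and boundary values are illustrative assumptions.

* Illustrative sketch with assumed data values and intervals.
FREQUENCIES VARIABLES=AGE, INCOME
  /GROUPED=AGE (10) / INCOME (0, 10, 20, 30, 40, 50)
  /PERCENTILES=25, 50, 75.

* The same request written with two GROUPED subcommands.
FREQUENCIES VARIABLES=AGE, INCOME
  /GROUPED=AGE (10)
  /GROUPED=INCOME (0, 10, 20, 30, 40, 50)
  /PERCENTILES=25, 50, 75.
  • In the first command, the two variable lists on the single GROUPED subcommand are separated by a slash. AGE is given an interval width of 10, and INCOME is given explicit interval boundaries.
  • The second command makes the same request with two GROUPED subcommands.
  • Both variables are named on the VARIABLES subcommand, as GROUPED requires.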

Basic Example

RECODE AGE (1=15) (2=25) (3=35) (4=45) (5=55)
           (6=65) (7=75) (8=85) (9=95)
  /INCOME  (1=5)  (2=15) (3=25) (4=35) (5=45)
           (6=55) (7=65) (8=75) (9=100).

FREQUENCIES VARIABLES=AGE, SEX, RACE, INCOME
  /GROUPED=AGE, INCOME
  /PERCENTILES=5,25,50,75,95.
  • The AGE and INCOME categories of 1, 2, 3, and so forth are recoded to category midpoints. Note that data can be recoded to category midpoints on any scale; here AGE is recoded in years, but INCOME is recoded in thousands of dollars.
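  • Because AGE and INCOME are named on GROUPED with no value in parentheses, their recoded values are treated as category midpoints, which allows more accurate estimates of the requested percentiles. SEX and RACE are not named on GROUPED and are not treated as grouped data.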
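
Requesting a Grouped Median

GROUPED also affects the MEDIAN statistic from the STATISTICS subcommand. The following is a minimal sketch, assuming that AGE has already been recoded to category midpoints as in the RECODE command above.

* Illustrative sketch that assumes AGE already holds category midpoints.
FREQUENCIES VARIABLES=AGE
  /GROUPED=AGE
  /STATISTICS=MEDIAN.
  • No value is given in parentheses, so the values of AGE are treated as midpoints.
  • The median is the only statistic requested here, and it is estimated for the grouped data.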

Specifying the Width of Each Grouped Interval

FREQUENCIES VARIABLES=TEMP
  /GROUPED=TEMP (0.5)
  /NTILES=10.
  • The values of TEMP (temperature) in this example were recorded using an inexpensive thermometer whose readings are precise only to the nearest half degree.
  • The observed values of 97.5, 98, 98.5, 99, and so on, are treated as group midpoints, smoothing out the discrete distribution. This yields more accurate estimates of the deciles.
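
Interval Width with Empty Categories

The interval width does not have to match the spacing between the observed midpoints. The following is a minimal sketch, assuming a hypothetical variable SCORE whose only data values are 10, 20, and 30.

* Illustrative sketch with a hypothetical variable SCORE.
FREQUENCIES VARIABLES=SCORE
  /GROUPED=SCORE (5)
  /NTILES=4.
  • With an interval width of 5, the categories are 10 ± 2.5, 20 ± 2.5, and 30 ± 2.5.
  • The categories 15 ± 2.5 and 25 ± 2.5 contain no cases. Empty categories are allowed as long as the data values are group midpoints.
  • NTILES=4 requests the quartiles.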

Specifying Interval Boundaries

FREQUENCIES VARIABLES=AGE
  /GROUPED=AGE (17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5,
                52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5)
  /PERCENTILES=5, 10, 25, 50, 75, 90, 95.
  • The values of AGE in this example have been estimated to the nearest five years. The first category is 17.5 to 22.5, the second is 22.5 to 27.5, and so forth. The artificial clustering of age estimates at multiples of five years is smoothed out by treating AGE as grouped data.
  • It is not necessary to recode the ages to category midpoints, since the interval boundaries are explicitly given.
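
Boundaries with an Open-Ended Interval

The following is a minimal sketch of a boundary list that stops short of the highest data values. It assumes, for illustration only, that some values of AGE are greater than 62.5.

* Illustrative sketch that assumes some ages exceed the highest boundary.
FREQUENCIES VARIABLES=AGE
  /GROUPED=AGE (17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5)
  /PERCENTILES=5, 50, 95.
  • Values of AGE greater than 62.5, the last value within the parentheses, are assigned to an open-ended interval.
  • In this case, some percentiles cannot be calculated. Which ones are affected presumably depends on how many cases fall in the open-ended interval. Extending the boundary list beyond the highest data value, as in the previous example, avoids the open-ended interval.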