Adding Group Percentile Values to a Data File

Using the AGGREGATE command, you can compute various group summary statistics and then include those values in the active dataset as new variables. For example, you could compute mean, minimum, and maximum income by job category and then include those values in the dataset. Some summary statistics, however, are not available with the AGGREGATE command. This example uses OMS to write a table of group percentiles to a data file and then merges the data in that file with the original data file.

The command syntax used in this example is oms_percentiles.sps. See the topic Sample Files for more information.

GET
  FILE='Employee data.sav'.
PRESERVE.
SET TVARS NAMES TNUMBERS VALUES.
DATASET DECLARE freq_table.

***split file by job category to get group percentiles.
SORT CASES BY jobcat.
SPLIT FILE LAYERED BY jobcat. 

OMS 
  /SELECT TABLES
  /IF COMMANDS=['Frequencies'] SUBTYPES=['Statistics']
  /DESTINATION FORMAT=SAV
   OUTFILE='freq_table'
  /COLUMNS SEQUENCE=[L1 R2].

FREQUENCIES
  VARIABLES=salary   
  /FORMAT=NOTABLE
  /PERCENTILES= 25 50 75.

OMSEND.

***restore previous  SET settings.
RESTORE.

MATCH FILES FILE=*
  /TABLE='freq_table'
  /rename (Var1=jobcat)
  /BY jobcat
 /DROP command_ TO salary_Missing.
EXECUTE.
  • The PRESERVE command saves your current SET command specifications.
  • SET TVARS NAMES TNUMBERS VALUES specifies that variable names and data values, not variable or value labels, should be displayed in tables. Using variable names instead of labels is not technically necessary in this example, but it makes the new variable names constructed from column labels somewhat easier to work with. Using data values instead of value labels, however, is required to make this example work properly because we will use the job category values in the two files to merge them together.
  • SORT CASES and SPLIT FILE are used to divide the data into groups by job category (jobcat). The LAYERED keyword specifies that results for each split-file group should be displayed in the same table rather than in separate tables.
  • The OMS command will select all statistics tables from subsequent FREQUENCIES commands and write the tables to a data file.
  • The COLUMNS subcommand will put the first layer dimension element and the second row dimension element in the columns.
  • The FREQUENCIES command produces a statistics table that contains the 25th, 50th, and 75th percentile values for salary. Since split-file processing is on, the table will contain separate percentile values for each job category.
  • In the statistics table, the variable salary is the only layer dimension element, so the L1 specification in the OMS COLUMNS subcommand will put salary in the column dimension.
  • The table statistics are the second (inner) row dimension element in the table, so the R2 specification in the OMS COLUMNS subcommand will put the statistics in the column dimension, nested under the variable salary.
  • The data values 1, 2, and 3 are used for the categories of the variable jobcat instead of the descriptive text value labels because of the previous SET command specifications.

The following table shows the contents of the resulting dataset.

Table 1. Dataset from frequency table with variables and statistics moved into the column dimension
Command_ Subtype_ Label_ Var1 salary_valid salary_Missing salary_25 salary_50 salary_75
Frequencies Statistics Statistic 1 363 0 22800 26550 31200
Frequencies Statistics Statistic 2 27 0 30000 30750 31200
Frequencies Statistics Statistic 3 84 0 51618 60500 72094
  • OMSEND ends all active OMS commands. Without this, we could not access the dataset freq_table in the subsequent MATCH FILES command because the file would still be open for writing.
  • The MATCH FILES command merges the contents of the dataset created from the statistics table with the original dataset. New variables from the data file created by OMS will be added to the original data file.
  • FILE=* specifies the current active dataset, which is still the original data file.
  • TABLE='freq_table' identifies the dataset created by OMS as a table lookup file. A table lookup file is a file in which data for each "case" can be applied to multiple cases in the other data file(s). In this example, the table lookup file contains only three cases—one for each job category.
  • In the dataset created by OMS, the variable that contains the job category values is named Var1, but in the original data file, the variable is named jobcat. RENAME (Var1=jobcat) compensates for this discrepancy in the variable names.
  • BY jobcat merges the two files together by values of the variable jobcat. The three cases in the table lookup file will be merged with every case in the original data file with the same value for jobcat (also known as Var1 in the table lookup file).
  • Since we don't want to include the three table identifier variables (automatically included in every data file created by OMS) or the two variables that contain information on valid and missing cases, we use the DROP subcommand to omit these from the merged data file.

The end result is three new variables containing the 25th, 50th, and 75th percentile salary values for each job category.