Adding Group Percentile Values to a Data File
Using the AGGREGATE
command, you can compute various group summary statistics and then
include those values in the active dataset as new variables. For example,
you could compute mean, minimum, and maximum income by job category
and then include those values in the dataset. Some summary statistics,
however, are not available with the AGGREGATE
command. This example uses OMS
to write a table of group percentiles to a data file and then merges
the data in that file with the original data file.
The command syntax used in this example is oms_percentiles.sps. See the topic Sample Files for more information.
GET
FILE='Employee data.sav'.
PRESERVE.
SET TVARS NAMES TNUMBERS VALUES.
DATASET DECLARE freq_table.
***split file by job category to get group percentiles.
SORT CASES BY jobcat.
SPLIT FILE LAYERED BY jobcat.
OMS
/SELECT TABLES
/IF COMMANDS=['Frequencies'] SUBTYPES=['Statistics']
/DESTINATION FORMAT=SAV
OUTFILE='freq_table'
/COLUMNS SEQUENCE=[L1 R2].
FREQUENCIES
VARIABLES=salary
/FORMAT=NOTABLE
/PERCENTILES= 25 50 75.
OMSEND.
***restore previous SET settings.
RESTORE.
MATCH FILES FILE=*
/TABLE='freq_table'
/rename (Var1=jobcat)
/BY jobcat
/DROP command_ TO salary_Missing.
EXECUTE.
- The
PRESERVE
command saves your currentSET
command specifications. -
SET TVARS NAMES TNUMBERS VALUES
specifies that variable names and data values, not variable or value labels, should be displayed in tables. Using variable names instead of labels is not technically necessary in this example, but it makes the new variable names constructed from column labels somewhat easier to work with. Using data values instead of value labels, however, is required to make this example work properly because we will use the job category values in the two files to merge them together. -
SORT CASES
andSPLIT FILE
are used to divide the data into groups by job category (jobcat). TheLAYERED
keyword specifies that results for each split-file group should be displayed in the same table rather than in separate tables. - The
OMS
command will select all statistics tables from subsequentFREQUENCIES
commands and write the tables to a data file. - The
COLUMNS
subcommand will put the first layer dimension element and the second row dimension element in the columns. - The
FREQUENCIES
command produces a statistics table that contains the 25th, 50th, and 75th percentile values for salary. Since split-file processing is on, the table will contain separate percentile values for each job category. - In the statistics table, the variable salary is the only layer dimension element,
so the
L1
specification in theOMS COLUMNS
subcommand will put salary in the column dimension. - The table statistics are the second (inner) row dimension
element in the table, so the
R2
specification in theOMS COLUMNS
subcommand will put the statistics in the column dimension, nested under the variable salary. - The data values 1, 2, and 3 are used for the categories
of the variable jobcat instead
of the descriptive text value labels because of the previous
SET
command specifications.
The following table shows the contents of the resulting dataset.
Command_ | Subtype_ | Label_ | Var1 | salary_valid | salary_Missing | salary_25 | salary_50 | salary_75 |
---|---|---|---|---|---|---|---|---|
Frequencies | Statistics | Statistic | 1 | 363 | 0 | 22800 | 26550 | 31200 |
Frequencies | Statistics | Statistic | 2 | 27 | 0 | 30000 | 30750 | 31200 |
Frequencies | Statistics | Statistic | 3 | 84 | 0 | 51618 | 60500 | 72094 |
-
OMSEND
ends all activeOMS
commands. Without this, we could not access the dataset freq_table in the subsequentMATCH FILES
command because the file would still be open for writing. - The
MATCH FILES
command merges the contents of the dataset created from the statistics table with the original dataset. New variables from the data file created byOMS
will be added to the original data file. -
FILE=*
specifies the current active dataset, which is still the original data file. -
TABLE='freq_table'
identifies the dataset created byOMS
as a table lookup file. A table lookup file is a file in which data for each "case" can be applied to multiple cases in the other data file(s). In this example, the table lookup file contains only three cases—one for each job category. - In the dataset created by
OMS
, the variable that contains the job category values is named Var1, but in the original data file, the variable is named jobcat.RENAME (Var1=jobcat)
compensates for this discrepancy in the variable names. -
BY jobcat
merges the two files together by values of the variable jobcat. The three cases in the table lookup file will be merged with every case in the original data file with the same value for jobcat (also known as Var1 in the table lookup file). - Since we don't want to include the three table identifier
variables (automatically included in every data file created by
OMS
) or the two variables that contain information on valid and missing cases, we use theDROP
subcommand to omit these from the merged data file.
The end result is three new variables containing the 25th, 50th, and 75th percentile salary values for each job category.