Aggregate Functions (AGGREGATE command)
An aggregated variable is created by applying an aggregate function to a variable in the active dataset. The variable in the active dataset is called the source variable, and the new aggregated variable is the target variable.
- The aggregate functions must be specified last on
AGGREGATE
. - The simplest specification is a target variable list, followed by an equals sign, a function name, and a list of source variables.
- The number of target variables named must match the number of source variables.
- When several aggregate variables are defined at once, the first-named target variable is based on the first-named source variable, the second-named target is based on the second-named source, and so on.
- Only the functions
MAX
,MIN
,FIRST
, andLAST
copy complete dictionary information from the source variable. For all other functions, new variables do not have labels and are assigned default dictionary print and write formats. The default format for a variable depends on the function used to create it (see the list of available functions below). - You can provide a variable label for a new variable by specifying
the label in single or double quotes immediately following the new
variable name. Value labels cannot be assigned in
AGGREGATE
. - To change formats or add value labels to an active dataset created
by
AGGREGATE
, use thePRINT FORMATS
,WRITE FORMATS
,FORMATS
, orVALUE LABELS
command. If the aggregate file is written to disk, first retrieve the file usingGET
, specify the new labels and formats, and resave the file.
The following is a list of available functions:
SUM(varlist). Sum across cases.
MEAN(varlist). Mean across cases.
MEDIAN(varlist). Median across cases.
SD(varlist). Standard deviation across cases.
MAX(varlist). Maximum value across cases. Complete dictionary information is copied from the source variables to the target variables.
MIN(varlist). Minimum value across cases. Complete dictionary information is copied from the source variables to the target variables.
PGT(varlist,value). Percentage of cases greater than the specified value.
PLT(varlist,value). Percentage of cases less than the specified value.
PIN(varlist,value1,value2). Percentage of cases between value1 and value2, inclusive.
POUT(varlist,value1,value2). Percentage of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.
FGT(varlist,value). Fraction of cases greater than the specified value.
FLT(varlist,value). Fraction of cases less than the specified value.
FIN(varlist,value1,value2). Fraction of cases between value1 and value2, inclusive.
FOUT(varlist,value1,value2). Fraction of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.
CGT(varlist,value). Count of cases greater than the specified value.
CLT(varlist,value). Count of cases less than the specified value.
CIN(varlist,value1,value2). Count of cases between value1 and value2, inclusive.
COUT(varlist,value1,value2). Count of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.
N(varlist). Weighted number of cases in break group.
NU(varlist). Unweighted number of cases in break group.
NMISS(varlist). Weighted number of missing cases.
NUMISS(varlist). Unweighted number of missing cases.
FIRST(varlist). First nonmissing observed value in break group. Complete dictionary information is copied from the source variables to the target variables.
LAST(varlist). Last nonmissing observed value in break group. Complete dictionary information is copied from the source variables to the target variables.
- The functions
SUM
,MEAN
, andSD
can be applied only to numeric source variables. All other functions can use short and long string variables as well as numeric ones. - The
N
andNU
functions do not require arguments. Without arguments, they return the number of weighted and unweighted valid cases in a break group. If you supply a variable list, they return the number of weighted and unweighted valid cases for the variables specified. - For several functions, the argument includes values as well as a source variable designation. Either blanks or commas can be used to separate the components of an argument list.
- For percentage, fraction, and count within or outside a specified
range, the first value specified should be less than or equal to the
second. If not, they are automatically reversed. If the two values
are equal,
PIN
,FIN
, andCIN
calculate the percentage, fraction, or count equal to the argument.POUT
,FOUT
, andCOUT
calculate the percentage, fraction or count not equal to the argument. - String values specified in an argument should be enclosed in quotes.
Using the MEAN Function
AGGREGATE OUTFILE=’AGGEMP.SAV’ /BREAK=LOCATN
/AVGSAL 'Average Salary' AVGRAISE = MEAN(SALARY RAISE).
-
AGGREGATE
defines two aggregate variables, AVGSAL and AVGRAISE. - AVGSAL is the mean of SALARY for each break group, and AVGRAISE is the mean of RAISE.
- The label Average Salary is assigned to AVGSAL.
Using the PLT Function
AGGREGATE OUTFILE=* /BREAK=DEPT
/LOWVAC,LOWSICK = PLT (VACDAY SICKDAY,10).
-
AGGREGATE
creates two aggregated variables: LOWVAC and LOWSICK. LOWVAC is the percentage of cases with values less than 10 for VACDAY, and LOWSICK is the percentage of cases with values less than 10 for SICKDAY.
Using the FIN Function
AGGREGATE OUTFILE=’GROUPS.SAV’ /BREAK=OCCGROUP
/COLLEGE = FIN(EDUC,13,16).
-
AGGREGATE
creates the variable COLLEGE, which is the fraction of cases with 13 to 16 years of education (variable EDUC).
Using the PIN Function
AGGREGATE OUTFILE=* /BREAK=CLASS
/LOCAL = PIN(STATE,'IL','IO').
-
AGGREGATE
creates the variable LOCAL, which is the percentage of cases in each break group whose two-letter state code represents Illinois, Indiana, or Iowa. (The abbreviation for Indiana, IN, is between IL and IO in an alphabetical sort sequence.)