Aggregate Functions (AGGREGATE command)

An aggregated variable is created by applying an aggregate function to a variable in the active dataset. The variable in the active dataset is called the source variable, and the new aggregated variable is the target variable.

  • The aggregate functions must be specified last on AGGREGATE.
  • The simplest specification is a target variable list, followed by an equals sign, a function name, and a list of source variables.
  • The number of target variables named must match the number of source variables.
  • When several aggregate variables are defined at once, the first-named target variable is based on the first-named source variable, the second-named target is based on the second-named source, and so on.
  • Only the functions MAX, MIN, FIRST, and LAST copy complete dictionary information from the source variable. For all other functions, new variables do not have labels and are assigned default dictionary print and write formats. The default format for a variable depends on the function used to create it (see the list of available functions below).
  • You can provide a variable label for a new variable by specifying the label in single or double quotes immediately following the new variable name. Value labels cannot be assigned in AGGREGATE.
  • To change formats or add value labels to an active dataset created by AGGREGATE, use the PRINT FORMATS, WRITE FORMATS, FORMATS, or VALUE LABELS command. If the aggregate file is written to disk, first retrieve the file using GET, specify the new labels and formats, and resave the file.

The following is a list of available functions:

SUM(varlist). Sum across cases.

MEAN(varlist). Mean across cases.

MEDIAN(varlist). Median across cases.

SD(varlist). Standard deviation across cases.

MAX(varlist). Maximum value across cases. Complete dictionary information is copied from the source variables to the target variables.

MIN(varlist). Minimum value across cases. Complete dictionary information is copied from the source variables to the target variables.

PGT(varlist,value). Percentage of cases greater than the specified value.

PLT(varlist,value). Percentage of cases less than the specified value.

PIN(varlist,value1,value2). Percentage of cases between value1 and value2, inclusive.

POUT(varlist,value1,value2). Percentage of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.

FGT(varlist,value). Fraction of cases greater than the specified value.

FLT(varlist,value). Fraction of cases less than the specified value.

FIN(varlist,value1,value2). Fraction of cases between value1 and value2, inclusive.

FOUT(varlist,value1,value2). Fraction of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.

CGT(varlist,value). Count of cases greater than the specified value.

CLT(varlist,value). Count of cases less than the specified value.

CIN(varlist,value1,value2). Count of cases between value1 and value2, inclusive.

COUT(varlist,value1,value2). Count of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.

N(varlist). Weighted number of cases in break group.

NU(varlist). Unweighted number of cases in break group.

NMISS(varlist). Weighted number of missing cases.

NUMISS(varlist). Unweighted number of missing cases.

FIRST(varlist). First nonmissing observed value in break group. Complete dictionary information is copied from the source variables to the target variables.

LAST(varlist). Last nonmissing observed value in break group. Complete dictionary information is copied from the source variables to the target variables.

  • The functions SUM, MEAN, and SD can be applied only to numeric source variables. All other functions can use short and long string variables as well as numeric ones.
  • The N and NU functions do not require arguments. Without arguments, they return the number of weighted and unweighted valid cases in a break group. If you supply a variable list, they return the number of weighted and unweighted valid cases for the variables specified.
  • For several functions, the argument includes values as well as a source variable designation. Either blanks or commas can be used to separate the components of an argument list.
  • For percentage, fraction, and count within or outside a specified range, the first value specified should be less than or equal to the second. If not, they are automatically reversed. If the two values are equal, PIN, FIN, and CIN calculate the percentage, fraction, or count equal to the argument. POUT, FOUT, and COUT calculate the percentage, fraction or count not equal to the argument.
  • String values specified in an argument should be enclosed in quotes.

Using the MEAN Function

AGGREGATE OUTFILE=’AGGEMP.SAV’ /BREAK=LOCATN
 /AVGSAL 'Average Salary' AVGRAISE = MEAN(SALARY RAISE).
  • AGGREGATE defines two aggregate variables, AVGSAL and AVGRAISE.
  • AVGSAL is the mean of SALARY for each break group, and AVGRAISE is the mean of RAISE.
  • The label Average Salary is assigned to AVGSAL.

Using the PLT Function

AGGREGATE OUTFILE=* /BREAK=DEPT
 /LOWVAC,LOWSICK = PLT (VACDAY SICKDAY,10).
  • AGGREGATE creates two aggregated variables: LOWVAC and LOWSICK. LOWVAC is the percentage of cases with values less than 10 for VACDAY, and LOWSICK is the percentage of cases with values less than 10 for SICKDAY.

Using the FIN Function

AGGREGATE OUTFILE=’GROUPS.SAV’ /BREAK=OCCGROUP
 /COLLEGE = FIN(EDUC,13,16).
  • AGGREGATE creates the variable COLLEGE, which is the fraction of cases with 13 to 16 years of education (variable EDUC).

Using the PIN Function

AGGREGATE OUTFILE=* /BREAK=CLASS
 /LOCAL = PIN(STATE,'IL','IO').
  • AGGREGATE creates the variable LOCAL, which is the percentage of cases in each break group whose two-letter state code represents Illinois, Indiana, or Iowa. (The abbreviation for Indiana, IN, is between IL and IO in an alphabetical sort sequence.)