Overview (AGGREGATE command)

AGGREGATE aggregates groups of cases in the active dataset into single cases and creates a new aggregated file or creates new variables in the active dataset that contain aggregated data. The values of one or more variables in the active dataset define the case groups. These variables are called break variables. A set of cases with identical values for each break variable is called a break group. If no break variables are specified, then the entire dataset is a single break group. Aggregate functions are applied to source variables in the active dataset to create new aggregated variables that have one value for each break group.
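
As an illustration of break variables and break groups, the following sketch assumes a hypothetical active dataset with a numeric variable region and a numeric variable sales. Neither name comes from this section. Each distinct value of region defines one break group, and each aggregate function yields one value per group.

```spss
* Hypothetical variables: region is the break variable, sales is the source variable.
* The aggregated results replace the active dataset.
AGGREGATE
  /OUTFILE=*
  /BREAK=region
  /mean_sales=MEAN(sales)
  /n_cases=N.
```

If the dataset held three cases with (region, sales) values of (1, 10), (1, 20), and (2, 30), the result would contain two cases: region 1 with mean_sales 15 and n_cases 2, and region 2 with mean_sales 30 and n_cases 1.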

Options

Data. You can create new variables in the active dataset that contain aggregated data, replace the active dataset with aggregated results, or create a new data file that contains the aggregated results.
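
The three destinations might be written as follows. This is a sketch only: region, sales, and the file path are hypothetical, and the three commands are alternatives rather than a sequence to run together.

```spss
* Alternative 1: add the aggregated variable to the active dataset.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=region
  /mean_sales=MEAN(sales).

* Alternative 2: replace the active dataset with the aggregated results.
AGGREGATE
  /OUTFILE=*
  /BREAK=region
  /mean_sales=MEAN(sales).

* Alternative 3: write the aggregated results to a new data file.
AGGREGATE
  /OUTFILE='/temp/sales_by_region.sav'
  /BREAK=region
  /mean_sales=MEAN(sales).
```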

Documentary Text. You can copy documentary text from the original file into the aggregated file using the DOCUMENT subcommand. By default, documentary text is dropped.

Aggregated Variables. You can create aggregated variables using any of 19 aggregate functions. The functions SUM, MEAN, and SD can aggregate only numeric variables. All other functions can use both numeric and string variables.

Labels and Formats. You can specify variable labels for the aggregated variables. Variables created with the functions MAX, MIN, FIRST, and LAST assume the formats and value labels of their source variables. All other variables assume the default formats described under Aggregate Functions.
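
A variable label can be supplied in quotes between the new variable name and the equals sign. The sketch below assumes hypothetical variables region and sales. Here max_sales would take the format and value labels of sales, while mean_sales would receive the default format for MEAN.

```spss
* Hypothetical variables and labels, for illustration only.
AGGREGATE
  /OUTFILE=*
  /BREAK=region
  /mean_sales 'Mean sales per region'=MEAN(sales)
  /max_sales 'Largest sale in region'=MAX(sales).
```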

Basic Specification

The basic specification is at least one aggregate function and source variable. The aggregate function creates a new aggregated variable in the active dataset.
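
Taken literally, the basic specification could be as short as the sketch below, which assumes a hypothetical numeric variable sales. With no break variables, the whole dataset is one break group, so every case would receive the same value of mean_sales.

```spss
* Minimal form: no OUTFILE and no BREAK, one function and one source variable.
AGGREGATE
  /mean_sales=MEAN(sales).
```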

Subcommand Order

  • If specified, OUTFILE must be specified first.
  • If specified, DOCUMENT and PRESORTED must precede BREAK. No other subcommand can be specified between these two subcommands.
  • MISSING, if specified, must immediately follow OUTFILE.
  • The aggregate functions must be specified last.
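
A command that uses every subcommand in an order consistent with the rules above might look like this. The variable names and file path are hypothetical, and PRESORTED assumes that the active dataset is already sorted by region.

```spss
* OUTFILE first, MISSING immediately after it, then DOCUMENT and PRESORTED.
* BREAK follows, and the aggregate functions come last.
AGGREGATE
  /OUTFILE='/temp/sales_by_region.sav'
  /MISSING=COLUMNWISE
  /DOCUMENT
  /PRESORTED
  /BREAK=region
  /mean_sales=MEAN(sales)
  /n_cases=N.
```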

Operations

  • When replacing the active dataset or creating a new data file, the aggregated file contains the break variables plus the variables created by the aggregate functions.
  • AGGREGATE excludes cases with missing values from all aggregate calculations except those involving the functions N, NU, NMISS, and NUMISS.
  • Unless otherwise specified, AGGREGATE sorts cases in the aggregated file in ascending order of the values of the break variables.
  • PRESORTED uses a faster, less memory-intensive algorithm that assumes the data are already sorted into the desired groups.
  • AGGREGATE ignores split-file processing. To achieve the same effect, name the variable or variables used to split the file as break variables before any other break variables. AGGREGATE produces one file, but the aggregated cases will then be in the same order as the split files.
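
To approximate split-file processing, the split variable can be listed first on BREAK. The sketch assumes a hypothetical file that is split by gender, with region as an additional break variable and sales as the source variable.

```spss
* gender is named first so that the aggregated cases follow the split-file order.
AGGREGATE
  /OUTFILE=*
  /BREAK=gender region
  /mean_sales=MEAN(sales).
```

The result is a single file with one case for each combination of gender and region, ordered by gender and then by region.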