IBM Support

How do I AGGREGATE in SPSS?

Question & Answer


Question

How do I AGGREGATE data in SPSS for Windows?

Answer

Aggregate Data aggregates groups of cases in the active dataset into single cases and creates a new, aggregated file or creates new variables in the active dataset that contain aggregated data. Cases are aggregated based on the value of one or more break (grouping) variables.

If you create a new, aggregated data file, the new data file contains one case for each group defined by the break variables. For example, if there is one break variable with two values, the new data file will contain only two cases.

If you add aggregate variables to the active dataset, the data file itself is not aggregated. Each case with the same value(s) of the break variable(s) receives the same values for the new aggregate variables. For example, if gender is the only break variable, all males would receive the same value for a new aggregate variable that represents average age.

Break Variable(s) are cases which are grouped together based on the values of the break variables. Each unique combination of break variable values defines a group. When creating a new, aggregated data file, all break variables are saved in the new file with their existing names and dictionary information. The break variable can be either numeric or string.

Aggregated Variables are source variables used with aggregate functions to create new aggregate variables. The aggregate variable name is followed by an optional variable label in quotes, the name of the aggregate function, and the source variable name in parentheses.

Below is some sample syntax for aggregate:

AGGREGATE
/OUTFILE={filename}
/PRESORTED
/BREAK={var1}
/{aggvar} = {Function}{varlist}

Aggregate functions include:

* SUM(varlist) Sum across cases.
* MEAN(varlist) Mean across cases.
* MEDIAN(varlist) Median across cases.
* SD(varlist) Standard deviation across cases.

* MAX(varlist) Maximum value across cases.
* MIN(varlist) Minimum value across cases.

* PGT(varlist,value) Percentage of cases greater than the specified value.
* PLT(varlist,value) Percentage of cases less than the specified value.
* PIN(varlist,value1,value2) Percentage of cases between value1 and value2, inclusive.
* POUT(varlist,value1,value2) Percentage of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.

* FGT(varlist,value) Fraction of cases greater than the specified value.
* FLT(varlist,value) Fraction of cases less than the specified value.
* FIN(varlist,value1,value2) Fraction of cases between value1 and value2, inclusive.
* FOUT(varlist,value1,value2) Fraction of cases not between value1 and value2. Cases where the source variable equals value1 or value2 are not counted.

* N(varlist) Weighted number of cases in break group.
* NU(varlist) Unweighted number of cases in break group.
* NMISS(varlist) Weighted number of missing cases.
* NUMISS(varlist) Unweighted number of missing cases.

* FIRST(varlist) First nonmissing observed value in break group.
* LAST(varlist) Last nonmissing observed value in break group.

----------------------------------------------------------------------------

Example:

AGGREGATE
/OUTFILE='C:\tmp2.sav'
/BREAK=id1(D) id2 (A)
/Y_mean = MEAN(Y) /Y_sd = SD(Y).

Will create a new file called TMP1.SAV consisting of one case for each combination of values in variables ID1 and ID2. The values will be sorted in descending order on variable ID1 and in ascending order on ID2. Two variables will also exist in this file containing the mean of Y and the standard deviation of Y for each aggregate group. In this case since we use the PRESORTED
option it is important that the file be sorted prior to the AGGREGATE command. One would use SORT CASES BY ID1(D) ID2(A) .

* Some common errors one can obtain from aggregate * .

Assume the working data file contains variables A B and C .

(1) Messages involving unknown variables * .

AGGREGATE OUTFILE * / BREAK = D / N = N .

>Error # 10934 in column 31. Text: D
>The AGGREGATE command specifies an unknown existing variable name.
>This command not executed.

AGGREGATE OUTFILE * / BREAK A / V = N(D).

>Error # 10934 in column 39. Text: D
>The AGGREGATE command specifies an unknown existing variable name.
>This command not executed.

>Error # 10921 in column 40. Text: )
>The number of existing variables does not match the number of new variables
>in a new variable definition on the AGGREGATE command.

*(2) Missing or incorrect aggregate functions *.

AGGREGATE OUTFILE * / BREAK = A .

>Error # 10917 in column 80. Text: (End of Command)
>SPSS expected the definition of a new variable on the AGGREGATE command but
>did not find a new variable name.
>This command not executed.

AGGREGATE OUTFILE * / BREAK A / X Y Z = MAX( B TO C).

>Error # 10921 in column 52. Text: )
>The number of existing variables does not match the number of new variables
>in a new variable definition on the AGGREGATE command.
>This command not executed.

AGGREGATE OUTFILE * / BREAK A / X Y = MAX( B TO C) SX SY = SD( B C ).

>Error # 10933 in column 53. Text: SX
>The definition of a new variable on the AGGREGATE command must be
>terminated by a slash.
>This command not executed.

AGGREGATE OUTFILE * / BREAK A / X Y = MX( B TO C) /SX SY = SD( B C ).

>Error # 10920 in column 40. Text: MX
>The definition of a new variable on the AGGREGATE command specifies an
>unknown or misspelled function name. The following are valid function
>names: SUM, SD, MIN, PLT, POUT, FLT, FOUT, NU, NUMISS, LAST, MEAN, MAX,
>PGT, PIN, FGT, FIN, N, NMISS, and FIRST.
>This command not executed.

(3) Resource messages and errors .

* Consider a case where one has many categories of the break variable
*.

AGGREGATE OUTFILE * / BREAK X / V1 TO V20 = MEAN(V1 TO V20) .

One might receive a message concerning available workspace and error 10963.

THERE IS WORKSPACE FOR ONLY 171 CASES IN THE AGGREGATED FILE.

>Error # 10963
>There is not enough memory for all the cases in the aggregated file. The
>aggregated file is missing some cases. Enlarge workspace, rerun procedure
>This command not executed.

* One may use the PRESORTED option to solve the problem *
AGGREGATE OUTFILE * / PRESORTED /BREAK X / V1 TO V20 = MEAN(V1 TO V20) .

--------------------------------------------------------------------------
Aggregate is most useful in obtaining summary measures for groups of cases to later use as variables for analysis. For example, it is possible to determine how far a certain value falls above the smallest value in a group or below the largest value in the group using the example below:

DATA LIST FREE / VALUE .
BEGIN DATA
2 1 3 4 5 6 1 3 6 5 3 6 7 2 1 5 3 7 5 3 7 6 2 1 5 3 6 7 5 2 1 6 7 3 5 6
7 1 2 5 3 6 7 1 2 5 3 6 7 5 6 7 6 7 5 1 2 3 5 2 6 7 1 5 3 6 7 2 1 5 3 6 7
2 1 5 3 7 6 2 1 5 3 7 6 1 2 5 3 7 2 1 5 3 5 2 1 3 5 7 6 5 7 6 2 7 6 5
END DATA .
COMPUTE GP = TRUNC (UNIFORM(2) ) .
SORT CASES BY GP .
LIST.
AGGREGATE OUTFILE 'MINMAX'
/ PRESORTED
/ BREAK = GP
/ MIN = MIN(VALUE)
/ MAX = MAX(VALUE)
/ MEAN = MEAN(VALUE)
/ SD = SD(VALUE)
/ N = N
/ P1 = PIN(VALUE,1,1)
/ F1 = FIN (VALUE,1,1)
/ FO1 = FOUT(VALUE,1,1).
MATCH FILES FILE =* / TABLE = 'MINMAX' / BY GP .
LIST

[{"Product":{"code":"SSLVMB","label":"IBM SPSS Statistics"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"Not Applicable","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"Not Applicable","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Historical Number

17287

Document Information

Modified date:
16 April 2020

UID

swg21476173