Example 1: Automating a File-Matching Task

The following figure shows a listing of 1988 sales data for salespeople working in different regions. The listing shows that salesperson Jones sold 900 units in the Chicago sales territory, while Rodriguez sold 300 units in Baton Rouge.

Figure 1. Listing of data file SALES88.SAV
YEAR REGION      SALESPER  SALES

1988 CHICAGO     JONES      900
1988 CHICAGO     GREGORY    400
1988 BATON ROUGE RODRIGUEZ  300
1988 BATON ROUGE SMITH      333
1988 BATON ROUGE GRAU       100

You can use command syntax shown below to obtain each salesperson’s percentage of total sales for their region.

Figure 2. Commands for obtaining sales percentages
GET FILE = ’SALES88.SAV’.

SORT CASES BY REGION.

AGGREGATE OUTFILE = ’TOTALS.SAV’
  /PRESORTED
  /BREAK = REGION
  /TOTAL@ = SUM(SALES).

MATCH FILES FILE=*
  /TABLE = ’TOTALS.SAV’
  /BY REGION.

COMPUTE PCT = 100 * SALES / TOTAL@.

TITLE 1988 DATA.
LIST.
  • The GET command opens SALES88.SAV. This file becomes the active dataset.
  • SORT CASES sorts the active dataset in ascending alphabetical order by REGION.
  • The AGGREGATE command saves total sales (variable TOTAL@) for each region in file TOTALS.SAV.
  • MATCH FILES appends the regional totals to each salesperson’s record in the active dataset. (See the MATCH FILES command for more information on matching files.)
  • COMPUTE obtains the percentage of regional sales (PCT) for each salesperson.
  • The LIST command output displayed below shows that Rodriguez sold 41% of the products sold in Baton Rouge. Gregory accounted for 31% of sales in the Chicago area.
Figure 3. Listing of data file SALES88.SAV
YEAR REGION      SALESPER  SALES  TOTAL@     PCT

1988 BATON ROUGE RODRIGUEZ  300   733.00   41.00
1988 BATON ROUGE SMITH      333   733.00   45.00
1988 BATON ROUGE GRAU       100   733.00   14.00
1988 CHICAGO     JONES      900  1300.00   69.00
1988 CHICAGO     GREGORY    400  1300.00   69.00

The following figure shows a macro that issues the commands for obtaining sales percentages. The macro consists of the commands that produce sales percentages imbedded between macro definition commands DEFINE and !ENDDEFINE.

Figure 4. !TOTMAC macro
DEFINE !TOTMAC ().

GET FILE = ’SALES88.SAV’.

SORT CASES BY REGION.

AGGREGATE OUTFILE = ’TOTALS.SAV’
  /PRESORTED
  /BREAK = REGION
  /TOTAL@ = SUM(SALES).

MATCH FILES FILE = *
  /TABLE = ’TOTALS.SAV’
  /BY REGION.

COMPUTE PCT = 100 * SALES / TOTAL@.

TITLE 1988 DATA.
LIST.

!ENDDEFINE.

!TOTMAC.
  • Macro definition commands DEFINE and !ENDDEFINE signal the beginning and end of macro processing. DEFINE also assigns the name !TOTMAC to the macro (the parentheses following the name of the macro are required). The macro name begins with an exclamation point so that the macro does not conflict with that of an existing variable or command. Otherwise, if the macro name matched a variable name, the variable name would invoke the macro whenever the variable name appeared in the command stream.
  • Commands between DEFINE and !ENDDEFINE constitute the macro body. These commands produce sales percentages.
  • The final statement, !TOTMAC, is the macro call, which invokes the macro. When the program reads the macro call, it issues the commands in the macro body. Then these commands are executed, generating identical output.

While the macro shows you how to construct a simple macro, it doesn’t reduce the number of commands needed to calculate regional percentages. However, you can use macro features such as looping to minimize coding in more complicated tasks. For example, let’s say that in addition to the 1988 data, you have sales data for 1989 (SALES89.SAV), and each file contains the variables REGION, SALESPER, and SALES. The modified !TOTMAC macro below calculates regional sales percentages for each salesperson for 1988 and 1989.

Figure 5. !TOTMAC macro with index loop
DEFINE !TOTMAC ().

!DO !I = 88 !TO 89.

- GET FILE = !CONCAT(’SALES’, !I, ’.SAV’).
- SORT CASES BY REGION.
- AGGREGATE OUTFILE = ’TOTALS.SAV’
      /PRESORTED
      /BREAK = REGION
      /TOTAL@ = SUM(SALES).
- MATCH FILES FILE = *
      /TABLE = ’TOTALS.SAV’
      /BY REGION.
- COMPUTE PCT= 100 * SALES / TOTAL@.
- !LET !YEAR = !CONCAT(’19’,!I).
- TITLE !YEAR DATA.
- LIST.
!DOEND.

!ENDDEFINE.

!TOTMAC.
  • DEFINE and !ENDDEFINE signal the beginning and end of macro processing.
  • Commands !DO and !DOEND define an index loop. Commands between !DO and !DOEND are issued once in each iteration of the loop. The value of index variable !I, which changes in each iteration, is 88 in the first iteration and 89 in the second (final) iteration.
  • In each iteration of the loop, the GET command opens a data file. The name of the file is constructed using the string manipulation function !CONCAT, which creates a string that is the concatenation of SALES, the value of the index variable, and .sav. Thus the file SALES88.SAV is opened in the first iteration.
  • Commands between AGGREGATE and COMPUTE calculate percentages on the active dataset.
  • Next, a customized title is created. In the first iteration, the direct assignment command !LET assigns a value of 1988 to the macro variable !YEAR. This variable is used in the TITLE command on the following line to specify a title of 1988 DATA.
  • The LIST command displays the contents of each variable.
  • In the second iteration of the loop, commands display percentages for the 1989 data file. The output from the !TOTMAC macro is shown below. Note that the listing for 1988 data is the same as before.
Figure 6. Regional sales percentages for 1988 and 1989
1988 DATA

YEAR REGION      SALESPER  SALES  TOTAL@     PCT

1988 BATON ROUGE RODRIGUEZ  300   733.00   41.00
1988 BATON ROUGE SMITH      333   733.00   45.00
1988 BATON ROUGE GRAU       100   733.00   14.00
1988 CHICAGO     JONES      900  1300.00   69.00
1988 CHICAGO     GREGORY    400  1300.00   69.00


1989 DATA
YEAR REGION      SALESPER SALES   TOTAL@     PCT
1989 BATON ROUGE GRAU      320   1459.00   22.00
1989 BATON ROUGE SMITH     800   1459.00   55.00
1989 BATON ROUGE RODRIGUEZ 339   1459.00   23.00
1989 CHICAGO     JONES     300   1439.00   21.00
1989 CHICAGO     STEEL     899   1439.00   62.00
1989 CHICAGO     GREGORY   240   1439.00   17.00

Let’s look at another application of the !TOTMAC macro, one that uses keyword arguments to make the application more flexible. The following figure shows the number of absences for students in two classrooms. Let’s say you want to calculate deviation scores indicating how many more (or fewer) times a student was absent than the average student in his or her classroom. The first step in obtaining deviation scores is to compute the average number of absences per classroom. We can use the !TOTMAC macro to compute classroom means by modifying the macro so that it computes means and uses the absences data file (SCHOOL.SAV) as input.

Figure 7. Listing of file SCHOOL.SAV
CLASS STUDENT ABSENT
 101  BARRY G    3
 101  JENNI W    1
 101  ED F       2
 101  JOHN 0     8
 102  PAUL Y     2
 102  AMY G      3
 102  JOHN D    12
 102  RICH H     4

The !TOTMAC macro below can produce a variety of group summary statistics such as sum, mean, and standard deviation for any IBM® SPSS® Statistics data file. In the macro call you specify values of keyword arguments indicating the data file (FILE), the break (grouping) variable (BREAKVR), the summary function (FUNC), and the variable to be used as input to the summary function (INVAR). For example, to obtain mean absences for each classroom, we specify SCHOOL.SAV as the data file, CLASS as the break variable, MEAN as the summary function, and ABSENT as the variable whose values are to be averaged.

Figure 8. !TOTMAC macro with keyword arguments
DEFINE !TOTMAC ( BREAKVR = !TOKENS(1)
                /FUNC = !TOKENS(1)
                /INVAR = !TOKENS(1)
                /TEMP = !TOKENS(1) !DEFAULT(TOTALS.SAV)
                /FILE = !CMDEND).
GET FILE = !FILE.
SORT CASES BY !BREAKVR.
AGGREGATE OUTFILE = ’!TEMP’
   /PRESORTED
   /BREAK = !BREAKVR
   /!CONCAT(!FUNC,’@’) = !FUNC(!INVAR).

MATCH FILES FILE = *
   /TABLE = ’!TEMP’
   /BY !BREAKVR.

!ENDDEFINE.

!TOTMAC BREAKVR=CLASS FUNC=MEAN INVAR=ABSENT FILE=SCHOOL.SAV.

COMPUTE DIFF = ABSENT-MEAN@.

LIST.

!TOTMAC BREAKVR=REGION FUNC=SUM INVAR=SALES FILE=SALES89.SAV.

COMPUTE PCT = 100 * SALES / SUM@.

LIST.
  • The syntax for declaring keyword arguments follows the name of the macro in DEFINE.
  • !TOKENS(1) specifies that the value of an argument is a string following the name of the argument in the macro call. Thus the first macro call specifies CLASS as the value of BREAKVR, MEAN as the value of FUNC, and ABSENT as the value of INVAR.
  • !CMDEND indicates that the value for FILE is the remaining text in the macro call (SCHOOL.SAV).
  • TEMP is an optional argument that names an intermediate file to contain the summary statistics. Since TEMP is not assigned a value in the macro call, summary statistics are written to the default intermediate file (TOTALS.SAV).
  • In the body of the macro, GET FILE opens SCHOOL.SAV.
  • SORT CASES sorts the file by CLASS.
  • AGGREGATE computes the mean number of absences for each class. The name of the variable containing the means (MEAN@) is constructed using the !CONCAT function, which concatenates the value of FUNC and the @ symbol.
  • MATCH FILES appends the means to student records.
  • COMPUTE calculates the deviation from the classroom mean for each student (variable DIFF).
  • LIST displays the deviation scores, as shown in the output below. For example, John D., who was absent 12 times, had 6.75 more absences than the average student in classroom 102. Rich H., who was absent 4 times, had 1.25 fewer absences than the average student in classroom 102.
  • The second macro call and remaining commands generate regional sales percentages for the 1989 sales data.
Figure 9. Student absences and 1989 sales percentages
CLASS STUDENT ABSENT MEAN@    DIFF
 101  BARRY G    3    3.50    -.50
 101  JENNI W    1    3.50   -2.50
 101  ED F       2    3.50   -1.50
 101  JOHN 0     8    3.50    4.50
 102  PAUL Y     2    5.25   -3.25
 102  AMY G      3    5.25   -2.25
 102  JOHN D    12    5.25    6.75
 102  RICH H     4    5.25   -1.25


1989 DATA
YEAR REGION      SALESPER SALES   TOTAL@     PCT
1989 BATON ROUGE GRAU      320   1459.00   22.00
1989 BATON ROUGE SMITH     800   1459.00   55.00
1989 BATON ROUGE RODRIGUEZ 339   1459.00   23.00
1989 CHICAGO     JONES     300   1439.00   21.00
1989 CHICAGO     STEEL     899   1439.00   62.00
1989 CHICAGO     GREGORY   240   1439.00   17.00

You can modify the macro call to specify a different data file, input variable, break variable, or summary statistic. To get a different summary statistic (such as standard deviation), change the value of FUNC (see the AGGREGATE command for more information on summary functions available in the AGGREGATE procedure).