Nested Files

In a nested file, information on some records applies to several cases. The 1988 sales data are arranged in nested format in the figure below. The data contain three kinds of records. A code in the first column indicates whether a record is a year (Y), region (R), or person record (P).

Figure 1. File NESTED.DAT
Y   1988
R   CHICAGO
P   JONES           900
P   GREGORY         400
R   BATON ROUGE
P   RODRIGUEZ       300
P   SMITH           333
P   GRAU            100

The record types are related to each other hierarchically. Year records represent the highest level in the hierarchy, since the year value 1988 applies to each salesperson in the file (only one year record is used in this example). Region records are intermediate-level records; a region name applies to the salesperson records that follow it, up to the next region record in the file. For example, Chicago applies to salespersons Jones and Gregory, and Baton Rouge applies to Rodriguez, Smith, and Grau. Person records represent the lowest level in the hierarchy. The information they contain (salesperson and unit sales) defines a case. Nested file structures minimize redundant information in a data file. For example, 1988 and Baton Rouge appear several times in the rectangular file but only once in the nested file.
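The carry-forward rule described above, and the redundancy it removes, can be illustrated with a short Python sketch. It is not how SPSS reads the file: the NESTED list and the functions to_rectangular, count_nested, and count_rectangular are hypothetical names, and the records are assumed to be already split into fields. Each person record inherits the most recent year and region values, and the counts compare how often 1988 and Baton Rouge are stored in each layout.

```python
# Figure 1 records as (type code, value) pairs. Column parsing is assumed
# to be done already so the sketch can focus on the hierarchy.
NESTED = [
    ("Y", 1988),
    ("R", "CHICAGO"),
    ("P", ("JONES", 900)),
    ("P", ("GREGORY", 400)),
    ("R", "BATON ROUGE"),
    ("P", ("RODRIGUEZ", 300)),
    ("P", ("SMITH", 333)),
    ("P", ("GRAU", 100)),
]


def to_rectangular(records):
    """Return one (year, region, salesperson, sales) row per person record."""
    year = region = None
    rows = []
    for code, value in records:
        if code == "Y":
            year = value        # applies to every later person record
        elif code == "R":
            region = value      # applies until the next region record
        elif code == "P":
            name, sales = value
            rows.append((year, region, name, sales))
    return rows


def count_nested(records, code, value):
    """How many times a year or region value is stored in the nested layout."""
    return sum(1 for c, v in records if c == code and v == value)


def count_rectangular(rows, column, value):
    """How many times a value is stored in one column of the flat layout."""
    return sum(1 for row in rows if row[column] == value)


rows = to_rectangular(NESTED)
for row in rows:
    print(row)
print(count_nested(NESTED, "Y", 1988), count_rectangular(rows, 0, 1988))  # 1 5
print(count_nested(NESTED, "R", "BATON ROUGE"),
      count_rectangular(rows, 1, "BATON ROUGE"))                     # 1 3
```

Running it prints five rows, one per salesperson, followed by `1 5` and `1 3`: each shared value is stored once in the nested layout but repeated on every row of the flat one.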

Since each record in the nested file has a code that indicates record type, you can use the FILE TYPE and RECORD TYPE commands to define the nested sales data:

FILE TYPE NESTED FILE='NESTED.DAT' RECORD=#TYPE 1 (A).

RECORD TYPE 'Y'.
DATA LIST / YEAR 5-8.

RECORD TYPE 'R'.
DATA LIST / REGION 5-15 (A).

RECORD TYPE 'P'.
DATA LIST / SALESPER 5-15 (A) SALES 20-23.

END FILE TYPE.

LIST.
  • FILE TYPE indicates that data are in nested form in the file NESTED.DAT.
  • RECORD defines the record type variable as string variable #TYPE in column 1. #TYPE is a scratch variable (its name begins with #), so it is not saved in the active dataset.
  • One pair of RECORD TYPE and DATA LIST statements is specified for each record type in the file. The first pair of RECORD TYPE and DATA LIST statements defines the variable YEAR in columns 5 through 8 on every year record. The second pair defines the string variable REGION on region records. The final pair defines SALESPER and SALES on person records.
  • The order of RECORD TYPE statements defines the hierarchical relationship among the records. The first RECORD TYPE defines the highest-level record type. The next RECORD TYPE defines the next-highest level, and so forth. The last RECORD TYPE defines a case in the active dataset.
  • END FILE TYPE signals the end of the file definition.
  • In processing nested data, the program reads each record type you define. Information from the highest- and intermediate-level records is spread to the cases to which it applies. The output from the LIST command is identical to that for the rectangular file.
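The processing described in the bullets above can be mirrored in a hypothetical Python analogue. It is an illustration only, not SPSS's implementation; RECORD_TYPES, read_field, and read_nested are made-up names. The order of RECORD_TYPES plays the role of the order of RECORD TYPE commands, the column ranges copy the DATA LIST specifications, the type code in column 1 is read like #TYPE but never stored, and values from year and region records are spread to each person case. Treating a blank numeric field as None is an assumption of the sketch.

```python
# Hypothetical Python analogue of the FILE TYPE NESTED program above.
# Column ranges are 1-based and inclusive, as in DATA LIST (e.g. 5-15).

NESTED_DAT = """\
Y   1988
R   CHICAGO
P   JONES           900
P   GREGORY         400
R   BATON ROUGE
P   RODRIGUEZ       300
P   SMITH           333
P   GRAU            100
"""

# Record types in hierarchy order: highest level first, case level last.
# Each entry: (type code, [(variable, first_col, last_col, is_string), ...])
RECORD_TYPES = [
    ("Y", [("YEAR", 5, 8, False)]),
    ("R", [("REGION", 5, 15, True)]),
    ("P", [("SALESPER", 5, 15, True), ("SALES", 20, 23, False)]),
]


def read_field(line, first, last, is_string):
    """Extract columns first..last (1-based, inclusive) from one record."""
    text = line[first - 1:last].strip()
    if is_string:
        return text
    # Blank numeric field -> None, standing in for a missing value.
    return int(text) if text else None


def read_nested(text, record_types=RECORD_TYPES):
    """Return one dict per case-level record, with higher-level values spread."""
    specs = dict(record_types)
    case_type = record_types[-1][0]   # the last record type defines a case
    spread = {}                       # latest values from higher-level records
    cases = []
    for line in text.splitlines():
        rec_type = line[:1]           # like #TYPE: read from column 1, never stored
        if rec_type not in specs:
            continue                  # this sketch simply skips other records
        values = {name: read_field(line, first, last, is_string)
                  for name, first, last, is_string in specs[rec_type]}
        if rec_type == case_type:
            cases.append({**spread, **values})
        else:
            spread.update(values)
    return cases


if __name__ == "__main__":
    for case in read_nested(NESTED_DAT):
        print(case)
```

Calling read_nested(NESTED_DAT) returns five dictionaries, one per person record; the first is {'YEAR': 1988, 'REGION': 'CHICAGO', 'SALESPER': 'JONES', 'SALES': 900}. Because the case type is taken from the last entry of RECORD_TYPES, the sketch follows the rule that the last RECORD TYPE defines a case.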