Nested Files
In a nested file, information on some records applies to several cases. The 1988 sales data are arranged in nested format in the figure below. The data contain three kinds of records. A code in the first column indicates whether a record is a year (Y), region (R), or person record (P).
Y 1988
R CHICAGO
P JONES 900
P GREGORY 400
R BATON ROUGE
P RODRIGUEZ 300
P SMITH 333
P GRAU 100
The record types are related to each other hierarchically. Year records represent the highest level in the hierarchy, since the year value 1988 applies to each salesperson in the file (only one year record is used in this example). Region records are intermediate-level records; region names apply to salesperson records that occur before the next region record in the file. For example, Chicago applies to salespersons Jones and Gregory. Baton Rouge applies to Rodriguez, Smith, and Grau. Person records represent the lowest level in the hierarchy. The information they contain—salesperson and unit sales—defines a case. Nested file structures minimize redundant information in a data file. For example, 1988 and Baton Rouge appear several times in the rectangular file, but only once in the nested file.
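The way higher-level values carry down to person records can be sketched in a few lines of Python. This is only an illustration of the logic described above, not how the program itself is implemented. The function name `flatten_nested` and the whitespace-based parsing are assumptions made for the example.

```python
def flatten_nested(lines):
    """Spread the most recent year and region onto each person record."""
    year = None
    region = None
    cases = []
    for line in lines:
        # The first character is the record-type code; the rest is the payload.
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code == "Y":
            year = int(rest)
            region = None  # a new year starts a new group of regions
        elif code == "R":
            region = rest
        elif code == "P":
            # The last token is unit sales; everything before it is the name.
            name, sales = rest.rsplit(None, 1)
            cases.append((year, region, name, int(sales)))
    return cases


nested = [
    "Y 1988",
    "R CHICAGO",
    "P JONES 900",
    "P GREGORY 400",
    "R BATON ROUGE",
    "P RODRIGUEZ 300",
    "P SMITH 333",
    "P GRAU 100",
]

cases = flatten_nested(nested)
for case in cases:
    print(case)
```

Each tuple corresponds to one row of the rectangular file: the year and region are repeated on every case, although the nested input stores each of them only once.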
Since each record in the nested file has a code that indicates record type, you can use the FILE TYPE and RECORD TYPE commands to define the nested sales data:
FILE TYPE NESTED FILE='NESTED.DAT' RECORD=#TYPE 1 (A).
RECORD TYPE 'Y'.
DATA LIST / YEAR 5-8.
RECORD TYPE 'R'.
DATA LIST / REGION 5-15 (A).
RECORD TYPE 'P'.
DATA LIST / SALESPER 5-15 (A) SALES 20-23.
END FILE TYPE.
- FILE TYPE indicates that data are in nested form in the file NESTED.DAT.
- RECORD defines the record type variable as string variable #TYPE in column 1. #TYPE is defined as a scratch variable so it won’t be saved in the active dataset.
- One pair of RECORD TYPE and DATA LIST statements is specified for each record type in the file. The first pair of RECORD TYPE and DATA LIST statements defines the variable YEAR in columns 5 through 8 on every year record. The second pair defines the string variable REGION on region records. The final pair defines SALESPER and SALES on person records.
- The order of RECORD TYPE statements defines the hierarchical relationship among the records. The first RECORD TYPE defines the highest-level record type. The next RECORD TYPE defines the next highest level, and so forth. The last RECORD TYPE defines a case in the active dataset.
- END FILE TYPE signals the end of the file definition.
- In processing nested data, the program reads each record type you define. Information on the highest-level and intermediate-level records is spread to cases to which the information applies. The output from the LIST command is identical to that for the rectangular file.
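To make the role of statement order and column positions concrete, here is a minimal Python sketch of a reader driven by an ordered layout table. It is not the program's implementation. The names `LAYOUT` and `read_nested` are hypothetical, the fixed-width sample lines are constructed to match the column positions in the DATA LIST statements, and silently skipping undefined record codes is a simplification chosen for the example.

```python
# Hypothetical layout mirroring the RECORD TYPE / DATA LIST pairs, highest level first.
# Each field is (name, first column, last column, converter); columns are 1-based.
LAYOUT = [
    ("Y", [("YEAR", 5, 8, int)]),
    ("R", [("REGION", 5, 15, str.strip)]),
    ("P", [("SALESPER", 5, 15, str.strip), ("SALES", 20, 23, int)]),
]


def read_nested(lines, layout=LAYOUT):
    levels = [code for code, _ in layout]  # order defines the hierarchy
    fields = dict(layout)
    current = {}  # values from the most recent record at each level
    cases = []
    for line in lines:
        code = line[0:1]  # record-type code in column 1
        if code not in fields:
            continue  # simplification: ignore undefined record types
        level = levels.index(code)
        # A new record at this level replaces it and clears every lower level.
        for lower in levels[level:]:
            current.pop(lower, None)
        current[code] = {
            name: conv(line[first - 1:last])
            for name, first, last, conv in fields[code]
        }
        # Only the last (lowest-level) record type builds a case.
        if level == len(levels) - 1:
            case = {}
            for lvl in levels:
                case.update(current.get(lvl, {}))
            cases.append(case)
    return cases


def person(name, sales):
    # Name in columns 5-15, sales right-aligned in columns 20-23.
    return f"P   {name:<11}    {sales:>4}"


sample = [
    "Y   1988",
    "R   CHICAGO",
    person("JONES", 900),
    person("GREGORY", 400),
    "R   BATON ROUGE",
    person("RODRIGUEZ", 300),
    person("SMITH", 333),
    person("GRAU", 100),
]

cases = read_nested(sample)
for case in cases:
    print(case)
```

Because the hierarchy comes only from the order of entries in `LAYOUT`, reordering them would change which record type builds a case, which parallels the role that the order of RECORD TYPE statements plays in the definition above.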