Using FILE TYPE GROUPED

To use FILE TYPE GROUPED to define a grouped file, each record must have a case identifier and a record code. In the following commands, each data record contains a student ID number coded 1, 2, or 3 and a code indicating whether the score on that record is a reading (R), math (M), or science (S) score:

FILE TYPE GROUPED RECORD=#REC 3(A)  CASE=STUDENT 1.

RECORD TYPE 'R'.
DATA LIST / READING 5-6.

RECORD TYPE 'M'.
DATA LIST / MATH 5-6.

RECORD TYPE 'S'. 
DATA LIST / SCIENCE 5-6.

END FILE TYPE.

BEGIN DATA
1 R 58
1 M 59
1 S 97
2 R 43
2 M 88
2 S 45
3 R 67
3 M 75
3 S 90
END DATA.

LIST.
  • FILE TYPE indicates that data are in grouped format. RECORD defines the variable containing record codes as string variable #REC in column 3. CASE defines the case identifier variable STUDENT in the first column of each record.
  • One pair of RECORD TYPE and DATA LIST statements appears for each record type in the file. The program reads the reading score from each R record, the math score from each M record, and the science score from each S record.
  • END FILE TYPE signals the end of file definition.
  • BEGIN DATA and END DATA indicate that data are inline.
  • The output from LIST is identical to the output using DATA LIST.

FILE TYPE GROUPED is most useful when record order varies across cases and when cases have missing or duplicate records. In the modified data shown below, only case 1 has all three record types. Also, record order varies across cases. For example, the first record for case 1 is a science record, whereas the first record for cases 2 and 3 is a reading record.

Table 1. Modified grouped data file
Student Subject Score
1 S 97
1 R 58
1 M 59
2 R 43
3 R 67
3 M 75

You can use the same FILE TYPE commands as above to read the modified file. As shown in the output from LIST below, the program assigns missing values to variables that are missing for a case.

Figure 1. LIST output for GROUPED.DAT
STUDENT READING MATH SCIENCE

   1       58    59     97
   2       43     .      .
   3       67    75      .

By default, the program generates a warning message when a case is missing a defined record type in a grouped file or when a record is not in the same order as in the RECORD TYPE commands. Thus, four warnings are generated when the commands for the previous example are used to read the modified GROUPED.DAT file. You can suppress these warnings by adding the optional specifications MISSING=NOWARN and ORDERED=NO to your FILE TYPE command.
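
As an illustration of these two specifications, the following sketch reuses the commands from the first example, adds MISSING=NOWARN and ORDERED=NO to FILE TYPE, and reads the modified data inline. It assumes the same column layout as before (case identifier in column 1, record code in column 3, score in columns 5-6). The output from LIST should match Figure 1, without the warnings.

* Suppress warnings for missing and out-of-order records.
FILE TYPE GROUPED RECORD=#REC 3(A) CASE=STUDENT 1 MISSING=NOWARN ORDERED=NO.

RECORD TYPE 'R'.
DATA LIST / READING 5-6.

RECORD TYPE 'M'.
DATA LIST / MATH 5-6.

RECORD TYPE 'S'.
DATA LIST / SCIENCE 5-6.

END FILE TYPE.

BEGIN DATA
1 S 97
1 R 58
1 M 59
2 R 43
3 R 67
3 M 75
END DATA.

LIST.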

In the modified GROUPED.DAT file, the case identifier STUDENT appears in the same column position in each record. When the location of the case identifier varies across record types, you can use the CASE option of the RECORD TYPE command to specify a different column position for a given record type. For example, suppose the case identifier appears in column 1 on reading and science records and in column 2 on math records. You could use the following commands to define the data:

FILE TYPE GROUPED RECORD=#REC 3(A)  CASE=STUDENT 1.

RECORD TYPE 'R'.
DATA LIST / READING 5-6.

RECORD TYPE 'M' CASE=2.
DATA LIST / MATH 5-6. 

RECORD TYPE 'S'. 
DATA LIST / SCIENCE 5-6.

END FILE TYPE.

BEGIN DATA 
1 S 97
1 R 58
 1M 59
2 R 43
3 R 67
 3M 75
END DATA.

LIST.
  • FILE TYPE indicates that the data are in grouped format. RECORD defines the variable containing record codes as string variable #REC in column 3. CASE defines the case identifier variable as STUDENT in the first column of each record.
  • One pair of RECORD TYPE and DATA LIST statements is coded for each record type in the file.
  • The CASE specification on the RECORD TYPE statement for math records overrides the CASE value defined on FILE TYPE. Thus, the program reads STUDENT in column 2 in math records and column 1 in other records.
  • END FILE TYPE signals the end of file definition.
  • BEGIN DATA and END DATA indicate that data are inline.
  • The output from LIST is identical to the output shown in Figure 1.