Using FILE TYPE GROUPED
To define a grouped file with FILE TYPE GROUPED,
each record must have a case identifier
and a record code. In the following commands, each data record contains
a student ID number coded 1, 2, or 3 and a code indicating whether
the score on that record is a reading (R), math (M), or science (S)
score:
FILE TYPE GROUPED RECORD=#REC 3(A) CASE=STUDENT 1.
RECORD TYPE 'R'.
DATA LIST / READING 5-6.
RECORD TYPE 'M'.
DATA LIST / MATH 5-6.
RECORD TYPE 'S'.
DATA LIST / SCIENCE 5-6.
END FILE TYPE.
BEGIN DATA
1 R 58
1 M 59
1 S 97
2 R 43
2 M 88
2 S 45
3 R 67
3 M 75
3 S 90
END DATA.
LIST.
- FILE TYPE indicates that data are in grouped format. RECORD defines the variable containing record codes as string variable #REC in column 3. CASE defines the case identifier variable STUDENT in the first column of each record.
- One pair of RECORD TYPE and DATA LIST statements appears for each record type in the file. The program reads the reading score in every R record, the math score in M records, and the science score in S records.
- END FILE TYPE signals the end of file definition.
- BEGIN DATA and END DATA indicate that data are inline.
- The output from LIST is identical to the output using DATA LIST.
FILE TYPE GROUPED is most useful when record order varies across cases and when cases
have missing or duplicate records. In the modified data shown below,
only case 1 has all three record types. Also, record order varies
across cases. For example, the first record for case 1 is a science
record, whereas the first record for cases 2 and 3 is a reading record.
Student | Subject | Score |
---|---|---|
1 | S | 97 |
1 | R | 58 |
1 | M | 59 |
2 | R | 43 |
3 | R | 67 |
3 | M | 75 |
You can use the same FILE TYPE commands as above to read the modified file.
As shown in the output from LIST below, the program assigns missing values
to any variable whose record is absent for a case.
STUDENT READING MATH SCIENCE
1 58 59 97
2 43 . .
3 67 75 .
By default, the program generates a warning message
when a case is missing a defined record type in a grouped file or
when a record is not in the same order as in the RECORD TYPE commands.
Thus, four warnings are generated when the commands for the previous
example are used to read the modified GROUPED.DAT file. You can suppress
these warnings by adding the optional specifications MISSING=NOWARN
and ORDERED=NO on your FILE TYPE command.
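As a minimal sketch of how these two specifications might be added, the commands below repeat the earlier example with MISSING=NOWARN and ORDERED=NO on FILE TYPE. The inline data are an assumption made for illustration: they mirror the modified table shown above and stand in for the GROUPED.DAT file itself.

```spss
* Sketch only: inline data stand in for the modified GROUPED.DAT file.
* MISSING=NOWARN suppresses warnings about missing record types.
* ORDERED=NO suppresses warnings about record order.
FILE TYPE GROUPED RECORD=#REC 3(A) CASE=STUDENT 1
  MISSING=NOWARN ORDERED=NO.
RECORD TYPE 'R'.
DATA LIST / READING 5-6.
RECORD TYPE 'M'.
DATA LIST / MATH 5-6.
RECORD TYPE 'S'.
DATA LIST / SCIENCE 5-6.
END FILE TYPE.
BEGIN DATA
1 S 97
1 R 58
1 M 59
2 R 43
3 R 67
3 M 75
END DATA.
LIST.
```

The values listed for each case should be the same as in the output above; only the warnings are affected by the two added specifications.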
In the modified GROUPED.DAT file, the case identifier STUDENT appears
in the same column position in each record. When the location
of the case identifier varies for different types of records, you
can use the CASE option of the RECORD TYPE command to specify different
column positions for different records. For example, suppose the case
identifier appears in the first column position on reading and science
records and in column 2 on math records. You could use the following
commands to define the data:
FILE TYPE GROUPED RECORD=#REC 3(A) CASE=STUDENT 1.
RECORD TYPE 'R'.
DATA LIST / READING 5-6.
RECORD TYPE 'M' CASE=2.
DATA LIST / MATH 5-6.
RECORD TYPE 'S'.
DATA LIST / SCIENCE 5-6.
END FILE TYPE.
BEGIN DATA
1 S 97
1 R 58
 1M 59
2 R 43
3 R 67
 3M 75
END DATA.
LIST.
- FILE TYPE indicates that the data are in grouped format. RECORD defines the variable containing record codes as string variable #REC. CASE defines the case identifier variable as STUDENT in the first column of each record.
- One pair of RECORD TYPE and DATA LIST statements is coded for each record type in the file.
- The CASE specification on the RECORD TYPE statement for math records overrides the CASE value defined on FILE TYPE. Thus, the program reads STUDENT in column 2 in math records and column 1 in other records.
- END FILE TYPE signals the end of file definition.
- BEGIN DATA and END DATA indicate that data are inline.
- The output from LIST is identical to the output above.