Overview (FILE TYPE-END FILE TYPE command)

The FILE TYPE-END FILE TYPE structure defines data for any of three types of complex raw data files:

  • Mixed files, which contain several types of records that define different types of cases.
  • Hierarchical or nested files, which contain several types of records with a defined relationship among the record types.
  • Grouped files, which contain several records for each case, with some records missing or duplicated.

A fourth type of complex file, files with repeating groups of information, can be defined with the REPEATING DATA command.

FILE TYPE must be followed by at least one RECORD TYPE command and one DATA LIST command. Each pair of RECORD TYPE and DATA LIST commands defines one type of record in the data. END FILE TYPE signals the end of the file definition.

Within the FILE TYPE structure, the lowest-level record in a nested file can be read with a REPEATING DATA command rather than a DATA LIST command. In addition, any record in a mixed file can be read with REPEATING DATA.

Basic Specification

The basic specification on FILE TYPE is one of the three file type keywords (MIXED, GROUPED, or NESTED) and the RECORD subcommand. RECORD names the record identification variable and specifies its column location. If the keyword GROUPED is specified, the CASE subcommand is also required. CASE names the case identification variable and specifies its column location.
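As a minimal sketch of the GROUPED form, assume a hypothetical student file read from inline data. STUDENT (columns 1-4) is the case identifier named on CASE. The record identifier in column 6 is read into the scratch variable #REC, so it is not kept in the active dataset. All variable names and the column layout are illustrative only.

```spss
* Hypothetical grouped file: one case per student, up to three records per case.
* Columns 1-4 hold the case ID and column 6 holds the record ID.
* #REC is a scratch variable, so the record ID is not saved.
FILE TYPE GROUPED RECORD=#REC 6 CASE=STUDENT 1-4.
RECORD TYPE 1.
DATA LIST /ENGLISH 8-9 (A).
RECORD TYPE 2.
DATA LIST /READING 8-10.
RECORD TYPE 3.
DATA LIST /MATH 8-10.
END FILE TYPE.
BEGIN DATA
0001 1 B+
0001 2 74
0001 3 83
0002 1 A
0002 2 100
END DATA.
LIST.
```

Student 0002 has no type 3 record, so MATH should be system-missing for that case. Depending on the MISSING setting, the program may also issue a warning.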

The FILE TYPE-END FILE TYPE structure must enclose at least one RECORD TYPE and one DATA LIST command. END FILE TYPE is required to signal the end of the file definition.

  • RECORD TYPE specifies the values of the record type identifier (see RECORD TYPE).
  • DATA LIST defines variables for the record type specified on the preceding RECORD TYPE command (see DATA LIST).
  • Separate pairs of RECORD TYPE and DATA LIST commands must be used to define each different record type.
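The following sketch pairs one RECORD TYPE with one DATA LIST for each of two record types in a mixed file, using hypothetical variables and inline data. Both types record the same variables, but DOSAGE and RESULT sit in different columns, so each type needs its own DATA LIST.

```spss
* Hypothetical mixed file: types 21 and 22 each define a separate case.
* The same variables are recorded in different columns on the two types.
FILE TYPE MIXED RECORD=RECID 1-2.
RECORD TYPE 21.
DATA LIST /SEX 5 AGE 6-7 DOSAGE 8-10 RESULT 12.
RECORD TYPE 22.
DATA LIST /SEX 5 AGE 6-7 DOSAGE 10-12 RESULT 15.
END FILE TYPE.
BEGIN DATA
21  145010 1
22  257  200  2
END DATA.
LIST.
```

Each record becomes its own case, so LIST should show two cases with the same set of variables.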

The resulting active dataset is always a rectangular file, regardless of the structure of the original data file.

Syntax Rules

  • For mixed files, if the record types have different variables or if they have the same variables recorded in different locations, separate RECORD TYPE and DATA LIST commands are required for each record type.
  • For mixed files, the same variable name can be used on different DATA LIST commands, since each record type defines a separate case.
  • For mixed files, if the same variable is defined for more than one record type, the format type and length of the variable should be the same on all DATA LIST commands. The print and write formats stored in the active dataset's dictionary are taken from the first DATA LIST command that defines the variable.
  • For grouped and nested files, the variable names on each DATA LIST must be unique, since a case is built by combining all record types into a single record.
  • For nested files, the order of the RECORD TYPE commands defines the hierarchical structure of the file. The first RECORD TYPE defines the highest-level record type, the next RECORD TYPE defines the next highest-level record, and so forth. The last RECORD TYPE command defines a case in the active dataset. By default, variables from higher-level records are spread to the lowest-level record.
  • For nested files, the SPREAD subcommand on RECORD TYPE can be used to spread the values in a record type only to the first case built from each record of that type. All other cases associated with that record are assigned the system-missing value for the variables defined on that type. See RECORD TYPE for more information.
  • String values specified on the RECORD TYPE command must be enclosed in quotes.
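A minimal sketch of a nested file, assuming hypothetical household ('H') and person ('P') records identified by a one-character string in column 1. Because the identifier is a string, the (A) format is given on RECORD and the values are quoted on RECORD TYPE. The variable names are unique across the two DATA LIST commands, as nested files require.

```spss
* Hypothetical nested file: each household record is followed by its person records.
* The record ID is a string, so its values are quoted on RECORD TYPE.
FILE TYPE NESTED RECORD=RECTYPE 1 (A).
RECORD TYPE 'H'.
DATA LIST /HHID 3-5 REGION 7.
RECORD TYPE 'P'.
DATA LIST /PERSON 3-4 AGE 6-7.
END FILE TYPE.
BEGIN DATA
H 101 3
P 01 34
P 02 31
H 102 1
P 01 58
END DATA.
LIST.
```

Because 'P' is the last RECORD TYPE, each person record defines a case. By default, HHID and REGION are spread to every person case in the household, so LIST should show three cases.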

Operations

  • For mixed file types, the program skips all records that are not specified on one of the RECORD TYPE commands.
  • If different variables are defined for different record types in mixed files, the variables are assigned the system-missing value for those record types on which they are not defined.
  • For nested files, the first record in the file should be of the type specified on the first RECORD TYPE command, which is the highest level of the hierarchy. If the first record in the file is not the highest-level type, the program skips all records until it encounters a record of the highest-level type. If MISSING or DUPLICATE has been specified, the skipped records may produce warning messages, but they are not used to build a case in the active dataset.
  • When defining complex files, you are effectively building an input program and can use only commands that are allowed in the input state.
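To illustrate the mixed-file operations above, the following hypothetical example declares only record types 21 and 22, although the inline data also contain a type 23 record. HEIGHT is defined only for type 22. The variable names and values are illustrative.

```spss
* Hypothetical mixed file with three record types in the data.
* Only types 21 and 22 are declared, so the type 23 record is skipped.
* HEIGHT is defined only for type 22 records.
FILE TYPE MIXED RECORD=RECID 1-2.
RECORD TYPE 21.
DATA LIST /SEX 5 AGE 6-7.
RECORD TYPE 22.
DATA LIST /SEX 5 AGE 6-7 HEIGHT 9-11.
END FILE TYPE.
BEGIN DATA
21  145
22  257 180
23  999
END DATA.
LIST.
```

The type 23 record should be skipped, leaving two cases. HEIGHT should be system-missing for the case built from the type 21 record.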