Overview (SPLIT FILE command)

SPLIT FILE splits the active dataset into subgroups that can be analyzed separately. These subgroups are sets of adjacent cases in the file that have the same values for the specified split variables. Each distinct combination of values of the split variables defines a break group, and cases within a break group must be grouped together in the active dataset. If they are not grouped together, use the SORT CASES command before SPLIT FILE to sort the cases into the proper order.

Basic Specification

The basic specification is keyword BY followed by the variable or variables that define the split-file groups.

  • By default, the split-file groups are compared within the same table(s).
  • You can turn off split-file processing by using keyword OFF.
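
As a minimal sketch of the basic specification, the following sequence sorts the cases, turns split-file processing on with BY, and later turns it off with OFF. The variable names gender and income are hypothetical and are used only for illustration.

```spss
* Hypothetical variables: gender is the split variable, income is analyzed.
SORT CASES BY gender.
SPLIT FILE BY gender.
* This procedure is run for each gender group.
FREQUENCIES VARIABLES=income.
SPLIT FILE OFF.
* This procedure is run on the whole active dataset.
FREQUENCIES VARIABLES=income.
```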

Syntax Rules

  • SPLIT FILE can specify both numeric and string split variables, including variables that are created by temporary transformations. SPLIT FILE cannot specify scratch or system variables.
  • SPLIT FILE is in effect for all procedures in a session unless you limit it with a TEMPORARY command, turn it off, or override it with a new SPLIT FILE or SORT CASES command.
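
The effect of TEMPORARY can be sketched as follows, assuming hypothetical variables region and sales. Because TEMPORARY applies only up to the next procedure, split-file processing should affect the first DESCRIPTIVES command but not the second.

```spss
* Hypothetical variables: region is the split variable, sales is analyzed.
SORT CASES BY region.
TEMPORARY.
SPLIT FILE BY region.
* Split-file processing applies to this procedure only.
DESCRIPTIVES VARIABLES=sales.
* This procedure is run on the whole active dataset.
DESCRIPTIVES VARIABLES=sales.
```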

Operations

  • SPLIT FILE takes effect as soon as it is encountered in the command sequence. Therefore, pay special attention to the position of SPLIT FILE among commands. See the topic Command Order for more information.
  • The file is processed sequentially. A change or break in values on any one of the split variables signals the end of one break group and the beginning of the next break group.
  • AGGREGATE ignores the SPLIT FILE command. To split files by using AGGREGATE, name the variables that are used to split the file as break variables ahead of any other break variables on AGGREGATE. AGGREGATE still produces one file, but the aggregated cases are in the same order as the split-file groups.
  • If SPLIT FILE is in effect when a procedure writes matrix materials, the program writes one set of matrix materials for every split group. If a procedure reads a file that contains multiple sets of matrix materials, the procedure automatically detects the presence of multiple sets.
  • If SPLIT FILE names any variable that was defined by the NUMERIC command, the program prints page headings that indicate the split-file grouping.
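
The AGGREGATE behavior described above can be sketched as follows. The variable names region, dept, and salary, the target name mean_salary, and the output file name are all hypothetical. The split variable is listed first on BREAK so that the aggregated cases follow the order of the split-file groups.

```spss
* Hypothetical variables: region is the split variable, dept and salary are other variables.
SORT CASES BY region.
SPLIT FILE BY region.
* AGGREGATE ignores SPLIT FILE, so region is named ahead of dept on BREAK.
AGGREGATE
  /OUTFILE='aggregated.sav'
  /BREAK=region dept
  /mean_salary=MEAN(salary).
```

The result is still a single file, with one aggregated case for each combination of region and dept.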

Limitations

  • SPLIT FILE can specify or imply up to eight variables.
  • Each eight bytes of a string variable counts as one variable toward the limit of eight variables. Consequently, a string variable with a defined width greater than 64 bytes cannot be used as a split-file variable.
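
A worked count against the limit may help. The sketch below assumes that a partly filled block of eight bytes counts as a full variable, which the rule above implies but does not state outright. The variable names site, year, and wave are hypothetical.

```spss
* Hypothetical variables: site is a string of width 20, year and wave are numeric.
* site counts as 3 variables, because 20 bytes need three blocks of eight bytes.
* year and wave count as 1 variable each, so the total is 5 of the 8 allowed.
SORT CASES BY site year wave.
SPLIT FILE BY site year wave.
```

Under the same assumption, a string of width 64 would use all eight slots on its own and leave no room for any other split variable.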