Overview (ADD FILES command)

ADD FILES combines cases from 2 to 50 open datasets or external IBM® SPSS® Statistics data files by concatenating or interleaving cases. When cases are concatenated, all cases from one file are added to the end of all cases from another file. When cases are interleaved, cases in the resulting file are ordered according to the values of one or more key variables.

The files specified on ADD FILES can be external IBM SPSS Statistics data files and/or currently open datasets. The combined file becomes the new active dataset.

In general, ADD FILES is used to combine files containing the same variables but different cases. To combine files containing the same cases but different variables, use MATCH FILES. To update existing IBM SPSS Statistics data files, use UPDATE.

Options

Variable Selection. You can specify which variables from each input file are included in the new active dataset using the DROP and KEEP subcommands.
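
As an illustration of variable selection, the following sketch shows one command that uses DROP and one that uses KEEP. The file paths and variable names (notes, id, age, score) are hypothetical. Both subcommands are placed after all FILE subcommands here, on the assumption that this is the position in which they are accepted.

```spss
* Hypothetical paths and variable names.
* Exclude the variable notes from the combined dataset.
ADD FILES
  /FILE='/data/wave1.sav'
  /FILE='/data/wave2.sav'
  /DROP=notes.
EXECUTE.

* Retain only id, age, and score in the combined dataset.
ADD FILES
  /FILE='/data/wave1.sav'
  /FILE='/data/wave2.sav'
  /KEEP=id age score.
EXECUTE.
```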

Variable Names. You can rename variables in each input file before combining the files using the RENAME subcommand. This permits you to combine variables that are the same but whose names differ in different input files or to separate variables that are different but have the same name.
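
A minimal sketch of renaming before combining, assuming two hypothetical files in which the same measure is stored as income in the first file and as inc in the second:

```spss
* Hypothetical paths and variable names.
* RENAME applies to survey2020.sav, the file named on the preceding FILE subcommand.
ADD FILES
  /FILE='/data/survey2019.sav'
  /FILE='/data/survey2020.sav'
  /RENAME=(inc=income).
EXECUTE.
```

Without the RENAME subcommand, inc and income would appear in the combined dataset as two separate variables.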

Variable Flag. You can create a variable that indicates whether a case came from a particular input file using the IN subcommand. When interleaving cases, you can use the FIRST or LAST subcommands to create a variable that flags the first or last case of a group of cases with the same value for the key variable.
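
The flag variables can be sketched as follows. The file paths, the key variable household, and the flag names from_b, first_hh, and last_hh are all hypothetical, and both files are assumed to be sorted in ascending order by household.

```spss
* Hypothetical paths and variable names; both files are assumed sorted by household.
* from_b is 1 for cases read from site_b.sav and 0 for all other cases.
* first_hh and last_hh flag the first and last case within each household value.
ADD FILES
  /FILE='/data/site_a.sav'
  /FILE='/data/site_b.sav'
  /IN=from_b
  /BY household
  /FIRST=first_hh
  /LAST=last_hh.
EXECUTE.
```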

Variable Map. You can request a map showing all variables in the new active dataset, their order, and the input files from which they came using the MAP subcommand.
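
A short sketch of requesting a variable map, using hypothetical file paths. MAP takes no specifications and is placed after the FILE subcommands in this example.

```spss
* Hypothetical paths.
* MAP lists the variables in the combined dataset and the files they came from.
ADD FILES
  /FILE='/data/site_a.sav'
  /FILE='/data/site_b.sav'
  /MAP.
EXECUTE.
```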

Basic Specification

  • The basic specification is two or more FILE subcommands, each of which specifies a file to be combined. If cases are to be interleaved, the BY subcommand specifying the key variables is also required.
  • All variables from all input files are included in the new active dataset unless DROP or KEEP is specified.
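
The basic specification can be sketched as follows, once for concatenation and once for interleaving. The file paths and the key variable region are hypothetical, and both files are assumed to contain the same variables.

```spss
* Hypothetical paths; both files are assumed to hold the same variables.
* Concatenate: all cases from north.sav, followed by all cases from south.sav.
ADD FILES
  /FILE='/data/north.sav'
  /FILE='/data/south.sav'.
EXECUTE.

* Interleave: both files are assumed to be sorted in ascending order by region.
ADD FILES
  /FILE='/data/north.sav'
  /FILE='/data/south.sav'
  /BY region.
EXECUTE.
```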

Subcommand Order

  • RENAME and IN must immediately follow the FILE subcommand to which they apply.
  • BY, FIRST, and LAST must follow all FILE subcommands and their associated RENAME and IN subcommands.

Syntax Rules

  • RENAME can be repeated after each FILE subcommand. RENAME applies only to variables in the file named on the FILE subcommand immediately preceding it.
  • BY can be specified only once. However, multiple key variables can be specified on BY. When BY is used, all files must be sorted in ascending order by the key variables (see SORT CASES).
  • FIRST and LAST can be used only when files are interleaved (when BY is used).
  • MAP can be repeated as often as desired.
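
Because BY requires every input file to be sorted in ascending order by the key variables, a typical workflow sorts each file first. The following is a sketch under assumed names: the paths and the key variables region and store are hypothetical.

```spss
* Hypothetical paths and variable names.
* Sort the external file by both keys and save the sorted copy.
GET FILE='/data/site_b.sav'.
SORT CASES BY region (A) store (A).
SAVE OUTFILE='/data/site_b_sorted.sav'.

* Sort the other file, which stays open as the active dataset.
GET FILE='/data/site_a.sav'.
SORT CASES BY region (A) store (A).

* Interleave on two key variables; BY appears once and lists both keys.
ADD FILES
  /FILE=*
  /FILE='/data/site_b_sorted.sav'
  /BY region store.
EXECUTE.
```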

Operations

  • ADD FILES reads all input files named on FILE and builds a new active dataset. ADD FILES is executed when the data are read by one of the procedure commands or the EXECUTE, SAVE, or SORT CASES commands.
    • If the current active dataset is included and is specified with an asterisk (FILE=*), the new merged dataset replaces the active dataset. If that dataset is a named dataset, the merged dataset retains that name. If the current active dataset is not included or is specified by name (for example, FILE=Dataset1), a new unnamed, merged dataset is created, and it becomes the active dataset. For information on naming datasets, see DATASET NAME.
  • The resulting file contains complete dictionary information from the input files, including variable names, labels, print and write formats, and missing-value indicators. It also contains the documents from each input file. See DROP DOCUMENTS for information on deleting documents.
  • For each variable, dictionary information is taken from the first file containing value labels, missing values, or a variable label for the common variable. If the first file has no such information, ADD FILES checks the second file, and so on, seeking dictionary information.
  • Variables are copied in order from the first file specified, then from the second file specified, and so on. Variables that are not contained in all files receive the system-missing value for cases that do not have values for those variables.
  • If the same variable name exists in more than one file but the format type (numeric or string) does not match, the command is not executed.
  • If a numeric variable has the same name but different formats (for example, F8.0 and F8.2) in different input files, the format of the variable in the first-named file is used.
  • If a string variable has the same name but different formats (for example, A24 and A16) in different input files, the command is not executed.
  • If the active dataset is named as an input file, any N OF CASES and SAMPLE commands that have been specified are applied to the active dataset before the files are combined.
  • If only one of the files is weighted, the program turns weighting off when combining the cases. To weight the cases in the resulting file, use the WEIGHT command again.
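
Several of the rules above can be sketched in one short job. Everything named here is an assumption for illustration: the file paths, a string variable city that is A16 in the first file and A24 in the second, and a weight variable sampwgt. ALTER TYPE is used as one possible way to make the string widths match before the files are combined.

```spss
* Hypothetical paths and variable names.
* city is assumed to be A16 in cities_a.sav and A24 in cities_b.sav.
GET FILE='/data/cities_a.sav'.
ALTER TYPE city (A24).

* FILE=* refers to the active dataset, so the merged result replaces it.
ADD FILES
  /FILE=*
  /FILE='/data/cities_b.sav'.

* Weighting is off after the merge if only one input was weighted, so reapply it.
WEIGHT BY sampwgt.
EXECUTE.
```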

Limitations

  • A maximum of 50 files can be combined on one ADD FILES command.
  • The TEMPORARY command cannot be in effect if the active dataset is used as an input file.