Overview (ADD FILES command)
ADD FILES combines cases from 2 to 50 open datasets or external IBM® SPSS® Statistics data files by concatenating or interleaving
cases. When cases are concatenated, all cases from one file are added to the end of all cases from
another file. When cases are interleaved, cases in the resulting file are ordered according to the values
of one or more key variables.
The files specified on ADD FILES can be external IBM SPSS Statistics data files, currently open
datasets, or both. The combined file becomes the new active dataset.
In general, ADD FILES is used to combine files containing the same variables but different
cases. To combine files containing the same cases but different variables,
use MATCH FILES. To update existing IBM SPSS Statistics data files, use UPDATE.
Options
Variable Selection. You can specify which
variables from each input file are included in the new active dataset
using the DROP and KEEP subcommands.
Variable Names. You can
rename variables in each input file before combining the files using
the RENAME subcommand. This permits
you to combine variables that are the same but whose names differ
in different input files or to separate variables that are different
but have the same name.
Variable Flag. You can create a variable
that indicates whether a case came from a particular input file using the IN subcommand. When interleaving cases, you can use
the FIRST or LAST subcommands to create a variable that
flags the first or last case of a group of cases with the same value
for the key variable.
Variable Map. You can request a map showing
all variables in the new active dataset, their order, and the input
files from which they came using the MAP subcommand.
Basic Specification
- The basic specification is two or more FILE subcommands, each of which specifies a file to be combined. If cases are to be interleaved, the BY subcommand specifying the key variables is also required.
- All variables from all input files are included in the new active dataset unless DROP or KEEP is specified.
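As a minimal sketch of the basic specification, the following concatenates two external data files. The file paths are hypothetical and stand in for any two IBM SPSS Statistics data files that contain the same variables.

```spss
* Concatenate two external data files (hypothetical paths).
ADD FILES FILE='/data/school1.sav'
  /FILE='/data/school2.sav'.
EXECUTE.
```

Because no BY subcommand is given, all cases from the second file are added after all cases from the first.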
Subcommand Order
- RENAME and IN must immediately follow the FILE subcommand to which they apply.
- BY, FIRST, and LAST must follow all FILE subcommands and their associated RENAME and IN subcommands.
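The ordering rules above might look like this in practice. This is an illustrative sketch only: the file paths and the variable names (score, testscore, from_school1, from_school2) are assumptions, not names from any shipped data file.

```spss
* Hypothetical files and variable names, for illustration only.
* RENAME and IN each follow the FILE subcommand they apply to.
ADD FILES FILE='/data/school1.sav'
  /RENAME=(score=testscore)
  /IN=from_school1
  /FILE='/data/school2.sav'
  /IN=from_school2
  /MAP.
EXECUTE.
```

Here RENAME affects only the first file, and each IN variable flags the cases that came from the file named on the FILE subcommand immediately before it.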
Syntax Rules
- RENAME can be repeated after each FILE subcommand. RENAME applies only to variables in the file named on the FILE subcommand immediately preceding it.
- BY can be specified only once. However, multiple key variables can be specified on BY. When BY is used, all files must be sorted in ascending order by the key variables (see SORT CASES).
- FIRST and LAST can be used only when files are interleaved (when BY is used).
- MAP can be repeated as often as desired.
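A sketch of interleaving under these rules, assuming two datasets are already open and named wave1 and wave2, and that both contain a key variable called id (all three names are hypothetical):

```spss
* Hypothetical dataset names (wave1, wave2) and key variable (id).
* Each input must be sorted in ascending order by the key first.
DATASET ACTIVATE wave1.
SORT CASES BY id.
DATASET ACTIVATE wave2.
SORT CASES BY id.
ADD FILES FILE=wave1
  /FILE=wave2
  /BY id
  /FIRST=first_in_group.
EXECUTE.
```

The variable named on FIRST flags the first case in each group of cases that share the same value of id.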
Operations
- ADD FILES reads all input files named on FILE and builds a new active dataset. ADD FILES is executed when the data are read by one of the procedure commands or the EXECUTE, SAVE, or SORT CASES commands.
  - If the current active dataset is included and is specified with an asterisk (FILE=*), the new merged dataset replaces the active dataset. If that dataset is a named dataset, the merged dataset retains that name. If the current active dataset is not included or is specified by name (for example, FILE=Dataset1), a new unnamed, merged dataset is created, and it becomes the active dataset. For information on naming datasets, see DATASET NAME.
- The resulting file contains complete dictionary information from the input files, including variable names, labels, print and write formats, and missing-value indicators. It also contains the documents from each input file. See DROP DOCUMENTS for information on deleting documents.
- For each variable, dictionary information is taken from the first file containing value labels, missing values, or a variable label for the common variable. If the first file has no such information, ADD FILES checks the second file, and so on, seeking dictionary information.
- Variables are copied in order from the first file specified, then from the second file specified, and so on. Variables that are not contained in all files receive the system-missing value for cases that do not have values for those variables.
- If the same variable name exists in more than one file but the format type (numeric or string) does not match, the command is not executed.
- If a numeric variable has the same name but different formats (for example, F8.0 and F8.2) in different input files, the format of the variable in the first-named file is used.
- If a string variable has the same name but different formats (for example, A24 and A16) in different input files, the command is not executed.
- If the active dataset is named as an input file, any N and SAMPLE commands that have been specified are applied to the active dataset before the files are combined.
- If only one of the files is weighted, the program turns weighting off when combining cases from the two files. To weight the cases, use the WEIGHT command again.
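The following sketch illustrates two of the points above: merging into a named active dataset with FILE=*, and restoring weighting afterward. The dataset name (survey), the file path, and the weight variable (sampwt) are hypothetical, and the example assumes sampwt exists in the merged file.

```spss
* Hypothetical dataset name, file path, and weight variable.
DATASET NAME survey.
ADD FILES FILE=*
  /FILE='/data/survey_new.sav'.
EXECUTE.
* Weighting is turned off if only one input was weighted, so reapply it.
WEIGHT BY sampwt.
```

Because the active dataset is specified with an asterisk, the merged dataset replaces it and keeps the name survey.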
Limitations
- A maximum of 50 files can be combined on one ADD FILES command.
- The TEMPORARY command cannot be in effect if the active dataset is used as an input file.