Overview (MATCH FILES command)
MATCH FILES
combines variables from 2 up to 50 IBM® SPSS® Statistics data files. MATCH FILES
can make parallel or nonparallel matches
between different files or perform table lookups. Parallel matches combine files sequentially
by case (they are sometimes referred to as sequential matches). Nonparallel
matches combine files according to the values of one or
more key variables. In a table lookup, MATCH
FILES
looks up variables in one file and transfers those
variables to a case file.
The files specified on MATCH
FILES
can be IBM SPSS Statistics data files or open datasets in the current session. The combined
file becomes the new active dataset.
In general, MATCH FILES
is used to combine files containing the same cases but different
variables. To combine files containing the same variables but different
cases, use ADD FILES
. To update
existing IBM SPSS Statistics data files, use UPDATE
.
Options
Variable Selection. You can specify which variables from each input file are included in the
new active dataset using the DROP
and KEEP
subcommands.
Variable Names. You can
rename variables in each input file before combining the files using
the RENAME
subcommand. This permits
you to combine variables that are the same but whose names differ
in different input files or to separate variables that are different
but have the same name.
Variable Flag. You can create a variable
that indicates whether a case came from a particular input file using IN
. You can use the FIRST
or LAST
subcommands
to create a variable that flags the first or last case of a group
of cases with the same value for the key variable.
Variable Map. You can request a map showing all variables in the new active dataset,
their order, and the input files from which they came using the MAP
subcommand.
Basic specification
The basic specification is two or more FILE
subcommands, each of which specifies
a file to be matched. In addition, BY
is required to specify the key variables for
nonparallel matches. Both BY
and TABLE
are required to match
table-lookup files.
- All variables from all input
files are included in the new active dataset unless
DROP
orKEEP
is specified.
Subcommand order
-
RENAME
andIN
must immediately follow theFILE
orTABLE
subcommand to which they apply. - Any
BY
,FIRST
,LAST
,KEEP
,DROP
, andMAP
subcommands must follow all of theTABLE
,FILE
,RENAME
, andIN
subcommands.
Syntax rules
-
RENAME
can be repeated after eachFILE
orTABLE
subcommand and applies only to variables in the file named on the immediately precedingFILE
orTABLE
. -
IN
can be used only for a nonparallel match or for a table lookup. (Thus,IN
can be used only ifBY
is specified.) -
BY
can be specified only once. However, multiple variables can be specified onBY
. WhenBY
is used, all files must be sorted in ascending order of the key variables named onBY
. -
MAP
can be repeated as often as desired.
Operations
-
MATCH FILES
reads all files named onFILE
orTABLE
and builds a new active dataset.- If the current active dataset is included
and is specified with an asterisk (
FILE=*
), the new merged dataset replaces the active dataset. If that dataset is a named dataset, the merged dataset retains that name. If the current active dataset is not included or is specified by name (for example,FILE=Dataset1
), a new unnamed, merged dataset is created, and it becomes the active dataset. For information on naming datasets, see DATASET NAME.
- If the current active dataset is included
and is specified with an asterisk (
- The new active dataset contains complete dictionary information from the input files, including
variable names, labels, print and write formats, and missing-value indicators. The new file also
contains the documents from each of the input files. See
DROP DOCUMENTS
for information on deleting documents. - Variables are copied in order from the first file specified, then from the second file specified, and so on.
- If the same variable name is used in more than one input file, data are taken from the file
specified first. Dictionary information is taken from the first file containing value labels,
missing values, or a variable label for the common variable. If the first file has no such
information,
MATCH FILES
checks the second file, and so on, seeking dictionary information. - All cases from all input files are included in the combined file. Cases that are absent from one of the input files will be assigned system-missing values for variables unique to that file.
-
BY
specifies that cases should be combined according to a common value on one or more key variables. All input files must be sorted in ascending order of the key variables. - If
BY
is not used, the program performs a parallel (sequential) match, combining the first case from each file, then the second case from each file, and so on, without regard to any identifying values that may be present. - If the active dataset is named as an input file, any
N
andSAMPLE
commands that have been specified are applied to that file before files are matched. - Filter conditions defined by the
FILTER
command are preserved. Filtered cases are included in the merged file. If you want to exclude cases, use theSELECT IF
command to specify selection criteria before runningMATCH FILES
.
Limitations
- Maximum 50 files can be combined on one
MATCH FILES
command. - Maximum one
BY
subcommand. However,BY
can specify multiple variables. - The
TEMPORARY
command cannot be in effect if the active dataset is used as an input file.
Examples
Example
MATCH FILES FILE='/data/part1.sav'
/FILE='/data/part2.sav'
/FILE=*.
-
MATCH FILES
combines three files (the active dataset and two external IBM SPSS Statistics data files) in a parallel match. Cases are combined according to their order in each file. - The new active dataset contains as many cases as are contained in the largest of the three input files.
Example
GET FILE='/examples/data/mydata.sav'.
SORT CASES BY ID.
DATASET NAME mydata.
GET DATA /TYPE=XLS
/FILE='/examples/data/excelfile.xls'.
SORT CASES BY ID.
DATASET NAME excelfile.
GET DATA /TYPE=ODBC /CONNECT=
'DSN=MS Access Database;DBQ=/examples/data/dm_demo.mdb;'+
'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
/SQL='SELECT * FROM main'.
SORT CASES BY ID.
MATCH FILES
/FILE='mydata'
/FILE='excelfile'
/FILE=*
/BY ID.
- A data file in IBM SPSS Statistics format and assigned the dataset name mydata. Since it has been assigned a dataset name, it remains available for subsequent use even after other data sources have been opened.
- An Excel file is then read and assigned the dataset name excelfile. Like the IBM SPSS Statistics data file, since it has been assigned a dataset name, it remains available after other data sources have been opened.
- Then a table from a database is read. Since it is the most recently opened or activated dataset, it is the active dataset.
- The three
datasets are then merged together with
MATCH FILES
command, using the dataset names on theFILE
subcommands instead of file names. - An asterisk (*) is used to specify the active dataset, which is the database table in this example.
- The
files are merged together based on the value of the key variable ID, specified on the
BY
subcommand. - Since all the files being merged need to be sorted in the same order of the key variables,
SORT CASES
is performed on each dataset.