Overview (UPDATE command)
UPDATE replaces values in a primary file with updated values recorded in one or more files called transaction files. Cases in the primary file and transaction file are matched according to a key variable.
The primary file and the transaction files must be IBM® SPSS® Statistics data files or
datasets available in the current session, including the active dataset. UPDATE
replaces values and creates a new active dataset, which replaces the original active dataset.
UPDATE is designed to update values of existing variables for existing cases. Use MATCH FILES to add new variables to a data file and ADD FILES to add new cases.
Options
Variable Selection. You can specify which variables from each input file are included in the new active dataset using the DROP and KEEP subcommands.
Variable Names. You can rename variables in each input file before combining the files using the RENAME subcommand. This permits you to combine variables that are the same but whose names differ in different input files, or to separate variables that are different but have the same name.
Variable Flag. You can create a variable that indicates whether a case came from a particular input file using IN. You can use the FIRST or LAST subcommand to create a variable that flags the first or last case of a group of cases with the same value for the key variable.
Variable Map. You can request a map showing all variables in the new active dataset, their order, and the input files from which they came using the MAP subcommand.
Basic Specification
The basic specification is two or more FILE subcommands and a BY subcommand.
- The first FILE subcommand must specify the primary file. All other FILE subcommands identify the transaction files.
- BY specifies the key variables.
- All files must be sorted in ascending order by the key variables.
- By default, all variables from all input files are included in the new active dataset.
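The basic specification can be sketched as follows. The file paths and the key variable name id are hypothetical, and both files are assumed to be already sorted in ascending order by id.

```spss
* Hypothetical paths and key variable, both files assumed sorted by id.
UPDATE FILE='/data/master.sav'
  /FILE='/data/changes.sav'
  /BY=id.
EXECUTE.
```

Here master.sav is the primary file because it is named on the first FILE subcommand, and changes.sav is the single transaction file.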
Subcommand Order
- The primary file must be specified first.
- RENAME and IN must immediately follow the FILE subcommand to which they apply.
- BY must follow the FILE subcommands and any associated RENAME and IN subcommands.
- MAP, DROP, and KEEP must be specified after all FILE and RENAME subcommands.
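A minimal sketch of a command that follows this ordering is shown here. All file paths and variable names (id, phone, newphone, notes, in_master, in_changes) are assumptions made for illustration only.

```spss
* Hypothetical files and variable names throughout.
UPDATE FILE='/data/master.sav'
  /IN=in_master
  /FILE='/data/changes.sav'
  /RENAME=(newphone = phone)
  /IN=in_changes
  /BY=id
  /DROP=notes
  /MAP.
EXECUTE.
```

Each RENAME and IN sits directly after the FILE subcommand it belongs to, BY comes after the last FILE group, and DROP and MAP come last. Renaming newphone to phone in the transaction file lets it update the variable phone in the primary file.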
Syntax Rules
- BY can be specified only once. However, multiple variables can be specified on BY. All files must be sorted in ascending order by the key variables named on BY.
- The primary file cannot contain duplicate values for the key variables.
- RENAME can be repeated after each FILE subcommand and applies only to variables in the file named on the immediately preceding FILE subcommand.
- MAP can be repeated as often as needed.
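When BY names more than one key variable, every input file needs to be sorted by the same variables in the same order. The following sketch assumes two hypothetical data files that share the key variables region and dept, and uses dataset names (master, changes) chosen only for this example.

```spss
* Hypothetical files, dataset names, and key variables.
GET FILE='/data/master.sav'.
DATASET NAME master.
SORT CASES BY region dept.
GET FILE='/data/changes.sav'.
DATASET NAME changes.
SORT CASES BY region dept.
UPDATE FILE=master
  /FILE=changes
  /BY=region dept.
EXECUTE.
```

Sorting both datasets immediately before UPDATE is one simple way to satisfy the ascending-order requirement for the key variables named on BY.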
Operations
- UPDATE reads all input files named on FILE and builds a new active dataset. The new active dataset is built when the data are read by one of the procedure commands or the EXECUTE, SAVE, or SORT CASES command.
  - If the current active dataset is included and is specified with an asterisk (FILE=*), the new merged dataset replaces the active dataset. If that dataset is a named dataset, the merged dataset retains that name. If the current active dataset is not included or is specified by name (for example, FILE=Dataset1), a new unnamed, merged dataset is created, and it becomes the active dataset. For information on naming datasets, see DATASET NAME.
- The new active dataset contains complete dictionary information from the input files, including variable names, labels, print and write formats, and missing-value indicators. The new active dataset also contains the documents from each input file, unless the DROP DOCUMENTS command is used.
- UPDATE copies all variables in order from the primary file, then all variables in order from the first transaction file, then all variables in order from the second transaction file, and so on.
- Cases are updated when they are matched on the BY variable(s). If the primary and transaction files contain common variables for matched cases, the values for those variables are taken from the transaction file, provided that the values are not missing or blanks. Missing or blank values in the transaction files are not used to update values in the primary file.
- When UPDATE encounters duplicate keys within a transaction file, it applies each transaction sequentially to that case to produce one case per key value in the resulting file. If more than one transaction file is specified, the value for a variable comes from the last transaction file with a nonmissing value for that variable.
- Variables that are in the transaction files but not in the primary file are added to the primary file. Cases that do not contain those variables are assigned the system-missing value (for numerics) or blanks (for strings).
- Cases that are in the transaction files but not in the primary file are added to the primary file and are interleaved according to their values for the key variables.
- If the active dataset is named as an input file, any N and SAMPLE commands that have been specified are applied to the active dataset before files are combined.
- The TEMPORARY command cannot be in effect if the active dataset is used as an input file.
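The role of the active dataset in these operations might look like this in practice. The paths and the key variable id are assumptions, and master.sav is assumed to be already sorted by id.

```spss
* Hypothetical paths and key variable, master.sav assumed sorted by id.
GET FILE='/data/changes.sav'.
SORT CASES BY id.
UPDATE FILE='/data/master.sav'
  /FILE=*
  /BY=id
  /MAP.
SAVE OUTFILE='/data/master_updated.sav'.
```

In this sketch the active dataset serves as the transaction file and is referenced with an asterisk, so the merged dataset replaces it as the active dataset. The merged data are actually built when SAVE reads the data. For a matched id, a nonmissing value in changes.sav replaces the corresponding value from master.sav, while a missing value leaves the primary value unchanged.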
Limitations
- A maximum of one BY subcommand. However, BY can specify multiple variables.