Overview (UPDATE command)

UPDATE replaces values in a master file with updated values recorded in one or more files called transaction files. Cases in the master file and the transaction files are matched on one or more key variables.

The master file and the transaction files must be IBM® SPSS® Statistics data files or datasets available in the current session, including the active dataset. UPDATE replaces values and creates a new active dataset, which replaces the original active dataset.

UPDATE is designed to update values of existing variables for existing cases. Use MATCH FILES to add new variables to a data file and ADD FILES to add new cases.

Options

Variable Selection. You can specify which variables from each input file are included in the new active dataset using the DROP and KEEP subcommands.

Variable Names. You can rename variables in each input file before combining the files using the RENAME subcommand. This permits you to combine variables that are the same but whose names differ in different input files, or to separate variables that are different but have the same name.

Variable Flag. You can create a variable that indicates whether a case came from a particular input file using IN. You can use the FIRST or LAST subcommand to create a variable that flags the first or last case of a group of cases with the same value for the key variable.

Variable Map. You can request a map showing all variables in the new active dataset, their order, and the input files from which they came using the MAP subcommand.

Basic Specification

The basic specification is two or more FILE subcommands and a BY subcommand.

  • The first FILE subcommand must specify the master file. All other FILE subcommands identify the transaction files.
  • BY specifies the key variables.
  • All files must be sorted in ascending order by the key variables.
  • By default, all variables from all input files are included in the new active dataset.
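As a minimal sketch of the basic specification, the following command names one master file, one transaction file, and a single key variable. The file paths and the variable name ID are hypothetical, and both files are assumed to be already sorted in ascending order by ID.

```spss
* Hypothetical file names. Both files are assumed to be sorted by ID.
UPDATE FILE='/data/maillist.sav'
  /FILE='/data/newlist.sav'
  /BY=ID.
EXECUTE.
```

EXECUTE is included only to force the data pass. Any procedure command that reads the data would have the same effect.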

Subcommand Order

  • The master file must be specified first.
  • RENAME and IN must immediately follow the FILE subcommand to which they apply.
  • BY must follow the FILE subcommands and any associated RENAME and IN subcommands.
  • MAP, DROP, and KEEP must be specified after all FILE and RENAME subcommands.
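The ordering rules above can be sketched as follows. All file names and variable names (inmaster, intrans, newaddr, address, tempflag, id) are hypothetical and serve only to show where each subcommand sits relative to the others.

```spss
* Hypothetical files and variables, shown only to illustrate ordering.
UPDATE FILE='/data/master.sav'
  /IN=inmaster
  /FILE='/data/trans1.sav'
  /RENAME=(newaddr=address)
  /IN=intrans
  /BY=id
  /MAP
  /DROP=tempflag.
EXECUTE.
```

Each RENAME and IN follows the FILE subcommand it applies to, BY comes after the last FILE group, and MAP and DROP come last.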

Syntax Rules

  • BY can be specified only once. However, multiple variables can be specified on BY. All files must be sorted in ascending order by the key variables named on BY.
  • The master file cannot contain duplicate values for the key variables.
  • RENAME can be repeated after each FILE subcommand and applies only to variables in the file named on the immediately preceding FILE subcommand.
  • MAP can be repeated as often as needed.
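Because every input file must be sorted by the variables named on BY, a typical job sorts each file first. The sketch below assumes two hypothetical data files and two hypothetical key variables, region and id. The transaction file is sorted and saved under a new, equally hypothetical name, and the sorted master file is then used as the active dataset.

```spss
* Hypothetical file and variable names.
GET FILE='/data/trans.sav'.
SORT CASES BY region id.
SAVE OUTFILE='/data/trans_sorted.sav'.

GET FILE='/data/master.sav'.
SORT CASES BY region id.
UPDATE FILE=*
  /FILE='/data/trans_sorted.sav'
  /BY=region id.
EXECUTE.
```

This sketch does not check the master file for duplicate key values. That condition is assumed to hold.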

Operations

  • UPDATE reads all input files named on FILE and builds a new active dataset. The new active dataset is built when the data are read by one of the procedure commands or the EXECUTE, SAVE, or SORT CASES command.
    • If the current active dataset is included and is specified with an asterisk (FILE=*), the new merged dataset replaces the active dataset. If that dataset is a named dataset, the merged dataset retains that name. If the current active dataset is not included or is specified by name (for example, FILE=Dataset1), a new unnamed, merged dataset is created, and it becomes the active dataset. For information on naming datasets, see DATASET NAME.
  • The new active dataset contains complete dictionary information from the input files, including variable names, labels, print and write formats, and missing-value indicators. The new active dataset also contains the documents from each input file, unless the DROP DOCUMENTS command is used.
  • UPDATE copies all variables in order from the master file, then all variables in order from the first transaction file, then all variables in order from the second transaction file, and so on.
  • Cases are updated when they are matched on the BY variable(s). If the master and transaction files contain common variables for matched cases, the values for those variables are taken from the transaction file, provided that the values are not missing or blank. Missing or blank values in the transaction files are not used to update values in the master file.
  • When UPDATE encounters duplicate keys within a transaction file, it applies each transaction sequentially to that case to produce one case per key value in the resulting file. If more than one transaction file is specified, the value for a variable comes from the last transaction file with a nonmissing value for that variable.
  • Variables that are in the transaction files but not in the master file are added to the master file. Cases that do not contain those variables are assigned the system-missing value (for numerics) or blanks (for strings).
  • Cases that are in the transaction files but not in the master file are added to the master file and are interleaved according to their values for the key variables.
  • If the active dataset is named as an input file, any N and SAMPLE commands that have been specified are applied to the active dataset before files are combined.
  • The TEMPORARY command cannot be in effect if the active dataset is used as an input file.
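The matching rules above can be illustrated with a small self-contained sketch that uses inline data. The dataset names (master, trans) and the variables (id, salary, bonus) are invented for this illustration. Fixed-column input is used so that a blank field is read as system-missing, assuming the default SET BLANKS setting.

```spss
* Master file: one case per id, sorted by id.
DATA LIST /id 1 salary 3-7 bonus 9-12.
BEGIN DATA
1 30000  500
2 40000  800
4 50000 1000
END DATA.
DATASET NAME master.

* Transaction file: id 2 has a blank bonus, and id 3 is not in the master.
DATA LIST /id 1 salary 3-7 bonus 9-12.
BEGIN DATA
2 42000
3 35000  600
END DATA.
DATASET NAME trans.

UPDATE FILE=master
  /FILE=trans
  /BY=id.
LIST.
```

Under the rules described above, the listing would be expected to show salary for id 2 replaced by 42000 while its bonus stays at 800, because the blank transaction value is not applied. The case with id 3 would be added between ids 2 and 4, and ids 1 and 4 would be unchanged.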

Limitations

  • A maximum of one BY subcommand. However, BY can specify multiple variables.