Overview (DATASET COPY command)

The DATASET commands (DATASET NAME, DATASET ACTIVATE, DATASET DECLARE, DATASET COPY, DATASET CLOSE) provide the ability to have multiple data sources open at the same time and control which open data source is active at any point in the session. Using defined dataset names, you can then:

  • Merge data (for example, MATCH FILES, ADD FILES, UPDATE) from different source types (for example, text data, database, spreadsheet) without first saving each one as an external IBM® SPSS® Statistics data file.
  • Create new datasets that are subsets of open data sources (for example, males in one subset, females in another, people under a certain age in another, or original data in one set and transformed/computed values in another subset).
  • Copy and paste variables, cases, and/or variable properties between two or more open data sources in the Data Editor.
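
As a rough sketch of the first point, the following syntax merges a spreadsheet and a text file without saving either one as a data file first. The file paths, the sheet name, and the variables id and score are hypothetical and used only for illustration.

```spss
* Read a spreadsheet and give it a dataset name so that it stays open.
GET DATA /TYPE=XLSX
  /FILE='/data/demographics.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
DATASET NAME demo.
SORT CASES BY id.

* Read a comma-delimited text file, which becomes the active dataset.
GET DATA /TYPE=TXT
  /FILE='/data/scores.csv'
  /ARRANGEMENT=DELIMITED
  /DELIMITERS=","
  /FIRSTCASE=2
  /VARIABLES=id F8.0 score F8.2.
DATASET NAME scores.
SORT CASES BY id.

* Merge the active dataset with the open dataset named demo.
MATCH FILES /FILE=* /FILE=demo /BY id.
EXECUTE.
```

Both sources are sorted by the key variable before the merge because MATCH FILES with BY expects its inputs in key order.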

The DATASET COPY command creates a new dataset that captures the current state of the active dataset. This is particularly useful for creating multiple subsets of data from the same original data source.

  • If the active dataset has a defined dataset name, its name remains associated with subsequent changes.
  • If this command occurs when there are transformations pending, those transformations are executed, as if EXECUTE had been run prior to making the copy, so the transformations appear in both the original and the copy. The command is not allowed wherever EXECUTE would not be allowed. If no transformations are pending, no data pass is performed.
  • If the specified dataset name is already associated with a dataset, a warning is issued, the old dataset is destroyed, and the specified name becomes associated with the current state of the active dataset.
  • If the specified name is already associated with the active dataset, the name becomes associated with the copy of the current state, and the active dataset becomes unnamed.
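
The rules above can be illustrated with a short sketch. It assumes the active dataset contains a numeric variable named score; that variable and the dataset name snapshot are hypothetical.

```spss
DATASET NAME original.

* This transformation is still pending when DATASET COPY runs.
COMPUTE score2 = score * 2.

* The copy forces execution, so score2 exists in both original and snapshot.
DATASET COPY snapshot.

* Reusing the name issues a warning and replaces the earlier snapshot.
COMPUTE score3 = score * 3.
DATASET COPY snapshot.
```

After the second copy, snapshot reflects the later state of the active dataset, including score3, and the name original still refers to the active dataset.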

Basic Specification

The basic specification for DATASET COPY is the command name followed by a new dataset name that conforms to variable naming rules. See the topic Variable Names for more information.

WINDOW Keyword

The WINDOW keyword controls the state of the Data Editor window associated with the dataset.

MINIMIZED. The Data Editor window associated with the new dataset is opened in a minimized state. This is the default.

HIDDEN. The Data Editor window associated with the new dataset is not displayed.

FRONT. The Data Editor window containing the dataset is brought to the front and the dataset becomes the active dataset for dialog boxes.
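
A minimal sketch of the three window states follows. The dataset names backup, scratch, and review are hypothetical.

```spss
* With no WINDOW keyword, the copy opens in a minimized Data Editor window.
DATASET COPY backup.

* Create a copy without displaying a Data Editor window for it.
DATASET COPY scratch WINDOW=HIDDEN.

* Create a copy and bring its Data Editor window to the front.
DATASET COPY review WINDOW=FRONT.
```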

Operations

  • Commands operate on the active dataset. The active dataset is the data source most recently opened (for example, by commands such as GET DATA, GET SAS, GET STATA, GET TRANSLATE) or most recently activated by a DATASET ACTIVATE command.

    Note: The active dataset can also be changed by clicking anywhere in the Data Editor window of an open data source or selecting a dataset from the list of available datasets in a syntax window toolbar.

  • Variables from one dataset are not available when another dataset is the active dataset.
  • Transformations to the active dataset, whether made before or after defining a dataset name, are preserved with the named dataset during the session. Any pending transformations to the active dataset are automatically executed whenever a different data source becomes the active dataset.
  • Dataset names can be used in most commands that can contain references to IBM SPSS Statistics data files.
  • For commands that can create a new dataset or overwrite an existing dataset, you cannot use the dataset name of the active dataset to overwrite the active dataset. For example, if the active dataset is mydata, a command with the subcommand /OUTFILE=mydata will result in an error. To overwrite a named active dataset, use an asterisk instead of the dataset name, as in: /OUTFILE=*.
  • Wherever a dataset name, file handle (defined by the FILE HANDLE command), or filename can be used to refer to IBM SPSS Statistics data files, defined dataset names take precedence over file handles, which take precedence over filenames. For example, if file1 exists as both a dataset name and a file handle, FILE=file1 in the MATCH FILES command will be interpreted as referring to the dataset named file1, not the file handle.
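
The overwrite rule can be sketched as follows, assuming an open dataset named mydata that contains the hypothetical variables region and income.

```spss
DATASET ACTIVATE mydata.

* Specifying /OUTFILE=mydata here would result in an error.
* The asterisk overwrites the active dataset instead.
AGGREGATE
  /OUTFILE=*
  /BREAK=region
  /mean_income=MEAN(income).
```

If the unaggregated data are still needed, one option is to run DATASET COPY before the AGGREGATE command so that a copy of the earlier state remains available in the session.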

Limitations

Because each window requires a minimum amount of memory, there is a limit to the number of windows, IBM SPSS Statistics or otherwise, that can be concurrently open on a given system. The particular number depends on the specifications of your system and may be independent of total memory due to OS constraints.

Example

DATASET NAME original.
DATASET COPY males.
DATASET ACTIVATE males.
SELECT IF gender=0.
DATASET ACTIVATE original.
DATASET COPY females.
DATASET ACTIVATE females.
SELECT IF gender=1.

  • The first DATASET COPY command creates a new dataset, males, that represents the state of the active dataset at the time it was copied.
  • The males dataset is activated and a subset of males is created.
  • The original dataset is activated again. It still contains all cases, because the SELECT IF command deleted cases only from the males copy.
  • The second DATASET COPY command creates a second copy of the original dataset with the name females, which is then activated and a subset of females is created.
  • Three different versions of the initial data file are now available in the session: the original version, a version containing only data for males, and a version containing only data for females.
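
As a possible follow-up to this example, the syntax below lists the datasets that are open and then closes one that is no longer needed. It is a sketch that assumes the example has just been run in the same session.

```spss
* Execute the pending SELECT IF on the females dataset.
EXECUTE.

* List the datasets currently available in the session.
DATASET DISPLAY.

* Close the males dataset when it is no longer needed.
DATASET CLOSE males.
```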