Overview (DATASET COPY command)
The DATASET commands (DATASET NAME, DATASET ACTIVATE, DATASET DECLARE, DATASET COPY, DATASET CLOSE) provide the ability to have multiple data sources open at the same time and control which open data source is active at any point in the session. Using defined dataset names, you can then:
- Merge data (for example, MATCH FILES, ADD FILES, UPDATE) from multiple different source types (for example, text data, database, spreadsheet) without saving each one as an external IBM® SPSS® Statistics data file first.
- Create new datasets that are subsets of open data sources (for example, males in one subset, females in another, people under a certain age in another, or original data in one set and transformed/computed values in another subset).
- Copy and paste variables, cases, and/or variable properties between two or more open data sources in the Data Editor.
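As a rough sketch of the merge scenario described above, the following syntax reads a spreadsheet and a Stata file, names the first dataset so that it stays open, and matches the two by a key variable without saving either one as a .sav file first. The file names survey.xlsx and demographics.dta, the sheet name Sheet1, and the key variable id are hypothetical placeholders. MATCH FILES with /BY expects each input to be sorted by the key, which is why SORT CASES appears after each read.

```spss
* Hypothetical inputs: survey.xlsx, demographics.dta, key variable id.
GET DATA /TYPE=XLSX /FILE='survey.xlsx' /SHEET=NAME 'Sheet1' /READNAMES=ON.
SORT CASES BY id.
* Naming the dataset keeps it open when the next data source is read.
DATASET NAME survey.
GET STATA FILE='demographics.dta'.
SORT CASES BY id.
DATASET NAME demog.
* The asterisk refers to the active dataset (demog); survey is referenced by name.
MATCH FILES /FILE=survey /FILE=* /BY id.
```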
The DATASET COPY command creates a new dataset that captures the current state of the active dataset. This is particularly useful for creating multiple subsets of data from the same original data source.
- If the active dataset has a defined dataset name, its name remains associated with subsequent changes.
- If this command occurs when there are transformations pending, those transformations are executed, as if EXECUTE had been run prior to making the copy, so the transformations appear in both the original and the copy. The command is illegal where EXECUTE would be illegal. If no transformations are pending, no data pass is performed.
- If the specified dataset name is already associated with a dataset, a warning is issued, the old dataset is destroyed, and the specified name becomes associated with the current state of the active dataset.
- If the specified name is associated with the active dataset, the name becomes associated with the new copy (the current state of the active dataset), and the active dataset becomes unnamed.
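The rules above can be illustrated with a short sketch. The variable age and the dataset names original and agecopy are hypothetical. COMPUTE is a transformation, so it stays pending until something forces a data pass; DATASET COPY is such a point.

```spss
DATASET NAME original.
* COMPUTE is a transformation, so it remains pending for now.
COMPUTE age_sq = age**2.
* DATASET COPY executes the pending COMPUTE first, as EXECUTE would.
DATASET COPY agecopy.
* The variable age_sq now exists in both original and agecopy.
* Copying to the active dataset's own name moves that name to the copy.
DATASET COPY original.
* The active dataset is now unnamed.
```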
Basic Specification
The basic specification for DATASET COPY is the command name followed by a new dataset name that conforms to variable naming rules. See the topic Variable Names for more information.
WINDOW Keyword
The WINDOW keyword controls the state of the Data Editor window associated with the dataset.
MINIMIZED. The Data Editor window associated with the new dataset is opened in a minimized state. This is the default.
HIDDEN. The Data Editor window associated with the new dataset is not displayed.
FRONT. The Data Editor window containing the dataset is brought to the front, and the dataset becomes the active dataset for dialog boxes.
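A minimal sketch of the WINDOW keyword, written as WINDOW= followed by one of the keywords above after the new dataset name. The dataset names backup, scratch, and review are hypothetical.

```spss
* Default (MINIMIZED): the new dataset's Data Editor window opens minimized.
DATASET COPY backup.
* Create a copy without displaying its Data Editor window.
DATASET COPY scratch WINDOW=HIDDEN.
* Bring the copy's window to the front; it becomes active for dialog boxes.
DATASET COPY review WINDOW=FRONT.
```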
Operations
- Commands operate on the active dataset. The active dataset is the data source most recently opened (for example, by commands such as GET DATA, GET SAS, GET STATA, GET TRANSLATE) or most recently activated by a DATASET ACTIVATE command. Note: The active dataset can also be changed by clicking anywhere in the Data Editor window of an open data source or by selecting a dataset from the list of available datasets in a syntax window toolbar.
- Variables from one dataset are not available when another dataset is the active dataset.
- Transformations to the active dataset, whether made before or after defining a dataset name, are preserved with the named dataset during the session, and any pending transformations to the active dataset are automatically executed whenever a different data source becomes the active dataset.
- Dataset names can be used in most commands that can contain references to IBM SPSS Statistics data files.
- For commands that can create a new dataset or overwrite an existing dataset, you cannot use the dataset name of the active dataset to overwrite the active dataset. For example, if the active dataset is mydata, a command with the subcommand /OUTFILE=mydata will result in an error. To overwrite a named active dataset, use an asterisk instead of the dataset name, as in: /OUTFILE=*.
- Wherever a dataset name, file handle (defined by the FILE HANDLE command), or filename can be used to refer to IBM SPSS Statistics data files, defined dataset names take precedence over file handles, which take precedence over filenames. For example, if file1 exists as both a dataset name and a file handle, FILE=file1 in the MATCH FILES command will be interpreted as referring to the dataset named file1, not the file handle.
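The rules in the list above can be sketched in syntax: a dataset name shadows a file handle of the same name in MATCH FILES, AGGREGATE writes back to a named active dataset through /OUTFILE=* rather than through its own name, and DATASET ACTIVATE switches the active dataset. The handle file1, the files lookup.sav, other.sav, and main.sav, and the variables id, gender, and age are hypothetical, and all files are assumed to be sorted by id.

```spss
* Hypothetical handle and files; all files assumed sorted by id.
FILE HANDLE file1 /NAME='lookup.sav'.
GET FILE='other.sav'.
DATASET NAME file1.
GET FILE='main.sav'.
DATASET NAME mydata.
* file1 is both a dataset name and a file handle: the dataset name wins.
MATCH FILES /FILE=* /FILE=file1 /BY id.
* Writing /OUTFILE=mydata here would be an error because mydata is active.
* The asterisk replaces the active dataset with the aggregated results.
AGGREGATE /OUTFILE=* /BREAK=gender /mean_age=MEAN(age).
* Make file1 the active dataset; mydata stays open because it is named.
DATASET ACTIVATE file1.
* Variables in mydata are not available until it is activated again.
```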
Limitations
Because each window requires a minimum amount of memory, there is a limit to the number of windows, IBM SPSS Statistics or otherwise, that can be concurrently open on a given system. The particular number depends on the specifications of your system and may be independent of total memory due to OS constraints.
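One way to keep the number of open Data Editor windows down is to close datasets once they are no longer needed. A minimal sketch using DATASET CLOSE, with hypothetical dataset names original and work and a hypothetical variable age:

```spss
DATASET NAME original.
DATASET COPY work.
DATASET ACTIVATE work.
SELECT IF age >= 18.
* Activating original first executes the pending SELECT IF on work.
DATASET ACTIVATE original.
* Close the copy, and with it its Data Editor window, when it is no longer needed.
DATASET CLOSE work.
```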
Example
DATASET NAME original.
DATASET COPY males.
DATASET ACTIVATE males.
SELECT IF gender=0.
DATASET ACTIVATE original.
DATASET COPY females.
DATASET ACTIVATE females.
SELECT IF gender=1.
- The first DATASET COPY command creates a new dataset, males, that represents the state of the active dataset at the time it was copied.
- The males dataset is activated and a subset of males is created.
- The original dataset is activated again. Because SELECT IF deleted cases only from the males copy, the original dataset still contains all cases.
- The second DATASET COPY command creates a second copy of the original dataset with the name females. The females dataset is then activated, and a subset of females is created.
- Three different versions of the initial data file are now available in the session: the original version, a version containing only data for males, and a version containing only data for females.