Overview (DATASET NAME command)

The DATASET commands (DATASET NAME, DATASET ACTIVATE, DATASET DECLARE, DATASET COPY, DATASET CLOSE) let you keep multiple data sources open at the same time and control which open data source is active at any point in the session. Using defined dataset names, you can then:

  • Merge data (for example, MATCH FILES, ADD FILES, UPDATE) from multiple different source types (for example, text data, database, spreadsheet) without saving each one as an external IBM® SPSS® Statistics data file first.
  • Create new datasets that are subsets of open data sources (for example, males in one subset, females in another, people under a certain age in another, or original data in one set and transformed/computed values in another subset).
  • Copy and paste variables, cases, and/or variable properties between two or more open data sources in the Data Editor.
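
As a minimal sketch of the second point, the following syntax splits one open data source into two subsets with DATASET COPY. The variable gender and its coding (0 = male, 1 = female) are assumptions made for illustration only; the file path is borrowed from the example at the end of this topic.

GET FILE='/examples/data/mydata.sav'.
DATASET NAME original.
* Copy the data, then keep only males in the copy.
DATASET COPY males.
DATASET ACTIVATE males.
SELECT IF (gender = 0).
* Return to the original data and repeat for females.
DATASET ACTIVATE original.
DATASET COPY females.
DATASET ACTIVATE females.
SELECT IF (gender = 1).
EXECUTE.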

The DATASET NAME command:

  • Assigns a unique name to the active dataset, which can be used in subsequent file access commands and subsequent DATASET commands.
  • Makes the current data file available even after other data sources have been opened/activated.

The following general rules apply:

  • If the active dataset already has a defined dataset name, the existing association is broken, and the new name is associated with the active dataset.
  • If the name is already associated with another dataset, that association is broken, and the new association is created. The dataset previously associated with that name is closed and is no longer available.
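
The two rules can be illustrated with a short sketch. The dataset names first and survey are hypothetical, and the file paths are the ones used in the example at the end of this topic.

GET FILE='/examples/data/mydata.sav'.
DATASET NAME first.
* Renaming: the name first is released and survey now refers to this dataset.
DATASET NAME survey.
GET DATA /TYPE=XLS
  /FILE='/examples/data/excelfile.xls'.
* Reusing the name: the dataset previously named survey is closed.
DATASET NAME survey.

After the last command, only the Excel data remains open under the name survey.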

Basic Specification

The basic specification for DATASET NAME is the command name followed by a name that conforms to variable naming rules. See the topic Variable Names for more information.

WINDOW Keyword

The WINDOW keyword controls the state of the Data Editor window associated with the dataset.

ASIS. The Data Editor window containing the dataset is not affected. This is the default.

FRONT. The Data Editor window containing the dataset is brought to the front and the dataset becomes the active dataset for dialog boxes.
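
For instance, assuming a data source is already open and active, the following command assigns it the (hypothetical) name mydata and brings its Data Editor window to the front:

DATASET NAME mydata WINDOW=FRONT.

Omitting the keyword is equivalent to specifying WINDOW=ASIS.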

Operations

  • Commands operate on the active dataset. The active dataset is the data source most recently opened (for example, by GET DATA, GET SAS, GET STATA, or GET TRANSLATE) or most recently activated by a DATASET ACTIVATE command.

    Note: The active dataset can also be changed by clicking anywhere in the Data Editor window of an open data source or selecting a dataset from the list of available datasets in a syntax window toolbar.

  • Variables from one dataset are not available when another dataset is the active dataset.
  • Transformations to the active dataset, whether made before or after defining a dataset name, are preserved with the named dataset during the session. Any pending transformations to the active dataset are automatically executed whenever a different data source becomes the active dataset.
  • Dataset names can be used in most commands that can contain references to IBM SPSS Statistics data files.
  • For commands that can create a new dataset or overwrite an existing dataset, you cannot use the dataset name of the active dataset to overwrite the active dataset. For example, if the active dataset is mydata, a command with the subcommand /OUTFILE=mydata will result in an error. To overwrite a named active dataset, use an asterisk instead of the dataset name, as in: /OUTFILE=*.
  • Wherever a dataset name, file handle (defined by the FILE HANDLE command), or filename can be used to refer to IBM SPSS Statistics data files, defined dataset names take precedence over file handles, which take precedence over filenames. For example, if file1 exists as both a dataset name and a file handle, FILE=file1 in the MATCH FILES command will be interpreted as referring to the dataset named file1, not the file handle.
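
The following sketch illustrates two of these points: switching the active dataset with DATASET ACTIVATE, and using an asterisk to overwrite a named active dataset. The AGGREGATE command and the variable name n_cases are chosen only for illustration; ID is assumed to exist in the data, as in the example at the end of this topic.

GET FILE='/examples/data/mydata.sav'.
DATASET NAME mydata.
GET DATA /TYPE=XLS
  /FILE='/examples/data/excelfile.xls'.
DATASET NAME excelfile.
* The Excel data is now the active dataset. Switch back to mydata.
DATASET ACTIVATE mydata.
* The next command is commented out because it results in an error.
* AGGREGATE /OUTFILE=mydata /BREAK=ID /n_cases=N.
* Use an asterisk to overwrite the active dataset instead.
AGGREGATE /OUTFILE=* /BREAK=ID /n_cases=N.

Because mydata is the active dataset when AGGREGATE runs, /OUTFILE=* replaces its contents with the aggregated cases, and the dataset keeps its name.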

Example

GET FILE='/examples/data/mydata.sav'.
SORT CASES BY ID.
DATASET NAME mydata.
GET DATA /TYPE=XLS 
  /FILE='/examples/data/excelfile.xls'.
SORT CASES BY ID.
DATASET NAME excelfile.
GET DATA /TYPE=ODBC /CONNECT=
 'DSN=MS Access Database;DBQ=/examples/data/dm_demo.mdb;'+
 'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
 /SQL='SELECT * FROM main'.
SORT CASES BY ID.
MATCH FILES
 /FILE='mydata'
 /FILE='excelfile'
 /FILE=*
 /BY ID.

  • A data file in IBM SPSS Statistics format is read and assigned the dataset name mydata. Since it has been assigned a dataset name, it remains available for subsequent use even after other data sources have been opened.
  • An Excel file is then read and assigned the dataset name excelfile. Like the IBM SPSS Statistics data file, since it has been assigned a dataset name, it remains available after other data sources have been opened.
  • Then a table from a database is read. Since it is the most recently opened or activated dataset, it is the active dataset.
  • The three datasets are then merged together with the MATCH FILES command, using the dataset names on the FILE subcommands instead of file names.
  • An asterisk (*) is used to specify the active dataset, which is the database table in this example.
  • The files are merged together based on the value of the key variable ID, specified on the BY subcommand.
  • Since all the files being merged need to be sorted in the same order of the key variable(s), SORT CASES is performed on each dataset.