Overview (KEYED DATA LIST command)

KEYED DATA LIST reads raw data from two types of nonsequential files: direct-access files, which provide direct access by a record number, and keyed files, which provide access by a record key. An example of a direct-access file is a file of 50 records, each corresponding to one of the United States. If you know the relationship between the states and the record numbers, you can retrieve the data for any specific state. An example of a keyed file is a file containing social security numbers and other information about a firm’s employees. The social security number can be used to identify the records in the file.

Direct-Access Files

There are various types of direct-access files. This program’s concept of a direct-access file, however, is very specific. The file must be one from which individual records can be selected according to their number. The records in a 100-record direct-access file, for example, are numbered from 1 to 100.

Although the concept of record number applies to almost any file, not all files can be treated by this program as direct-access files. In fact, some operating systems provide no direct-access capabilities at all, and others permit only a narrowly defined subset of all files to be treated as direct access.

Very few files turn out to be good candidates for direct-access organization. In the case of an inventory file, for example, the usual large gaps in the part numbering sequence would result in large amounts of wasted file space. Gaps are not a problem, however, if they are predictable. For example, if you recognize that telephone area codes have first digits of 2 through 9, second digits of 0 or 1, and third digits of 0 through 9, you can transform an area code into a record number by using the following COMPUTE statement:

COMPUTE RECNUM = 20*(DIGIT1-2) + 10*DIGIT2 + DIGIT3 + 1.

where DIGIT1, DIGIT2, and DIGIT3 are variables corresponding to the respective digits in the area code, and RECNUM is the resulting record number. The record numbers would range from 1, for the nonexistent area code 200, through 160, for area code 919. The file would then have a manageable number of unused records.

Keyed Files

Of the many kinds of keyed files, the ones to which the program can provide access are generally known as indexed sequential files. A file of this kind is basically a sequential file in which an index is maintained so that the file can be processed either sequentially or selectively. In effect, there is an underlying data file that is accessed through a file of index entries. The file of index entries may, for example, contain the fact that data record 797 is associated with social security number 476-77-1359. Depending on the implementation, the underlying data may or may not be maintained in sequential order.

The key for each record in the file generally comprises one or more pieces of information found within the record. An example of a complex key is a customer’s last name and house number, plus the consonants in the street name, plus the zip code, plus a unique digit in case there are duplicates. Regardless of the information contained in the key, the program treats it as a character string.

On some systems, more than one key is associated with each record. That is, the records in a file can be identified according to different types of information. Although the primary key for a file normally must be unique, sometimes the secondary keys need not be. For example, the records in an employee file might be identified by social security number and job classification.

Options

Data Source. You can specify the name of the keyed file on the FILE subcommand. By default, the last file that was specified on an input command, such as DATA LIST or REPEATING DATA, is read.

Summary Table. You can display a table that summarizes the variable definitions.

Basic Specification

  • The basic specification requires FILE, KEY, and IN, each of which specifies one variable, followed by a slash and variable definitions.
  • FILE specifies the direct-access or keyed file. The file must have a file handle already defined.
  • KEY specifies the variable whose value will be used to read a record. For direct-access files, the variable must be numeric; for keyed files, it must be string.
  • IN creates a logical variable that flags whether a record was successfully read.
  • Variable definitions follow all subcommands; the slash preceding them is required. Variable definitions are similar to those specified on DATA LIST.

Subcommand Order

  • Subcommands can be named in any order.
  • Variable definitions must follow all specified subcommands.

Syntax Rules

  • Specifications for the variable definitions are the same as those described for DATA LIST. The only difference is that only one record can be defined per case.
  • The FILE HANDLE command must be used if the FILE subcommand is specified on KEYED DATA LIST.
  • KEYED DATA LIST can be specified in an input program, or it can be used as a transformation language to change an existing active dataset. This differs from all other input commands, such as GET and DATA LIST, which create new active datasets.

Operations

  • Variable names are stored in the active dataset dictionary.
  • Formats are stored in the active dataset dictionary and are used to display and write the values. To change output formats of numeric variables, use the FORMATS command.