Structure of data sets

A data set comprises a descriptor file and a number of other files that are added as the data set grows.

These files are stored on multiple disks in your system. A data set is organized in terms of partitions and segments. Each partition of a data set is stored on a single processing node. Each data segment contains all the records written by a single IBM® InfoSphere® DataStage® job. Consequently, a segment can contain files from many partitions, and a partition can contain files from many segments.
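The relationship between partitions and segments can be pictured as a grid in which each data file belongs to exactly one segment and one partition. The following Python sketch is illustrative only: the node names, file paths, and the `group_by` helper are hypothetical, and it assumes one data file per segment and partition pair for simplicity. It does not reflect the actual on-disk naming used by the product.

```python
from collections import defaultdict

# Hypothetical data files. Each is tagged with the segment (job run) that
# wrote it and the partition (processing node) that stores it.
DATA_FILES = [
    {"segment": 0, "partition": "node1", "path": "/disk1/example.seg0.part0"},
    {"segment": 0, "partition": "node2", "path": "/disk2/example.seg0.part1"},
    {"segment": 1, "partition": "node1", "path": "/disk1/example.seg1.part0"},
    {"segment": 1, "partition": "node2", "path": "/disk2/example.seg1.part1"},
]


def group_by(files, key):
    """Group file paths by the given key ("segment" or "partition")."""
    groups = defaultdict(list)
    for entry in files:
        groups[entry[key]].append(entry["path"])
    return dict(groups)


# A segment spans files from many partitions...
files_by_segment = group_by(DATA_FILES, "segment")
# ...and a partition holds files from many segments.
files_by_partition = group_by(DATA_FILES, "partition")
```

Grouping the same files by either key shows the two views of one data set: by job run (segment) or by processing node (partition).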

Figure 1. Structure of data sets

The descriptor file for a data set contains the following information:

Data set header information.
Creation time and date of the data set.
The schema of the data set.
A copy of the configuration file used when the data set was created.

For each segment, the descriptor file contains:

The time and date the segment was added to the data set.
A flag marking the segment as valid or invalid.
Statistical information, such as the number of records and the number of bytes in the segment.
Path names of all data files, on all processing nodes.
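The descriptor contents listed above can be modeled as a simple data structure. The following Python sketch is a conceptual model only, not the actual descriptor file format: the class names, field names, sample values, and the `valid_record_count` method are all hypothetical and chosen to mirror the items in the two lists.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SegmentInfo:
    """Per-segment information held in the descriptor (hypothetical model)."""
    added_at: datetime          # time and date the segment was added
    valid: bool                 # flag marking the segment valid or invalid
    record_count: int           # number of records in the segment
    byte_count: int             # number of bytes in the segment
    data_file_paths: List[str]  # path names of data files on all nodes


@dataclass
class DescriptorFile:
    """Data-set-level information held in the descriptor (hypothetical model)."""
    header: str                 # data set header information
    created_at: datetime        # creation time and date of the data set
    schema: str                 # schema of the data set
    config_file_copy: str       # copy of the configuration file used at creation
    segments: List[SegmentInfo] = field(default_factory=list)

    def valid_record_count(self) -> int:
        """Total records across segments that are flagged as valid."""
        return sum(s.record_count for s in self.segments if s.valid)


# Example with made-up values: two segments, one of them flagged invalid.
descriptor = DescriptorFile(
    header="(header placeholder)",
    created_at=datetime(2024, 1, 1, 9, 0),
    schema="(schema placeholder)",
    config_file_copy="(configuration file placeholder)",
    segments=[
        SegmentInfo(datetime(2024, 1, 1, 9, 5), True, 1000, 64000,
                    ["/disk1/example.seg0.part0", "/disk2/example.seg0.part1"]),
        SegmentInfo(datetime(2024, 1, 2, 9, 5), False, 500, 32000,
                    ["/disk1/example.seg1.part0", "/disk2/example.seg1.part1"]),
    ],
)
```

Because each segment carries its own validity flag and statistics, a summary such as `descriptor.valid_record_count()` can be computed from the descriptor alone, without reading the data files.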

This information can be accessed through the Data Set Manager.