Structure of data sets

A data set comprises a descriptor file and a number of other files that are added as the data set grows. These files are organized into partitions and segments.

The descriptor file and the other files are stored on multiple disks in your system. Each partition of a data set is stored on a single processing node. Each data segment contains all the records written by a single InfoSphere® DataStage® job. As a result, a segment can contain files from many partitions, and a partition can contain files from many segments.
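To make the partition/segment cross-cutting concrete, the following minimal sketch models a data set as a grid of files indexed by (segment, partition). It assumes that each job writes exactly one file per partition. The `DataFile` class, the helper functions, and the `/node{p}/disk/ds.seg{s}.part{p}` path scheme are all hypothetical and do not reflect the real on-disk layout or naming used by DataStage.

```python
from dataclasses import dataclass


# Hypothetical model only: the class, helpers, and path scheme are illustrative.
@dataclass(frozen=True)
class DataFile:
    segment: int    # which job's write produced this file
    partition: int  # which partition (and therefore which processing node) holds it
    path: str


def build_files(num_segments: int, num_partitions: int) -> list[DataFile]:
    """Assume every job writes one file to every partition."""
    return [
        # The node number mirrors the partition, because a partition lives on one node.
        DataFile(s, p, f"/node{p}/disk/ds.seg{s}.part{p}")
        for s in range(num_segments)
        for p in range(num_partitions)
    ]


def files_in_segment(files: list[DataFile], segment: int) -> list[DataFile]:
    """All files written by one job, spread across every partition."""
    return [f for f in files if f.segment == segment]


def files_in_partition(files: list[DataFile], partition: int) -> list[DataFile]:
    """All files on one partition, contributed by every segment."""
    return [f for f in files if f.partition == partition]


files = build_files(num_segments=3, num_partitions=4)
print(len(files_in_segment(files, 0)))    # 4: one segment spans every partition
print(len(files_in_partition(files, 2)))  # 3: one partition holds files from every segment
```

Slicing the grid by row gives one segment, and slicing it by column gives one partition. That is why neither unit contains the other.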

Figure: The relationship between partitions and segments in a data set

The descriptor file for a data set contains the following information:

  • Data set header information.
  • Creation time and date of the data set.
  • The schema of the data set.
  • A copy of the configuration file used when the data set was created.

For each segment, the descriptor file contains:

  • The time and date the segment was added to the data set.
  • A flag marking the segment as valid or invalid.
  • Statistical information, such as the number of records and the number of bytes in the segment.
  • Path names of all data files on all processing nodes.
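The lists above describe what the descriptor file records. The following sketch holds the same information in memory, but it is not a parser for the real descriptor format, which this section does not specify. `SegmentInfo`, `Descriptor`, and their fields are hypothetical names chosen to mirror the bullets. The sketch also assumes that segments flagged invalid should be excluded when totals are computed.

```python
from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical in-memory model of the descriptor contents; not the real file format.
@dataclass
class SegmentInfo:
    added: datetime        # time and date the segment was added
    valid: bool            # flag marking the segment as valid or invalid
    num_records: int       # statistical information: record count
    num_bytes: int         # statistical information: byte count
    data_files: list[str]  # path names of data files on all processing nodes


@dataclass
class Descriptor:
    header: str            # data set header information
    created: datetime      # creation time and date of the data set
    schema: str            # schema of the data set
    config_file: str       # copy of the configuration file used at creation
    segments: list[SegmentInfo] = field(default_factory=list)

    def valid_segments(self) -> list[SegmentInfo]:
        # Assumption: invalid segments are ignored for totals.
        return [s for s in self.segments if s.valid]

    def total_records(self) -> int:
        return sum(s.num_records for s in self.valid_segments())

    def total_bytes(self) -> int:
        return sum(s.num_bytes for s in self.valid_segments())


d = Descriptor(
    header="example header",
    created=datetime(2024, 1, 1, 8, 0),
    schema="record (id: int32; name: string;)",  # illustrative schema text
    config_file="(copy of configuration file)",
)
d.segments.append(SegmentInfo(datetime(2024, 1, 1, 9), True, 1000, 64000,
                              ["/node0/ds.seg0.part0", "/node1/ds.seg0.part1"]))
d.segments.append(SegmentInfo(datetime(2024, 1, 2, 9), False, 500, 32000,
                              ["/node0/ds.seg1.part0", "/node1/ds.seg1.part1"]))
d.segments.append(SegmentInfo(datetime(2024, 1, 3, 9), True, 250, 16000,
                              ["/node0/ds.seg2.part0", "/node1/ds.seg2.part1"]))

print(d.total_records())  # 1250
print(d.total_bytes())    # 80000
```

Keeping the statistics per segment, rather than only for the whole data set, means that appending a segment or marking one invalid requires no rescan of the data files.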

This information can be accessed through the Data Set Manager.