Structure of data sets
A data set comprises a descriptor file and a number of other files that are added as the data set grows.
These files are stored on multiple disks in your system. A data set is organized in terms of partitions and segments. Each partition of a data set is stored on a single processing node. Each data segment contains all the records written by a single IBM® InfoSphere® DataStage® job. As a result, a segment can contain files from many partitions, and a partition can contain files from many segments.
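
The relationship between segments and partitions can be pictured as a grid in which each data file sits at one (segment, partition) intersection. The following Python sketch is only an illustration of that idea: the dictionary layout, the `group_by` helper, and the file paths are hypothetical and do not reflect the actual file naming or storage format that InfoSphere DataStage uses.

```python
from collections import defaultdict

# Hypothetical model: every data file belongs to exactly one
# (segment, partition) pair. Paths are invented for illustration.
data_files = [
    {"segment": 0, "partition": 0, "path": "/disk1/example.seg0.part0"},
    {"segment": 0, "partition": 1, "path": "/disk2/example.seg0.part1"},
    {"segment": 1, "partition": 0, "path": "/disk1/example.seg1.part0"},
    {"segment": 1, "partition": 1, "path": "/disk2/example.seg1.part1"},
]


def group_by(files, key):
    """Group file paths by the given key ('segment' or 'partition')."""
    groups = defaultdict(list)
    for entry in files:
        groups[entry[key]].append(entry["path"])
    return dict(groups)


# A segment spans files from many partitions ...
by_segment = group_by(data_files, "segment")
# ... and a partition spans files from many segments.
by_partition = group_by(data_files, "partition")
```

In this sketch, `by_segment[0]` lists the files written by the first job across both partitions, while `by_partition[0]` lists the files held on one processing node across both segments.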

The descriptor file for a data set contains the following information:
- Data set header information.
- Creation time and date of the data set.
- The schema of the data set.
- A copy of the configuration file used when the data set was created.
For each segment, the descriptor file contains:
- The time and date the segment was added to the data set.
- A flag marking the segment as valid or invalid.
- Statistical information, such as the number of records in the segment and the number of bytes.
- Path names of all data files, on all processing nodes.
This information can be accessed through the Data Set Manager.
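
As a rough mental model, the descriptor contents listed above can be sketched as a pair of record types. This is a minimal Python sketch under stated assumptions: the class names, field names, and sample values are hypothetical, and the real descriptor file is a product-specific format that is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SegmentInfo:
    """Per-segment details (field names are illustrative assumptions)."""
    added_at: datetime            # time and date the segment was added
    valid: bool                   # flag marking the segment valid or invalid
    record_count: int             # number of records in the segment
    byte_count: int               # number of bytes in the segment
    data_file_paths: List[str] = field(default_factory=list)  # all nodes


@dataclass
class DataSetDescriptor:
    """Data-set-level details (field names are illustrative assumptions)."""
    header: str                   # data set header information
    created_at: datetime          # creation time and date
    schema: str                   # schema of the data set
    config_file_copy: str         # configuration file used at creation
    segments: List[SegmentInfo] = field(default_factory=list)

    def valid_record_count(self) -> int:
        """Sum the record counts of segments flagged as valid."""
        return sum(s.record_count for s in self.segments if s.valid)


# Sample values below are invented purely to exercise the model.
descriptor = DataSetDescriptor(
    header="example header",
    created_at=datetime(2020, 1, 1, 12, 0),
    schema="record (id: int32; name: string;)",
    config_file_copy="{ node \"node1\" { ... } }",
    segments=[
        SegmentInfo(datetime(2020, 1, 1, 12, 5), True, 100, 4096, ["/disk1/a"]),
        SegmentInfo(datetime(2020, 1, 2, 9, 30), False, 50, 2048, ["/disk2/b"]),
    ],
)
total_valid = descriptor.valid_record_count()
```

Keeping the per-segment details in their own record mirrors the way the descriptor gains one entry each time a job adds a segment, while the data-set-level fields are written once at creation.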