Data set stage

The Data Set stage is a file stage that allows you to read data from or write data to a data set.

The stage can have a single input link or a single output link. It can be configured to execute in parallel or sequential mode.

What is a data set?

Parallel jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set. The Data Set stage allows you to store data being operated on in a persistent form, which can then be used by other InfoSphere® DataStage® jobs. Data sets are operating system files, each referred to by a control file, which by convention has the suffix .ds. Using data sets wisely can be key to good performance in a set of linked jobs. You can also manage data sets independently of a job using the Data Set Management utility, available from the InfoSphere DataStage Designer or Director.
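Because control files follow the .ds naming convention, the persistent data sets kept in a directory can be found by file name alone. The following Python sketch illustrates this. It is not part of InfoSphere DataStage: the function name find_control_files is hypothetical, and the sketch assumes that every file ending in .ds in the directory is a data set control file.

```python
from pathlib import Path


def find_control_files(directory):
    """Return the regular files in `directory` whose names end in .ds.

    Assumption: the .ds suffix is only a naming convention, so this
    matches on the file name and does not inspect file contents.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".ds"
    )
```

For example, find_control_files("/data/sets") would return the control files directly under the hypothetical directory /data/sets. The sketch only locates control files. It does not read or modify the data files that each control file refers to, so use the Data Set Management utility to inspect or manage the data itself.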

Figure: A persistent data set stored in a Data Set stage

The stage editor has up to three pages, depending on whether you are reading or writing a data set:

Stage Page. This is always present and is used to specify general information about the stage.
Input Page. This is present when you are writing to a data set. This is where you specify details about the data set being written to.
Output Page. This is present when you are reading from a data set. This is where you specify details about the data set being read from.