Data Set stage
The Data Set stage is a file stage. It allows you to read data from or write data to a data set. The stage has either a single input link (when it writes a data set) or a single output link (when it reads one). It can be configured to execute in parallel or sequential mode.
What is a data set?
Parallel jobs use data sets to manage data within a job; you can think of each link in a job as carrying a data set. The Data Set stage lets you store the data being operated on in a persistent form, which other InfoSphere® DataStage® jobs can then use.
Data sets are operating system files, each referred to by a control file that, by convention, has the suffix .ds. Using data sets wisely can be key to good performance in a set of linked jobs: because a data set keeps the parallel engine's internal format and the partitioning of the data, a downstream job can read it without converting or repartitioning the records. You can also manage data sets independently of a job by using the Data Set Management utility, which is available from the InfoSphere DataStage Designer or Director.
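To make the relationship between a data set and its files concrete, here is a minimal Python sketch. It is a toy model under stated assumptions: the class `DataSetRef`, its fields, and the example paths are hypothetical, and the code does not read or write the real data set format. It only captures the idea that a control file, conventionally named with a .ds suffix, refers to the operating system files that hold the data.

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class DataSetRef:
    """Hypothetical toy model of a data set: a control file that refers to
    the operating system files holding the data. It does not read or write
    the real on-disk format."""
    control_file: str
    data_files: list[str] = field(default_factory=list)

    def uses_ds_suffix(self) -> bool:
        # By convention (not enforced here) the control file ends in ".ds".
        return PurePath(self.control_file).suffix == ".ds"

    def all_files(self) -> list[str]:
        # Every file that makes up the data set, control file first.
        return [self.control_file, *self.data_files]


# Hypothetical paths, for illustration only.
ds = DataSetRef("/tmp/customers.ds", ["/tmp/seg0", "/tmp/seg1"])
print(ds.uses_ds_suffix())
print(ds.all_files())
```

The model treats the control file and the data files it refers to as one unit, which mirrors how the text describes a data set: a single name (the .ds control file) standing for several operating system files.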
The stage editor has up to three pages, depending on whether you are reading or writing a data set:
- Stage Page. This is always present and is used to specify general information about the stage.
- Input Page. This is present when you are writing to a data set. This is where you specify details about the data set being written to.
- Output Page. This is present when you are reading from a data set. This is where you specify details about the data set being read from.
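The rules for this stage (exactly one link, whose direction depends on whether the stage reads or writes, and editor pages that follow from that choice) can be summarized in a small Python model. This is a hypothetical sketch for illustration only: `DataSetStageModel` and its `mode` and `execution` values are names invented here, not part of any DataStage API.

```python
from dataclasses import dataclass


@dataclass
class DataSetStageModel:
    """Hypothetical toy model of a Data Set stage; not a DataStage API."""
    mode: str        # "write" (single input link) or "read" (single output link)
    execution: str   # "parallel" or "sequential"

    def __post_init__(self) -> None:
        # Reject configurations the stage does not support.
        if self.mode not in ("read", "write"):
            raise ValueError(f"mode must be 'read' or 'write', got {self.mode!r}")
        if self.execution not in ("parallel", "sequential"):
            raise ValueError(
                f"execution must be 'parallel' or 'sequential', got {self.execution!r}"
            )

    def link_counts(self) -> tuple[int, int]:
        # (input links, output links): exactly one link, on one side only.
        return (1, 0) if self.mode == "write" else (0, 1)

    def editor_pages(self) -> list[str]:
        # The Stage page is always shown; Input when writing, Output when reading.
        return ["Stage", "Input" if self.mode == "write" else "Output"]


writer = DataSetStageModel(mode="write", execution="parallel")
reader = DataSetStageModel(mode="read", execution="sequential")
print(writer.editor_pages())  # ['Stage', 'Input']
print(reader.editor_pages())  # ['Stage', 'Output']
```

Deriving both the link counts and the editor pages from the single `mode` value reflects the point of the list: once you decide whether the stage reads or writes, the rest of its layout follows.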