Combine Records stage (DataStage)

The Combine Records stage is a restructure stage. It combines records (that is, rows) whose key-column values are identical into vectors of subrecords.

The Combine Records stage can have a single input link and a single output link.

As input, the stage takes a data set in which one or more columns are chosen as keys. All adjacent records whose key columns contain the same values are gathered into a single output record, in which they appear as subrecords.
Figure: Columns being combined into a vector of subrecords.
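To make the grouping rule concrete, here is a minimal Python sketch of the behavior described above. It is not DataStage's implementation. It assumes that records are plain dictionaries, that the key columns are hoisted to the top level of each output record, and that the remaining columns of each grouped row are stored in a list under a hypothetical `subrecords` column. The function name `combine_records` is also illustrative. Like the stage, it groups only adjacent records.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def combine_records(rows: Iterable[Record], keys: List[str],
                    subrec_column: str = "subrecords") -> List[Record]:
    """Collapse runs of adjacent rows that share key values into one record.

    Hypothetical helper: only *adjacent* rows are grouped, so the input must
    already be sorted on the key columns for equal keys to end up together.
    """
    key_of = itemgetter(*keys)
    combined: List[Record] = []
    for key_value, run in groupby(rows, key=key_of):
        # itemgetter returns a bare value for one key and a tuple for several.
        key_tuple = key_value if len(keys) > 1 else (key_value,)
        out: Record = dict(zip(keys, key_tuple))
        # The non-key columns of each row in the run become one subrecord.
        out[subrec_column] = [
            {col: val for col, val in row.items() if col not in keys}
            for row in run
        ]
        combined.append(out)
    return combined


rows = [
    {"cust": "A", "item": "pen", "qty": 2},
    {"cust": "A", "item": "ink", "qty": 1},
    {"cust": "B", "item": "pad", "qty": 5},
]
for rec in combine_records(rows, ["cust"]):
    print(rec)
# {'cust': 'A', 'subrecords': [{'item': 'pen', 'qty': 2}, {'item': 'ink', 'qty': 1}]}
# {'cust': 'B', 'subrecords': [{'item': 'pad', 'qty': 5}]}
```

Because `itertools.groupby` merges only consecutive items, rows keyed A, B, A produce three output records rather than two. This is why the stage needs its input sorted on the key columns.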

The data set input to the Combine Records stage must be key partitioned and sorted. Key partitioning ensures that rows with the same key-column values are located in the same partition and are processed by the same node. Sorting ensures that those rows are adjacent, so that they can be combined. Choosing the (auto) partitioning method ensures that the data is partitioned and sorted. If partitioning and sorting are carried out in separate stages before the Combine Records stage, DataStage® in auto mode detects this and does not repartition the data. Alternatively, you can explicitly specify the Same partitioning method.
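The following Python sketch shows why key partitioning matters. It is an illustration only. `zlib.crc32` stands in for whatever hash function DataStage actually uses, and the helper names `key_partition`, `round_robin_partition`, and `combine_per_partition` are hypothetical. As separate nodes would, the sketch sorts each partition and collapses its adjacent equal keys independently.

```python
import zlib
from itertools import groupby
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def key_partition(rows: List[Record], key: str, n: int) -> List[List[Record]]:
    """Send every row with the same key value to the same partition."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for row in rows:
        # crc32 is a stable stand-in hash, not DataStage's own hash function.
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts


def round_robin_partition(rows: List[Record], n: int) -> List[List[Record]]:
    """Deal rows out to partitions in turn, ignoring key values."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def combine_per_partition(parts: List[List[Record]], key: str) -> List[Tuple[Any, int]]:
    """Sort each partition on the key, then collapse runs of equal key values.

    Returns one (key value, number of subrecords) pair per output record.
    """
    out: List[Tuple[Any, int]] = []
    for part in parts:
        part_sorted = sorted(part, key=lambda r: r[key])
        for value, run in groupby(part_sorted, key=lambda r: r[key]):
            out.append((value, sum(1 for _ in run)))
    return out


rows = [{"cust": "A", "qty": 2}, {"cust": "A", "qty": 1},
        {"cust": "B", "qty": 5}, {"cust": "B", "qty": 3}]

print(sorted(combine_per_partition(key_partition(rows, "cust", 2), "cust")))
# [('A', 2), ('B', 2)]
print(combine_per_partition(round_robin_partition(rows, 2), "cust"))
# [('A', 1), ('B', 1), ('A', 1), ('B', 1)]
```

With key partitioning, each key value appears in exactly one partition, so the four rows collapse into two records. With round-robin partitioning, the two A rows land in different partitions, and each one yields its own output record.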

The stage editor has three tabs:

  • Stage tab. This tab is always present and is used to specify general information about the stage.
  • Input tab. This tab is where you specify details about the single input data set whose records you are combining.
  • Output tab. This tab is where you specify details about the processed data being output from the stage.

Input tab

Specify an optional description of the input link in the Description section. In the Partitions section, specify how incoming data is partitioned before it is combined. In the Columns section, specify the column definitions of incoming data. In the Advanced section, you can change the default buffering settings for the input link.

Output tab

On the Output tab you can specify details about data output from the Combine Records stage. The Combine Records stage can have only one output link.

Specify an optional description of the output link in the Description section. The Columns section specifies the column definitions of the data. Use the Advanced section to change the default buffering settings for the output link.