Aggregating input rows on output (DataStage®)

You have several options for aggregating input rows on output.

  • Aggregate all rows in a single output row. This is the default option.
  • Generate one output row per input row. This is the Single row option.
  • Trigger a new output row when the value of an input column changes.
  • Trigger a new output row when the value of a passthrough column changes.

    A passthrough column is an output column that has no XPath expression in the Description property and whose name exactly matches the name of an input column.

Examples

Four columns are involved in the transformation. ZONE values also pass through to an output column called ZONE.

CUSTOMER DIVISION CITY ZONE

Acme

Toys

Boston

East

Acme

Toys

New York

East

Acme

Chemical

Raleigh

East

Acme

Chemical

St. Louis

Midwest

XPath expressions

On the input link, the following XPath expressions are used. The CITY column has the repetition path.

Input column XPath expression
CUSTOMER /directory/customer/@name
DIVISION /directory/customer/division/text()
CITY /directory/customer/city/text()
ZONE /directory/customer/@zone

Each example in Generating output uses the output column CUSTOMERS. The CUSTOMERS column uses a forward slash (/) as the XPath expression, which makes the entire XML available to it.

The passthrough example (see Modes: Passthrough and Aggregate all rows ) adds the ZONE column, which omits an XPath expression. This makes the column available as a passthrough target.

Generating output

The section demonstrates the use of various output modes. For more information about setting the output mode, see Transformation settings page.

Mode: Aggregate all rows

If you aggregate all input rows, XML Output creates one output row, which contains a single XML document.


<directory>
   <customer name="Acme" zone="East">
      <division>Toys</division>
      <city>Boston</city>
      <city>New York</city>
   </customer>
   <customer name="Acme" zone="East">
      <division>Chemical</division>
      <city>Raleigh</city>
   </customer>
   <customer name="Acme" zone="Midwest">
      <division>Chemical</division>
      <city>St. Louis</city>
   </customer>
</directory>
Mode: Single row

If you request one output row per input row, XML Output generates four output rows. Each output row contains the XML chunk generated from a different input row.

Row One


<directory>
   <customer name="Acme" zone="East">
      <division>Toys</division>
      <city>Boston</city>
   </customer>
</directory>

Row Two


<directory>
   <customer name="Acme" zone="East">
      <division>Toys</division>
      <city>New York</city>
   </customer>
</directory>

Row Three


<directory>
   <customer name="Acme" zone="East">
      <division>Chemical</division>
      <city>Raleigh</city>
   </customer>
</directory>

Row Four


<directory>
   <customer name="Acme" zone="East">
      <division>Chemical</division>
      <city>St. Louis</city>
   </customer>
</directory>
Mode: Use trigger column

If you trigger a new output row based on a change in values in the DIVISION column, there will be two rows with different XML chunks: one for the Toys division and one for the Chemical division.

Row One


<directory>
   <customer name="Acme" zone="East">
      <division>Toys</division>
      <city>Boston</city>
      <city>New York </city>
   </customer>
</directory>

Row Two


<directory>
   <customer name="Acme" zone="East">
      <division>Chemical</division>
      <city>Raleigh</city>
   </customer>
   <customer name="Acme" zone="Midwest">
      <division>Chemical</division>
      <city>St. Louis</city>
   </customer>
</directory>
Modes: Passthrough and Aggregate all rows

The passthrough mechanism works in conjunction with other options. For example, if you use the Aggregate All Rows option, the column ZONE forces two output rows: one for the XML chunks generated from the first three input rows and one for the last input row.

The XML output is written to the CUSTOMERS column. The passthrough data is written to the ZONE column.

  CUSTOMERS ZONE
row 1

<directory>
   <customer name="Acme" zone="East">
      <division>Toys</division>
      <city>Boston </city>
      <city>New York</city>
   </customer>
   <customer name="Acme" zone="East">
      <division>Chemical</division>
      <city>Raleigh</city>
   </customer>
</directory>

East
row 2

<directory>
   <customer name="Acme" 
    zone="Midwest">
      <division>Chemical</division>
      <city>St. Louis</city>
   </customer>
</directory>

Midwest
Note: When using passthrough in a parallel job, the output link column must be a string if the corresponding input link column is a string. Passthrough from a string type to a Unicode string type is not supported.