Aggregating input rows on output (DataStage®)
You have several options for aggregating input rows on output.
- Aggregate all rows in a single output row. This is the default option.
- Generate one output row per input row. This is the
Singlerowoption. - Trigger a new output row when the value of an input column changes.
- Trigger a new output row when the value of a passthrough column changes.
A passthrough column is an output column that has no XPath expression in the Description property and whose name exactly matches the name of an input column.
Examples
Four columns are involved in the transformation. ZONE values
also pass through to an output column called ZONE.
| CUSTOMER | DIVISION | CITY | ZONE |
|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
XPath expressions
On the input link, the following XPath expressions are used. The
CITY column has the repetition path.
| Input column | XPath expression |
|---|---|
CUSTOMER |
/directory/customer/@name |
DIVISION |
/directory/customer/division/text() |
CITY |
/directory/customer/city/text() |
ZONE |
/directory/customer/@zone |
Each example in Generating output uses the output column
CUSTOMERS. The CUSTOMERS column uses a forward slash (/) as the
XPath expression, which makes the entire XML available to it.
The passthrough example (see Modes: Passthrough and Aggregate all rows ) adds the ZONE column, which
omits an XPath expression. This makes the column available as a passthrough target.
Generating output
The section demonstrates the use of various output modes. For more information about setting the output mode, see Transformation settings page.
If you aggregate all input rows, XML Output creates one output row, which contains a single XML document.
<directory>
<customer name="Acme" zone="East">
<division>Toys</division>
<city>Boston</city>
<city>New York</city>
</customer>
<customer name="Acme" zone="East">
<division>Chemical</division>
<city>Raleigh</city>
</customer>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
</directory>
If you request one output row per input row, XML Output generates four output rows. Each output row contains the XML chunk generated from a different input row.
Row One
<directory>
<customer name="Acme" zone="East">
<division>Toys</division>
<city>Boston</city>
</customer>
</directory>
Row Two
<directory>
<customer name="Acme" zone="East">
<division>Toys</division>
<city>New York</city>
</customer>
</directory>
Row Three
<directory>
<customer name="Acme" zone="East">
<division>Chemical</division>
<city>Raleigh</city>
</customer>
</directory>
Row Four
<directory>
<customer name="Acme" zone="East">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
</directory>
If you trigger a new output row based on a change in
values in the DIVISION column, there will be two rows with different XML chunks:
one for the Toys division and one for the Chemical division.
Row One
<directory>
<customer name="Acme" zone="East">
<division>Toys</division>
<city>Boston</city>
<city>New York </city>
</customer>
</directory>
Row Two
<directory>
<customer name="Acme" zone="East">
<division>Chemical</division>
<city>Raleigh</city>
</customer>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
</directory>
The passthrough mechanism works in conjunction with other options. For example, if you
use the Aggregate All Rows option, the column ZONE forces two output rows: one for
the XML chunks generated from the first three input rows and one for the last input row.
The
XML output is written to the CUSTOMERS column. The passthrough data is written to
the ZONE column.
| CUSTOMERS | ZONE | |
|---|---|---|
| row 1 |
|
|
| row 2 |
|
|