Transformer stage basic concepts (DataStage®)
You use the Transformer stage to define the data that will be output by the stage and how it will be transformed.
When you first edit a Transformer stage, it is likely that you will have already defined what data is input to the stage on the input link.
Input link
The input data source is joined to the Transformer stage via the input link.
Output link
You can specify output links from your Transformer stage. You can specify how you want the output links transformed, derivation details, and constraints on the output links.
You can have any number of output links from your Transformer stage.
You might want to pass some data straight through the Transformer stage unaltered, but it's likely that you'll want to transform data from some input columns before outputting it from the Transformer stage.
You can specify such an operation by entering a transform expression. The source of an output link column is defined in that column's Derivation cell in the Output section on the Output tab. You can also enter expressions in this cell.
Looping
You can use the looping mechanism in the Transformer stage to create multiple output rows from a single input row.
The following scenarios give examples of how you can use the looping facility.
Example 1: Input data with multiple repeating columns
When the input data contains rows with multiple columns containing repeating data, you can use the Transformer stage to produce multiple output rows: one for each of the repeating columns.
Col1 | Col2 | Name1 | Name2 | Name3 |
---|---|---|---|---|
abc | def | Jim | Bob | Tom |
Col1 | Col2 | Name |
---|---|---|
abc | def | Jim |
abc | def | Bob |
abc | def | Tom |
Example 2: Input data with multiple repeating values in a single field
When you have data where a single column contains multiple repeating values that are separated by a delimiter, you can flatten the data to produce multiple output columns: one for each of the delimited values. You can also specify that certain values are filtered out, and not have a new row created.Col1 | Col2 | Names |
---|---|---|
abc | def | Jim/Bob/Tom |
Col1 | Name |
---|---|
abc | Bob |
abc | Tom |
Example 3: Value in an input row column used to generate new output rows
You can use the Transformer stage to generate new rows, based on values held in an input column.Col1 | Col2 | MaxCount |
---|---|---|
abc | def | 5 |
Col1 | Col2 | EntryNumber |
---|---|---|
abc | def | 1 |
abc | def | 2 |
abc | def | 3 |
abc | def | 4 |
abc | def | 5 |
Example 4: Input row group aggregation included with input row data
You can save input rows to a cache area, so that you can process this data in a loop.For example, you have input data that has a column holding a price value. You want to add a column to the output rows. The new column indicates what percentage the price value is of the total value for prices in all rows in that group. The value for the new Percentage column is calculated by the following expression.
(price * 100)/sum of all prices in group
Col1 | Col2 | Price |
---|---|---|
1000 | abc | 100.00 |
1000 | def | 20.00 |
1000 | ghi | 60.00 |
1000 | jkl | 20.00 |
2000 | zyx | 120.00 |
2000 | wvu | 110.00 |
2000 | tsr | 170.00 |
(price * 100)/200
The percentage
for each row in the group where Col1 = 2000 is calculated by the following
expression.(price * 100)/400
The output is shown in the
following table.Col1 | Col2 | Price | Percentage |
---|---|---|---|
1000 | abc | 100.00 | 50.00 |
1000 | def | 20.00 | 10.00 |
1000 | ghi | 60.00 | 30.00 |
1000 | jkl | 20.00 | 10.00 |
2000 | zyx | 120.00 | 30.00 |
2000 | wvu | 110.00 | 27.50 |
2000 | tsr | 170.00 | 42.50 |