Transformer stage basic concepts (DataStage®)

You use the Transformer stage to define the data that will be output by the stage and how it will be transformed.

When you first edit a Transformer stage, it is likely that you will have already defined what data is input to the stage on the input link.

Input link

The input data source is joined to the Transformer stage via the input link.

Output link

You can specify output links from your Transformer stage. You can specify how you want the output links transformed, derivation details, and constraints on the output links.

You can have any number of output links from your Transformer stage.

You might want to pass some data straight through the Transformer stage unaltered, but it's likely that you'll want to transform data from some input columns before outputting it from the Transformer stage.

You can specify such an operation by entering a transform expression. The source of an output link column is defined in that column's Derivation cell in the Output section on the Output tab. You can also enter expressions in this cell.

Looping

You can use the looping mechanism in the Transformer stage to create multiple output rows from a single input row.

The following scenarios give examples of how you can use the looping facility.

Example 1: Input data with multiple repeating columns

When the input data contains rows with multiple columns containing repeating data, you can use the Transformer stage to produce multiple output rows: one for each of the repeating columns.

For example, if the input row contained the following data.
Col1 Col2 Name1 Name2 Name3
abc def Jim Bob Tom
You can use the Transformer stage to flatten the input data and create multiple output rows for each input row. The data now comprises the following columns.
Col1 Col2 Name
abc def Jim
abc def Bob
abc def Tom

Example 2: Input data with multiple repeating values in a single field

When you have data where a single column contains multiple repeating values that are separated by a delimiter, you can flatten the data to produce multiple output columns: one for each of the delimited values. You can also specify that certain values are filtered out, and not have a new row created.
For example, the input row contains the following data.
Col1 Col2 Names
abc def Jim/Bob/Tom
You want to flatten the name field so a new row is created for every new name indicated by the backslash (/) character. You also want to filter out the name Jim and drop the column named Col2, so that the resulting output data for the example column produces two rows with two columns.
Col1 Name
abc Bob
abc Tom

Example 3: Value in an input row column used to generate new output rows

You can use the Transformer stage to generate new rows, based on values held in an input column.
For example, you have an input column that contains a count, and want to generate output rows based on the value of the count. The following example column has a count value of 5.
Col1 Col2 MaxCount
abc def 5
You can generate five output rows for this one input row based on the value in the Count column.
Col1 Col2 EntryNumber
abc def 1
abc def 2
abc def 3
abc def 4
abc def 5

Example 4: Input row group aggregation included with input row data

You can save input rows to a cache area, so that you can process this data in a loop.

For example, you have input data that has a column holding a price value. You want to add a column to the output rows. The new column indicates what percentage the price value is of the total value for prices in all rows in that group. The value for the new Percentage column is calculated by the following expression.

(price * 100)/sum of all prices in group
In the example, the data is sorted and is grouped on the value in Col1.
Col1 Col2 Price
1000 abc 100.00
1000 def 20.00
1000 ghi 60.00
1000 jkl 20.00
2000 zyx 120.00
2000 wvu 110.00
2000 tsr 170.00
The percentage for each row in the group where Col1 = 1000 is calculated by the following expression.
(price * 100)/200
The percentage for each row in the group where Col1 = 2000 is calculated by the following expression.
(price * 100)/400
The output is shown in the following table.
Col1 Col2 Price Percentage
1000 abc 100.00 50.00
1000 def 20.00 10.00
1000 ghi 60.00 30.00
1000 jkl 20.00 10.00
2000 zyx 120.00 30.00
2000 wvu 110.00 27.50
2000 tsr 170.00 42.50
This scenario uses key break facilities that are available on the Transformer stage. You can use these facilities to detect when the value of an input column changes, and so group rows as you process them.