You can use the Transformer stage to convert a single row for data with repeating values
in a single column to multiple output rows.
Input data with multiple repeating values in a single field
When you have
data where a single column contains multiple repeating values that are separated by a delimiter, you
can flatten the data to produce multiple output columns: one for each of the delimited values. You
can also specify that certain values are filtered out, and not have a new row created.For
example, the input row contains the following data.
Col1 |
Col2 |
Names |
abc |
def |
Jim/Bob/Tom |
You want to flatten the name field so a new row is created for every new name
indicated by the backslash (/) character. You also want to filter out the name Jim and drop the
column named Col2, so that the resulting output data for the example column produces two rows with
two columns.
Col1 |
Name |
abc |
Bob |
abc |
Tom |
To implement this scenario in the Transformer stage, make the following settings:
- Stage variable
- Define a stage variable to hold a count of the fields separated by the delimiter character. The
value of StageVar1 is set by the following
expression:
DCOUNT(inlink.Names, "/")
- Loop condition
- Enter the following expression as the loop
condition:
@ITERATION <= StageVar1
The loop continues to iterate for the
count in the Names column.
- Loop variable
- Define a loop variable to supply the value for the new column Name in your output rows. The
value of LoopVar1 is set by the following
expression:
FIELD(inlink.Names, "/", @ITERATION, 1)
This expression extracts
the substrings delimited by the slash character (/) from the input column.
- Output link constraint
- Define an output link constraint to filter out the name Jim. Use the following expression to
define the constraint:
LoopVar1 <> "Jim"
- Output link metadata and derivations
- Define the output link columns and their derivations. Drop the Col2 column by not including it
in the metadata.
- Col1 - inlink.col1
- Name - LoopVar1