Compare stage in DataStage
The Compare stage performs a column-by-column comparison of records in two presorted input data sets. You can restrict the comparison to specified key columns.
The Compare stage is a processing stage. It can have two input links and a single output link.
The Compare stage does not change the table definition, partitioning, or content of the records in either input data set. It transfers both data sets intact to a single output data set generated by the stage. The comparison results are also recorded in the output data set.
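As a rough illustration of the behavior described above, the following Python sketch compares two records column by column, optionally restricted to key columns, and leaves both records unchanged. It is not DataStage code. The function name `compare_records` and the numeric result codes are assumptions made for this example, so check the stage properties for the codes your job actually emits.

```python
# Illustrative result codes: assumptions for this sketch,
# not values read from the Compare stage.
EQUAL, GREATER, LESS = 0, 2, -1


def compare_records(first, second, keys=None):
    """Compare two records (dicts) column by column.

    If keys is given, only those columns are compared;
    otherwise every column of the first record is compared.
    Neither record is modified.
    """
    columns = keys if keys is not None else list(first)
    for col in columns:
        if first[col] > second[col]:
            return GREATER
        if first[col] < second[col]:
            return LESS
    return EQUAL


first = {"id": 7, "name": "Ann"}
second = {"id": 7, "name": "Bob"}
print(compare_records(first, second))               # all columns: prints -1 (LESS)
print(compare_records(first, second, keys=["id"]))  # key column only: prints 0 (EQUAL)
```

Restricting the comparison to `id` makes the two records compare as equal even though `name` differs, which is the effect of limiting the comparison to key columns.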
You can use runtime column propagation in this stage and allow IBM DataStage to define the output column schema for you at runtime. The stage outputs a data set with three columns:
- result. Carries the code giving the result of the comparison.
- first. A subrecord containing the columns of the first input link.
- second. A subrecord containing the columns of the second input link.
If you define the output columns yourself instead of using runtime column propagation, set them up as follows:
- Specify the parent column for the output data corresponding to the first input link, and set its SQL type to Unknown.
- Specify the actual columns that carry your data and make these subrecords of the parent column. Name each column first.colname, for example first.col1, first.col2, and so on. To make a column a subrecord, select the column, select Edit Row from the shortcut menu, and specify a level number (for example, 03) for that column. You can speed up this process by making the first column a subrecord and then using the Propagate Values feature to make the remaining columns subrecords of the parent column.
- Specify the parent column for the output data corresponding to the second input link, and set its SQL type to Unknown.
- Specify the actual columns that carry the data from the second input link, name them second.colname (for example, second.col1, second.col2), and make these subrecords of the parent column.
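The output layout can be pictured with a short Python sketch: one row holds a result code plus two subrecords, and each subrecord column is named after its parent, as in first.col1. This only models the naming scheme; the helper names `output_row` and `column_names` and the result value 0 are hypothetical and are not part of DataStage.

```python
def output_row(result, first, second):
    """Build one output row: a result code plus two subrecords."""
    return {"result": result, "first": dict(first), "second": dict(second)}


def column_names(row):
    """List column names, qualifying subrecord columns with their parent."""
    names = []
    for name, value in row.items():
        names.append(name)
        if isinstance(value, dict):
            # Subrecord columns are named parent.colname.
            names.extend(f"{name}.{child}" for child in value)
    return names


# The result value 0 is a placeholder chosen for this example.
row = output_row(0, {"col1": 1, "col2": "a"}, {"col1": 1, "col2": "a"})
print(column_names(row))
# ['result', 'first', 'first.col1', 'first.col2', 'second', 'second.col1', 'second.col2']
```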
When you double-click the Compare stage, the properties panel opens. The properties panel has three tabs:
- Stage. This tab is always present and is used to specify general information about the stage.
- Input. This is where you specify details about the incoming data sets being compared.
- Output. This is where you specify details about the processed data being output from the stage.
Input tab
The Columns section specifies the column definitions of the incoming data. The Advanced section allows you to change the default buffering settings for the input link.
Output tab
The Columns section specifies the column definitions of the outgoing data. The Advanced section allows you to change the default buffering settings for the output link.