Runtime column propagation
You can configure a Hive Connector stage to automatically add columns to its output link at run time.
Usage
Before you can enable runtime column propagation in a stage, runtime column propagation must be enabled for parallel jobs at the project level from the IBM InfoSphere DataStage and QualityStage Administrator.
When the runtime column propagation property is enabled for the Hive Connector stage, the connector inspects the columns in the result set of the statement that it is configured to run. The connector then compares this set of columns to the set of columns that are defined on the output link. If any columns in the result set are not defined on the output link, the connector adds them to the output link.
To display and enable the Runtime column propagation property, first enable the Enable Runtime Column Propagation for Parallel Jobs option for the current DataStage project in the DataStage and QualityStage Administrator. Then select the Runtime column propagation check box on the Columns page for the output link in the stage editor.
For each column in the result set, the connector performs the following checks:
- The connector checks whether the Derivation property of any link column matches the result set column name. This matching is case-sensitive. If the Derivation property value starts and ends with quotation marks, the quotation marks are removed before the property is checked. If a matching column is found, the column is assumed to represent the result set column and the connector does not add a column to the link.
- If a matching column is not found, the connector checks whether the Name property of any link column matches the result set column name. If a matching column is found, the column is assumed to represent the result set column and the connector does not add a column to the link. If a matching column is not found, the connector adds a column to the output link. The added column has the same name as the result set column except for the following modifications:
  - The dollar sign ($) is replaced with the __036__ value.
  - The pound sign (#) is replaced with the __035__ value.
  - Every non-alphanumeric character other than the underscore (_), dollar sign ($), and pound sign (#) is replaced with a pair of underscores (__).
  - If any replacement with a pair of underscores is performed, or if the first character in the column name is not a letter, underscore (_), dollar sign ($), or pound sign (#), the prefix CC_n_ is added to the column name, where n is the index of the SQL expression column in the SELECT statement list.
  - If the driver reported that the result set column name was an empty string or null, the column index (1, 2, and so on) is used as the column name.
  - White space characters are ignored.
  - If the resulting column name is the same as the name of a column that was already added to the link, the suffix m is added to it, where m is the smallest integer that is greater than or equal to 1 and that results in a unique column name.
In the columns that the connector adds, the Derivation property is set to a value that matches the result set column name.
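The name modifications described above can be sketched in Python as follows. This is an illustration only, not the connector's implementation. The function name propagated_name is hypothetical, and the sketch makes several assumptions that the text does not state: the rules are applied in the order shown, "alphanumeric" means ASCII letters and digits, an empty or null name that is replaced by the column index skips the other rules, and the prefix takes the form CC_n_ with a trailing underscore.

```python
def propagated_name(result_name, index, existing_names):
    """Return the link column name for one result set column (illustrative sketch).

    result_name    -- name reported by the driver (may be None or empty)
    index          -- 1-based position of the column in the SELECT list
    existing_names -- names of the columns that are already on the link
    """
    # White space characters are ignored.
    stripped = "".join((result_name or "").split())

    if not stripped:
        # Empty or null name: the column index is used as the name.
        # Assumption: the remaining character rules are not applied here.
        name = str(index)
    else:
        first = stripped[0]
        # The prefix is needed if the first character is not a letter, _, $ or #.
        needs_prefix = not ((first.isascii() and first.isalpha()) or first in "_$#")
        parts = []
        for ch in stripped:
            if ch == "$":
                parts.append("__036__")
            elif ch == "#":
                parts.append("__035__")
            elif ch == "_" or (ch.isascii() and ch.isalnum()):
                parts.append(ch)
            else:
                # Any other character becomes a pair of underscores,
                # which also triggers the prefix.
                parts.append("__")
                needs_prefix = True
        name = "".join(parts)
        if needs_prefix:
            name = f"CC_{index}_{name}"

    # Add the smallest integer suffix m >= 1 that makes the name unique.
    if name in existing_names:
        m = 1
        while f"{name}{m}" in existing_names:
            m += 1
        name = f"{name}{m}"
    return name


# Usage: build link column names for a hypothetical result set.
link_names = set()
for position, column in enumerate(["C$1", "C 2", "C!3#"], start=1):
    link_names.add(propagated_name(column, position, link_names))
```

Under these assumptions, the three names in the usage loop become C__036__1, C2, and CC_3_C__3__035__.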
Example
The following examples illustrate how runtime column propagation works.
Assume that the Hive Connector stage is configured to fetch data from the data source and provide records on the output link. The examples further assume that the Runtime column propagation check box is selected for the output link.
Suppose that no columns are defined for the output link, and the connector is configured to automatically generate a SELECT statement to read from table TABLE1. TABLE1 contains the C1, C2, and C3 columns. The connector generates and runs the statement SELECT * FROM TABLE1 and adds the C1, C2, and C3 columns to the output link.
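The comparison between the result set and the output link can be sketched in Python as follows. This is a minimal illustration, not the connector's code. The names columns_to_add and unquote are hypothetical, and link columns are modeled as dictionaries with "name" and "derivation" keys. The sketch assumes that the quotation marks in question are double quotation marks and that the Name comparison is an exact string match.

```python
def unquote(derivation):
    """Remove surrounding double quotation marks from a Derivation value, if present."""
    if len(derivation) >= 2 and derivation.startswith('"') and derivation.endswith('"'):
        return derivation[1:-1]
    return derivation


def columns_to_add(result_columns, link_columns):
    """Return the result set columns that have no matching link column.

    A result set column is considered matched if it equals the (unquoted)
    Derivation of any link column, or otherwise the Name of any link column.
    """
    derivations = {
        unquote(col["derivation"]) for col in link_columns if col.get("derivation")
    }
    names = {col["name"] for col in link_columns}

    missing = []
    for column in result_columns:
        if column in derivations:   # case-sensitive Derivation match
            continue
        if column in names:         # Name match (assumed exact)
            continue
        missing.append(column)
    return missing


# TABLE1 example: no columns are defined on the output link.
added = columns_to_add(["C1", "C2", "C3"], [])
```

With an empty output link, all three columns of TABLE1 are returned as columns to add, which corresponds to the example above.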
The following result set column names show how the connector modifies names when it adds columns to the output link:
- C$1
  - The connector replaces $ with __036__.
- C 2
  - The connector removes the space character between C and 2.
- C!3#
  - The connector replaces ! with __, replaces # with __035__, and adds the prefix CC_3_ because this column is the third column in the result set.