Column derivations

When the connector uses a SELECT statement to read rows from the data source, the connector must map the column names that are returned by the SELECT statement to the columns that are defined on the output link.

The connector uses the following process to map columns:
  1. The connector looks for the output link column that has the Derivation property set to a value that matches the column name that is returned by the SELECT statement.
  2. If the connector fails to locate such a column on the link, it looks for the output link column that has the Name property set to a value that matches the column name that is returned by the SELECT statement.
  3. If the connector again fails to locate such a column, it adds the column to the output link if runtime column propagation is enabled for the link.
  4. Otherwise, it ignores the values in the column that is returned by the SELECT statement when it reads rows from the data source.

When the connector maps columns, it compares the column name that is returned by the SELECT statement and the value of the Derivation property for the output link columns. If the Derivation value is surrounded by quotation marks, the quotation marks are removed before performing the comparison. Since the column names in Hive are not case-sensitive, the comparison is carried out in case insensitive mode.

When the connector is configured to automatically generate the SELECT statement at run time, it assembles the statement based on the specified Table name property and the column definitions on the output link. If runtime column propagation is enabled for the output link, the connector includes all the table columns in the statement by generating the following statement: SELECT * FROM table_name

If the runtime columns propagation is not enabled for the output link, the connector includes in the statement only the columns on the output link. For each column on the output link, it includes the value that is set for the Derivation property of the column. If no value is set for the Derivation property, the connector includes the value that is set for the Name property in the statement.

When the connector includes Derivation values in the SELECT statement that is generated, the connector always includes them in the exact same form in which they are specified for the link columns. This includes any quotation marks specified in the Derivation values. On the other hand, when it includes the column Name values in the generated SELECT statement text, if Enable quoted identifiers property is set to Yes, the connector encloses the values in quotation marks in the statement text. If Enable quoted identifiers property is set to No, the connector includes the values in the same form that they are specified for the link columns.

Note:

Hive supports back tick as the quotation mark for the column names. It needs to be enabled by setting the server side property.

Example

Assume that the table T1 has the following columns:

C1 A

C1 B

C2

To read rows from this table, you can define the columns on the output link with the Name property set to C1A, C1B, and C2 and Derivation property set to empty strings. You can then configure the connector to run the following SELECT statement (assume that the data source supports column aliases):

SELECT `C1 A` AS C1A, `C1 B` AS C1B, C2 FROM T1
In this case, the columns that are returned by the statement are C1 A, C1 B and C2. The connector does not find any columns on the output link that have the Derivation property set one of these values. However, because it finds columns that have the Name property set to each of these values, it maps the statement columns to those output link columns successfully.

Alternatively, you can utilize the Derivation property values of the output link columns. You can define the C1A, C1B, and C2 columns on the output link, set the Derivation property of the C1A columns to C1 A, set the Derivation property of the C1B columns to C1 B, and leave the Derivation property of the C2 column empty. You can then configure the stage to automatically generate the SELECT statement.

The connector uses the Derivation property of the C1A and C1B columns and generates the following SELECT statement:

SELECT "C1 A", "C1 B", C2 FROM T1
In this case, the columns that are returned by the statement are C1 A, C1 B and C2. The connector maps the statement column C1 A to the output link column C1A because the output link column has the Derivation property set to C1 A, which corresponds to the C1 A statement column name. Likewise, the connector maps the statement column C1 B to the output link column C1B because this output link column has the Derivation property set to C1 B which again corresponds to the C1 B statement column name. Finally, the connector maps the statement column C2 to the output link column C2.