Configuring sparse lookup operations

You can configure a Hive connector stage to perform sparse (direct) lookup operation on a Hive data source.

  • To specify the format of the data records that the Hive connector reads from an Hive database, set up column definitions on a link.
  • Configure the Hive connector as a source for the reference data.

In a sparse lookup, the connector runs the specified SELECT statement once for each parameter set that arrives in the form of a record on the input link to the Lookup stage. The specified input parameters in the statement must have corresponding columns defined on the reference link. Each input record includes a set of parameter values that are represented by key columns. The Hive connector stage sets the parameter values on the bind variables in the SELECT statement, and then the Hive connector stage runs the statement. The result of the lookup is routed as one or more records through the reference link from the Hive connector stage back to the Lookup stage and from the Lookup stage to the output link of the Lookup stage. A sparse lookup is also known as a direct lookup because the lookup is performed directly on the data source.

Typically, you use a sparse lookup when the target table is too large to fit in memory.

  1. Add a Lookup stage to the job design canvas, and then create a reference link from the Hive Connector stage to the Lookup stage.
  2. Double-click the Hive Connector stage.
  3. From the Lookup Type list, select Sparse.
  4. Specify the key columns:
    1. If you set Generate SQL to Yes when you configured the connector as a source, specify the table name, and then specify the key columns on the Columns page.
    2. If you set Generate SQL to No when you configured the connector as a source, specify a value for the Select statement property.
      In the select part of the SELECT statement, list the columns to return to the Lookup stage. Ensure that this list matches the columns on the Columns page.

      Ensure that the columns in the select list have the matching columns on the reference link. Each parameter in the WHERE clause of the statement must have the word ORCHESTRATE and a period added to the beginning of the column name. ORCHESTRATE can be all uppercase or all lowercase letters, such as ORCHESTRATE.Field001. The following SELECT statement is anexample of the correct syntax of the WHERE clause: select Field002,Field003 from MY_TABLE where Field001 = ORCHESTRATE.Field001. The column names that follow the word ORCHESTRATE must have the matching key columns on the reference link.

  5. To save the changes, click OK.
  6. Double-click the Lookup stage.
  7. To specify the key columns, drag the required columns from the input link to the reference link.
    The columns from the input link contain values that are used as input values for the lookup operation.
  8. Map the input link and reference link columns to the output link columns and specify conditions for a lookup failure:
    1. Drag or copy the columns from the input link and reference link to your output link.
    2. To define conditions for a lookup failure, click the Constraints icon in the menu.
    3. In the Lookup Failure column, select a value, and then click OK. If you select Reject, you must have a reject link from the Lookup stage and a target stage in your job configuration to capture the rejected records.
    4. Click OK.
  9. Save, compile, and run the job.