You can configure a Hive connector stage to perform sparse (direct) lookup operation on a
Hive data source.
Before you begin
- To specify the format of the data records that the Hive connector reads from an Hive database,
set up column definitions on a link.
- Configure the Hive connector as a source for the reference data.
About this task
In a sparse lookup, the connector runs the specified SELECT statement once for each parameter set
that arrives in the form of a record on the input link to the Lookup stage. The specified input
parameters in the statement must have corresponding columns defined on the reference link. Each
input record includes a set of parameter values that are represented by key columns. The Hive
connector stage sets the parameter values on the bind variables in the SELECT statement, and then
the Hive connector stage runs the statement. The result of the lookup is routed as one or more
records through the reference link from the Hive connector stage back to the Lookup stage and from
the Lookup stage to the output link of the Lookup stage. A sparse lookup is also known as a direct
lookup because the lookup is performed directly on the data source.
Typically, you use a sparse lookup when the target table is too large to fit in memory.
Procedure
-
Add a Lookup stage to the job design canvas, and then create a reference link from the Hive
Connector stage to the Lookup stage.
-
Double-click the Hive Connector stage.
- From the Lookup Type list, select Sparse.
- Specify the key columns:
- If you set Generate SQL to Yes when
you configured the connector as a source, specify the table name,
and then specify the key columns on the Columns page.
-
If you set Generate SQL to No when you configured
the connector as a source, specify a value for the Select statement
property.
In the select part of the SELECT statement, list the columns to return to the Lookup stage.
Ensure that this list matches the columns on the Columns page.
Ensure that the columns in the select list have the matching columns on the reference link. Each
parameter in the WHERE clause of the statement must have the word ORCHESTRATE and a
period added to the beginning of the column name. ORCHESTRATE can be all uppercase or
all lowercase letters, such as ORCHESTRATE.Field001. The following SELECT
statement is anexample of the correct syntax of the WHERE clause: select Field002,Field003
from MY_TABLE where Field001 = ORCHESTRATE.Field001
. The column names that follow the word
ORCHESTRATE must have the matching key columns on the reference link.
-
To save the changes, click OK.
-
Double-click the Lookup stage.
-
To specify the key columns, drag the required columns from the input link to the reference
link.
The columns from the input link contain values that are used as input values for the lookup
operation.
-
Map the input link and reference link columns to the output link columns
and specify conditions for a lookup failure:
-
Drag or copy the columns from the input link and reference link to your output
link.
-
To define conditions for a lookup failure, click the Constraints
icon in the menu.
-
In the Lookup Failure column, select a value, and then click OK.
If you select Reject, you must have a reject link from the Lookup stage and a
target stage in your job configuration to capture the rejected records.
-
Click OK.
-
Save, compile, and run the job.