Configuring sparse lookup operations
You can configure a Hive connector stage to perform sparse (direct) lookup operation on a Hive data source.
- To specify the format of the data records that the Hive connector reads from an Hive database, set up column definitions on a link.
- Configure the Hive connector as a source for the reference data.
In a sparse lookup, the connector runs the specified SELECT statement once for each parameter set that arrives in the form of a record on the input link to the Lookup stage. The specified input parameters in the statement must have corresponding columns defined on the reference link. Each input record includes a set of parameter values that are represented by key columns. The Hive connector stage sets the parameter values on the bind variables in the SELECT statement, and then the Hive connector stage runs the statement. The result of the lookup is routed as one or more records through the reference link from the Hive connector stage back to the Lookup stage and from the Lookup stage to the output link of the Lookup stage. A sparse lookup is also known as a direct lookup because the lookup is performed directly on the data source.
Typically, you use a sparse lookup when the target table is too large to fit in memory.