Configuring sparse lookup operations
You can configure an HBase connector stage to perform a sparse (direct) lookup operation on an HBase table.
Before you begin
- To specify the format of the data records that the HBase connector reads from an HBase table, set up column definitions on a link.
- Configure the HBase connector as a source for the reference data.
About this task
In a sparse lookup, the connector fetches one record from the HBase table for each record that arrives on the input link to the Lookup stage. The column definitions on the input link must include exactly one column whose name and data type match those of the primary key column defined on the HBase reference link. Because you can choose any name for the primary key column, it is straightforward to match the HBase table row key with the corresponding column on the input link. The result of the lookup is routed as one record through the reference link from the HBase connector stage back to the Lookup stage, and from the Lookup stage to its output link. A sparse lookup is also known as a direct lookup because the lookup is performed directly on the data source.
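The per-record behavior described above can be illustrated with a minimal Python sketch. It is not the connector's implementation: `ReferenceSource`, `sparse_lookup`, and the sample column names are hypothetical, and an in-memory dictionary stands in for the HBase table. Skipping unmatched records is also an assumption made only to keep the sketch short.

```python
from typing import Dict, Iterable, Iterator, Optional

Record = Dict[str, object]


class ReferenceSource:
    """Hypothetical stand-in for an HBase table, keyed by row key."""

    def __init__(self, rows: Dict[str, Record]) -> None:
        self._rows = rows
        self.fetches = 0  # number of direct reads issued so far

    def fetch(self, row_key: str) -> Optional[Record]:
        """Simulate one direct read for a single row key."""
        self.fetches += 1
        return self._rows.get(row_key)


def sparse_lookup(
    input_records: Iterable[Record], source: ReferenceSource, key_column: str
) -> Iterator[Record]:
    """Issue one fetch per input record and merge the result into it."""
    for record in input_records:
        row = source.fetch(str(record[key_column]))
        if row is not None:
            # Reference columns are appended to the input columns.
            yield {**record, **row}
        # Assumption: unmatched records are skipped in this sketch.


if __name__ == "__main__":
    source = ReferenceSource(
        {"cust-001": {"name": "Ada"}, "cust-002": {"name": "Grace"}}
    )
    orders = [
        {"customer_id": "cust-002", "amount": 10},
        {"customer_id": "cust-404", "amount": 5},
    ]
    for merged in sparse_lookup(orders, source, "customer_id"):
        print(merged)
    print("fetches:", source.fetches)
```

The fetch counter makes the cost model visible: the number of reads against the data source grows with the number of input records, not with the size of the reference table.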
Typically, you use a sparse lookup when the target table is too large to fit in memory. If you use a parallel read option and processing is performed on many player nodes, you must ensure that the input data set is also adequately partitioned in relation to the values in the lookup key column.
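One common way to partition input data in relation to a key column is to hash the key value, so that all records with the same key are processed on the same node. The following Python sketch shows that idea only. The function names are hypothetical, hash partitioning is an assumption rather than the method the connector requires, and `zlib.crc32` is used simply because it gives the same result on every run.

```python
import zlib
from typing import Dict, Iterable, List

Record = Dict[str, object]


def partition_for(key: str, num_partitions: int) -> int:
    """Map a lookup key value to a partition index in a repeatable way."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(
    records: Iterable[Record], key_column: str, num_partitions: int
) -> List[List[Record]]:
    """Group input records so that equal key values share one partition."""
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for record in records:
        index = partition_for(str(record[key_column]), num_partitions)
        partitions[index].append(record)
    return partitions


if __name__ == "__main__":
    rows = [{"customer_id": k} for k in ("cust-001", "cust-002", "cust-001")]
    for i, part in enumerate(partition_records(rows, "customer_id", 2)):
        print(i, part)
```

With this scheme, each node receives a disjoint set of key values, so no two nodes look up the same row key.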