Configuring the HBase connector for partitioned reads

If the HBase connector stage is configured to run on multiple processing nodes, each of the processing nodes reads data from the data source concurrently with other processing nodes. The partitions of records from all the processing nodes are combined to produce the complete record data set for the output link of the stage.

Before you begin

Add the HBase Connector stage to a parallel job.

About this task

When an HBase connector stage is configured to perform partitioned reads, each processing node of the stage reads a portion of the data from the data source, and the records that are retrieved by all the processing nodes are combined to produce the result set for the output link.
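
The connector performs this partitioning internally, but the underlying idea can be illustrated with the standard HBase client API: the region boundaries of a table define natural row key ranges, and each range can be scanned independently. The following Java sketch only illustrates that idea; it is not the connector's implementation, and the table name my_table and the default configuration are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionRangeReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            TableName name = TableName.valueOf("my_table");  // placeholder table name
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 RegionLocator locator = conn.getRegionLocator(name);
                 Table table = conn.getTable(name)) {

                // Each region of the table is bounded by a start and an end row key.
                Pair<byte[][], byte[][]> keys = locator.getStartEndKeys();
                byte[][] startKeys = keys.getFirst();
                byte[][] endKeys = keys.getSecond();

                // In a parallel read, each player node would be assigned a subset of
                // these ranges and would scan only the rows that fall inside them.
                for (int i = 0; i < startKeys.length; i++) {
                    Scan scan = new Scan()
                            .withStartRow(startKeys[i])   // inclusive start of the range
                            .withStopRow(endKeys[i]);     // exclusive end of the range
                    try (ResultScanner scanner = table.getScanner(scan)) {
                        for (Result row : scanner) {
                            // process the row (the connector maps rows to output link records)
                        }
                    }
                }
            }
        }
    }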

Procedure

  1. On the job design canvas, double-click the HBase Connector stage, and then click the Stage tab.
  2. Set the Use parallel read property to Yes. This property determines whether the connector splits the read operation across all available player nodes to speed up processing.
  3. Choose the most appropriate row key characteristics by using the Type of row keys in the target table property. The connector uses this setting to calculate the row key ranges of the regions that each player node processes, so that the load is distributed efficiently across all player nodes. (A sketch of such a range calculation for numeric keys follows this procedure.) The available options are:
    1. Uniform byte arrays - the row keys are random byte arrays.
    2. Numeric strings with values greater than zero - the row keys contain only digits, and the key values must be greater than zero.
    3. Hexadecimal strings with values greater than zero - the row keys contain hexadecimal numbers, and the key values must be greater than zero.
  4. On the Advanced page, set Execution mode to Parallel or Default (Parallel), and then click the Output tab.
  5. Click OK, and then save the job.
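
The exact range calculation that the connector performs for the Type of row keys in the target table setting is internal. The following hypothetical Java sketch only shows what splitting positive numeric-string row keys into contiguous ranges, one per player node, might look like; the class name, method name, and example key bounds are placeholders and are not part of the connector.

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;

    public class NumericKeyRangeSketch {
        // Split the key space [minKey, maxKey] of positive numeric-string row keys
        // into contiguous ranges, one range per player node.
        static List<String[]> splitRanges(String minKey, String maxKey, int nodeCount) {
            BigInteger lo = new BigInteger(minKey);
            BigInteger hi = new BigInteger(maxKey);
            BigInteger span = hi.subtract(lo).add(BigInteger.ONE);
            BigInteger step = span.divide(BigInteger.valueOf(nodeCount));

            List<String[]> ranges = new ArrayList<>();
            BigInteger start = lo;
            for (int i = 0; i < nodeCount; i++) {
                // The last node takes the remainder so that the full key space is covered.
                BigInteger end = (i == nodeCount - 1)
                        ? hi
                        : start.add(step).subtract(BigInteger.ONE);
                ranges.add(new String[] { start.toString(), end.toString() });
                start = end.add(BigInteger.ONE);
            }
            return ranges;
        }

        public static void main(String[] args) {
            // Example: four player nodes dividing keys "1" through "1000000" into four ranges.
            for (String[] r : splitRanges("1", "1000000", 4)) {
                System.out.println(r[0] + " - " + r[1]);
            }
        }
    }

With four player nodes and keys 1 through 1000000, this sketch produces the ranges 1-250000, 250001-500000, 500001-750000, and 750001-1000000, so each node scans roughly the same number of keys.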