If the HBase Connector stage is configured to run on multiple processing nodes, each processing node reads data from the data source concurrently with the others. The partitions of records from all the processing nodes are combined to produce the complete record data set for the output link of the stage.
Before you begin
Add the HBase Connector stage to a parallel job.
About this task
When an HBase Connector stage is configured to perform partitioned reads, each processing node of the stage reads a portion of the data from the data source, and the records that are retrieved by all the processing nodes are combined to produce the result set for the output link.
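The idea behind a partitioned read can be sketched as follows. This is an illustrative Python sketch only, not the connector's implementation: a stand-in list plays the role of the HBase table, each "player node" reads its own partition concurrently, and the per-node partitions are combined into one result set.

```python
# Illustrative sketch of a partitioned read (hypothetical data source).
# Each "player node" reads a disjoint partition of the table; the
# partitions are then combined into one result set for the output link.

from concurrent.futures import ThreadPoolExecutor

TABLE = list(range(100))  # stand-in for the rows of an HBase table

def read_partition(node: int, nodes: int):
    # Each node reads every nodes-th row (round-robin partitioning).
    return TABLE[node::nodes]

nodes = 4
with ThreadPoolExecutor(max_workers=nodes) as pool:
    partitions = list(pool.map(lambda n: read_partition(n, nodes), range(nodes)))

# Combine the per-node partitions into the complete result set.
result_set = [row for part in partitions for row in part]
assert sorted(result_set) == TABLE  # no rows lost, no rows duplicated
```

The partitioning scheme here (round-robin) is arbitrary; the point is only that disjoint partitions, read concurrently, reassemble into the full data set.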
Procedure
-
On the job design canvas, double-click the HBase Connector stage, and then click the
Stage tab.
-
Set the Use parallel read property to Yes. This property determines whether the connector splits the read operation across all available player nodes to speed up the process.
-
Choose the most appropriate row key characteristics by using the Type of row keys in the target table property. The connector uses this setting to calculate the row key ranges of the regions that each player node processes, so that the load is distributed efficiently across all player nodes. The following options are available:
-
Uniform byte arrays - the row keys are random byte arrays.
-
Numeric strings with values greater than zero - the row keys contain only digits, and the key values must be greater than zero.
-
Hexadecimal strings with values greater than zero - the row keys contain hexadecimal numbers, and the key values must be greater than zero.
-
On the Advanced page, set Execution mode to Parallel or Default(Parallel), and then click the Output tab.
-
Click OK, and then save the job.
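The key-range calculation that the Type of row keys in the target table property drives can be sketched in Python. This is an illustrative sketch only, not the connector's actual algorithm: for numeric row keys with values greater than zero, a hypothetical `split_numeric_key_range` helper divides the key space into near-equal, disjoint ranges, one per player node.

```python
# Illustrative sketch (not the connector's actual algorithm):
# divide a numeric row key space into near-equal, contiguous ranges,
# one per player node, so that each node scans a disjoint slice.

def split_numeric_key_range(min_key: int, max_key: int, nodes: int):
    """Divide the inclusive range [min_key, max_key] into `nodes` ranges."""
    total = max_key - min_key + 1
    base, extra = divmod(total, nodes)
    ranges = []
    start = min_key
    for i in range(nodes):
        # The first `extra` nodes get one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Example: 4 player nodes over numeric row keys 1..1000
print(split_numeric_key_range(1, 1000, 4))
# → [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

A uniform split like this balances the load only when the keys are evenly distributed over the range, which is why the property asks you to characterize the row keys (uniform byte arrays, numeric strings, or hexadecimal strings) rather than assuming one distribution.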