You can configure an HBase Connector stage to connect to an HBase data source and write
data to a partitioned table.
About this task
When an HBase Connector stage is configured to perform a partitioned write, each
processing node of the stage reads a portion of the data from the data source, and the records are
inserted into the partitioned table based on the partition key values.
Procedure
-
On the job design canvas, double-click the HBase Connector stage, and then click the
Stage tab.
-
On the Advanced page, set Execution mode to
Parallel or Default (Parallel), and then click the
Input tab.
-
In the Write mode property, specify the type of operation to
perform on the input data:
-
Put - creates a new row if the row does not already exist, and creates qualifiers with
values if they do not exist. If the row and qualifier already exist, the qualifier is updated with
the new value; other qualifiers are not modified.
-
Delete row - deletes all qualifiers with the specified row key, effectively removing
the row
-
Delete qualifiers - deletes the specified qualifiers from the target table
-
Delete row then put - deletes all qualifiers of the row, and then creates new
qualifiers with the provided values
-
Append qualifiers values - appends new values to the existing values in the specified
qualifiers
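The write modes above correspond to standard HBase row mutations. As an illustration only, the following hypothetical Python sketch models each mode against an in-memory table (a dict that maps a row key to a dict of qualifier-to-value pairs); it is not DataStage or HBase client code, and all names in it are invented for this example.

```python
# Hypothetical in-memory model of the Write mode semantics.
# table: dict mapping row key -> {qualifier: value}

def put(table, row, quals):
    # Create the row if it does not exist; create or update the given
    # qualifiers, leaving other qualifiers untouched.
    table.setdefault(row, {}).update(quals)

def delete_row(table, row):
    # Delete all qualifiers for the row key, effectively removing the row.
    table.pop(row, None)

def delete_qualifiers(table, row, qual_names):
    # Delete only the specified qualifiers from the row.
    for q in qual_names:
        table.get(row, {}).pop(q, None)

def delete_row_then_put(table, row, quals):
    # Remove all existing qualifiers of the row, then create the new ones.
    delete_row(table, row)
    put(table, row, quals)

def append_qualifier_values(table, row, quals):
    # Append new values to the existing values of the specified qualifiers.
    r = table.setdefault(row, {})
    for q, v in quals.items():
        r[q] = r.get(q, "") + v
```

For example, calling `put` twice on the same row key updates only the qualifiers named in the second call, while `delete_row_then_put` first discards every existing qualifier of that row.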
-
Determine the writing behavior by setting the Autoflush enabled flag. If you set
it to No, modifications are cached in a player node until the write
buffer size that is defined in the HBase configuration file is reached; all cached data is then sent
to the HBase database for processing. Note that in this case you might lose some data updates
if a player crashes for any reason, because the cached data is not yet stored in the database. To
ensure data consistency, set the Autoflush enabled flag to
Yes; each processed row is then immediately sent to the database for
storage.
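The trade-off described above can be sketched as follows. This is a hypothetical Python model of the Autoflush enabled flag (illustration only, not DataStage or HBase client code); the class and attribute names are invented, and the buffer size stands in for the write buffer size from the HBase configuration file.

```python
# Hypothetical model of buffered vs. immediate (autoflush) writes.

class PlayerWriter:
    def __init__(self, autoflush, buffer_size=3):
        self.autoflush = autoflush      # Yes -> send each row immediately
        self.buffer_size = buffer_size  # stands in for the HBase write buffer size
        self.buffer = []                # rows cached on the player node
        self.stored = []                # rows that reached the database

    def write(self, row):
        if self.autoflush:
            self.stored.append(row)     # Autoflush Yes: sent immediately
        else:
            self.buffer.append(row)     # Autoflush No: cached locally
            if len(self.buffer) >= self.buffer_size:
                self.flush()            # buffer full: send all cached rows

    def flush(self):
        self.stored.extend(self.buffer)
        self.buffer = []

    def crash(self):
        # A player crash discards cached rows that were never flushed,
        # which is the data-loss risk of Autoflush No.
        lost, self.buffer = self.buffer, []
        return lost
```

With `autoflush=False`, rows accumulate in the buffer and are lost if the player crashes before a flush; with `autoflush=True`, every row reaches storage as soon as it is written.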
-
Click OK, and then save the job.