Configuring the HBase connector for partitioned write

You can configure an HBase Connector stage to connect to an HBase data source and write data to a partitioned table.

Before you begin

About this task

When an HBase Connector stage is configured to perform a partitioned write, each processing node of the stage handles a portion of the data, and the records are inserted into the partitioned table based on their partition key values.
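The idea of routing records by partition key can be sketched in a few lines of Python. This is only an illustration of range partitioning on row keys, assuming that each partition holds a contiguous, sorted range of keys (as HBase regions do). It does not describe how the connector assigns records internally. The split points, the function names `partition_for` and `route`, and the sample records are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical split points (assumption, not taken from any real table):
# partition 0 holds keys below "g", partition 1 holds ["g", "n"),
# partition 2 holds ["n", "t"), and partition 3 holds keys from "t" upward.
SPLITS = ["g", "n", "t"]


def partition_for(row_key, splits=SPLITS):
    """Return the index of the key range that contains row_key."""
    # bisect_right makes each split point the inclusive start of its range.
    return bisect_right(splits, row_key)


def route(records, splits=SPLITS):
    """Group (row_key, values) records by the partition that owns the key."""
    buckets = {}
    for row_key, values in records:
        buckets.setdefault(partition_for(row_key, splits), []).append((row_key, values))
    return buckets


records = [
    ("apple", {"c": "1"}),
    ("grape", {"c": "2"}),
    ("zebra", {"c": "3"}),
    ("g", {"c": "4"}),
]
buckets = route(records)
```

In this sketch, `buckets` maps each partition index to the records whose row keys fall in that range, so every record lands in exactly one partition.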

Procedure

  1. On the job design canvas, double-click the HBase Connector stage, and then click the Stage tab.
  2. On the Advanced page, set Execution mode to Parallel or Default (Parallel), and then click the Input tab.
  3. In the Write mode property, specify which type of operation to perform on the input data:
    1. Put - creates a new row if the row does not already exist, and creates qualifiers with values if they do not exist. If the row and qualifier already exist, the qualifier is updated with the new value; other qualifiers are not modified.
    2. Delete row - deletes all qualifiers with the specified row key, which effectively removes the row.
    3. Delete qualifiers - deletes the specified qualifiers from the target table.
    4. Delete row then put - deletes all qualifiers of the row and creates new qualifiers with the provided values.
    5. Append qualifiers values - appends new values to the existing values in the specified qualifiers.
  4. Determine the write behavior by setting the Autoflush enabled flag. If you set it to No, modifications are cached in a player node until the write buffer size defined in the HBase configuration file is reached; all cached data is then sent to the HBase database for processing. In this case, you might lose data updates if a player crashes for any reason, because the cached data is never stored in the database. To ensure data consistency, set the Autoflush enabled flag to Yes, so that each processed row is sent to the database immediately.
  5. Click OK, and then save the job.
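
The write modes in step 3 and the Autoflush behavior in step 4 can be modeled with a small in-memory sketch. This is a toy model for reasoning about the semantics, not the connector's implementation or the HBase client API. The class names `Table` and `BufferedWriter`, the mode strings, and the `buffer_limit` parameter are hypothetical. As a simplification, the buffer limit counts operations, whereas the real write buffer size is defined in the HBase configuration file.

```python
class Table:
    """Toy stand-in for an HBase table: {row_key: {qualifier: value}}."""

    def __init__(self):
        self.rows = {}

    def apply(self, mode, row_key, values=None):
        row = dict(self.rows.get(row_key, {}))
        values = values or {}
        if mode == "put":
            # Create or update the given qualifiers; leave the others untouched.
            row.update(values)
        elif mode == "delete_row":
            # Removing every qualifier effectively removes the row.
            row = {}
        elif mode == "delete_qualifiers":
            for qualifier in values:
                row.pop(qualifier, None)
        elif mode == "delete_row_then_put":
            # Discard all existing qualifiers, then store only the new ones.
            row = dict(values)
        elif mode == "append":
            # Concatenate the new value onto the existing qualifier value.
            for qualifier, value in values.items():
                row[qualifier] = row.get(qualifier, "") + value
        else:
            raise ValueError(f"unknown write mode: {mode}")
        if row:
            self.rows[row_key] = row
        else:
            self.rows.pop(row_key, None)


class BufferedWriter:
    """Sends each operation immediately (autoflush) or caches it until the buffer fills."""

    def __init__(self, table, autoflush, buffer_limit=3):
        self.table = table
        self.autoflush = autoflush
        self.buffer_limit = buffer_limit  # hypothetical: counts operations, not bytes
        self.pending = []

    def write(self, mode, row_key, values=None):
        self.pending.append((mode, row_key, values))
        if self.autoflush or len(self.pending) >= self.buffer_limit:
            self.flush()

    def flush(self):
        for operation in self.pending:
            self.table.apply(*operation)
        self.pending.clear()


table = Table()

# Autoflush set to Yes: every operation reaches the table right away.
safe = BufferedWriter(table, autoflush=True)
safe.write("put", "r1", {"a": "1", "b": "2"})
safe.write("put", "r1", {"a": "9"})     # updates a, leaves b unchanged
safe.write("append", "r1", {"b": "x"})  # b becomes "2x"

# Autoflush set to No: the put stays in the writer's cache until the buffer fills.
lazy = BufferedWriter(table, autoflush=False, buffer_limit=3)
lazy.write("put", "r2", {"a": "1"})
```

At this point row `r2` exists only in `lazy.pending`. If the process stopped here, that update would never reach the table, which is the data-loss risk described in step 4.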