Configuring the HBase connector for writing data

To access HBase data sources, you must first define a connection by using the properties in the Connection section on the Properties page. One instance of the HBase connector is always linked to one table; that is, a single connector instance can read data from or write data to a single HBase table. However, a single connector instance can have many input links.

Before you begin

  • From the cluster that hosts the HBase database to which you want to connect, copy the core-site.xml and hbase-site.xml files to an edge node, into a location that is accessible to all player nodes.
  • Also copy the HBase client jar files that the HBase connector will use to connect to the target database. Always use HBase client libraries that are compatible with the version of the target database.
  • Define a job that contains the HBase Connector stage.
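
As a quick sanity check on the copied hbase-site.xml file, the following minimal Python sketch lists the ZooKeeper settings that an HBase client typically needs. The property names are standard HBase configuration keys; the helper name read_connection_settings, the sample XML, and the host names are hypothetical, and the sketch does not claim to show which keys the connector itself reads.

```python
import xml.etree.ElementTree as ET

# Standard HBase property names for the ZooKeeper quorum, client port, and parent znode.
CONNECTION_KEYS = (
    "hbase.zookeeper.quorum",
    "hbase.zookeeper.property.clientPort",
    "zookeeper.znode.parent",
)


def read_connection_settings(xml_text):
    """Return the connection-related properties found in hbase-site.xml text."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name in CONNECTION_KEYS:
            settings[name] = prop.findtext("value", default="").strip()
    return settings


# Hypothetical file content; the host names are placeholders.
SAMPLE = """<configuration>
  <property><name>hbase.zookeeper.quorum</name><value>zk1.example.com,zk2.example.com</value></property>
  <property><name>hbase.zookeeper.property.clientPort</name><value>2181</value></property>
  <property><name>zookeeper.znode.parent</name><value>/hbase</value></property>
  <property><name>hbase.rootdir</name><value>hdfs://nn.example.com/hbase</value></property>
</configuration>"""

settings = read_connection_settings(SAMPLE)
missing = [key for key in CONNECTION_KEYS if key not in settings]
print("found:", settings)
print("missing:", missing)
```

To check a real file, read its contents from the directory where you placed it and pass the text to the helper; a non-empty missing list suggests that the file was not copied from the intended cluster or relies on default values.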

Procedure

  1. On the job design canvas, double-click the HBase connector stage icon to open the editor.
  2. On the Properties page, specify values for the connection properties.
  3. Specify the path to the core-site.xml and hbase-site.xml files that you copied from the target cluster and placed in a local directory. All the information that is required to connect to the target HBase database (such as the ZooKeeper quorum, port, and parent znode) is read from those files.
  4. In the HBase Namespace and Target table properties, specify the name of the table to which you want to connect and the namespace in which it was created (if different from the default namespace).
  5. In the Write mode property, specify which type of operation to perform on the input data:
    1. Put - creates a new row if the row does not already exist, and creates qualifiers with values if they do not exist. If the row and qualifier exist, the qualifier is updated with the new value; other qualifiers are not modified
    2. Delete row - deletes all qualifiers (effectively removing the row) with the specified row key
    3. Delete qualifiers - deletes specified qualifiers from the target table
    4. Delete row then put - deletes all qualifiers of the row, then creates new qualifiers with the provided values
    5. Append qualifiers values - appends new values to the existing values in specified qualifiers
  6. Determine the write behavior by setting the Autoflush enabled flag. If you set it to No, modifications are cached on a player node until the write buffer size that is defined in the HBase configuration file is reached; all cached data is then sent to the HBase database for processing. In that case, you might lose some data updates if a player node crashes for any reason, because the cached data is never stored in the database. To ensure data consistency, set the Autoflush enabled flag to Yes so that each processed row is immediately sent to the database for storage.
  7. Click OK, and then save the job.
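
The write modes and the Autoflush enabled flag described in the procedure can be illustrated with a small in-memory model. This is a sketch only: it uses a plain Python dictionary in place of an HBase table and does not call the connector or the HBase client. The names apply_write and BufferedWriter, the mode strings, the "cf:a"-style qualifier names, and the row-count buffer limit are all assumptions made for illustration (a real write buffer is limited by size, not by row count).

```python
def apply_write(table, mode, row_key, cells=None):
    """Apply one input row to `table`, a dict of row_key -> {qualifier: value}."""
    cells = cells or {}
    if mode == "put":
        # Create the row if needed; set the given qualifiers, keep the others.
        if cells:
            table.setdefault(row_key, {}).update(cells)
    elif mode == "delete_row":
        table.pop(row_key, None)
    elif mode == "delete_qualifiers":
        row = table.get(row_key, {})
        for qualifier in cells:
            row.pop(qualifier, None)
        if not row:
            table.pop(row_key, None)  # a row with no qualifiers no longer exists
    elif mode == "delete_row_then_put":
        table.pop(row_key, None)
        if cells:
            table[row_key] = dict(cells)
    elif mode == "append":
        # Assumed here: appending to a missing qualifier creates it.
        row = table.setdefault(row_key, {})
        for qualifier, value in cells.items():
            row[qualifier] = row.get(qualifier, "") + value
    else:
        raise ValueError(f"unknown write mode: {mode}")


class BufferedWriter:
    """Model of Autoflush: with autoflush off, rows wait in a local buffer."""

    def __init__(self, table, mode, autoflush, buffer_limit=3):
        self.table, self.mode = table, mode
        self.autoflush, self.buffer_limit = autoflush, buffer_limit
        self.pending = []  # rows cached on the (modeled) player node

    def write(self, row_key, cells=None):
        self.pending.append((row_key, cells))
        if self.autoflush or len(self.pending) >= self.buffer_limit:
            self.flush()

    def flush(self):
        for row_key, cells in self.pending:
            apply_write(self.table, self.mode, row_key, cells)
        self.pending.clear()


table = {}
apply_write(table, "put", "r1", {"cf:a": "1", "cf:b": "2"})
apply_write(table, "put", "r1", {"cf:a": "9"})        # cf:b is left untouched
apply_write(table, "append", "r1", {"cf:b": "x"})     # cf:b becomes "2x"
apply_write(table, "delete_qualifiers", "r1", {"cf:a": None})

buffered = {}
writer = BufferedWriter(buffered, "put", autoflush=False, buffer_limit=3)
writer.write("r1", {"cf:a": "1"})
writer.write("r2", {"cf:a": "2"})
# Nothing has reached `buffered` yet; a crash at this point would lose both rows.
```

In this model, a third call to writer.write fills the buffer and sends all three rows at once, whereas constructing the writer with autoflush=True stores each row as soon as it is written.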