To access HBase data sources, you must first define a connection by using the properties
in the Connection section on the Properties page. Each instance of the HBase connector is always
linked to exactly one table; that is, a single connector instance reads data from or writes data to a
single HBase table. However, a single connector instance can have many input links.
Before you begin
- From the cluster that hosts the HBase database to which you want to connect, copy the
core-site.xml and hbase-site.xml files to a directory on an edge node
that is accessible from all player nodes.
- Copy the HBase client JAR files that the HBase connector uses to connect to the target
database. Always use HBase client libraries that are compatible with the version of the target
database.
- Define a job that contains the HBase connector stage.
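Before you open the stage editor, it can help to confirm that the copied files are actually in place. The following Python sketch is an illustration only and is not part of the connector: the helper name check_connector_files is hypothetical, it checks only the file system of the node on which it runs (not access from every player node), and it assumes that the client libraries follow the usual hbase-client-<version>.jar naming, so it looks for that one JAR as a representative of the full set of client libraries.

```python
import tempfile
from pathlib import Path

# Configuration files that must be copied from the HBase cluster.
REQUIRED_CONFIG_FILES = ("core-site.xml", "hbase-site.xml")


def check_connector_files(config_dir: str, jar_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the basic files are present.

    Hypothetical helper: checks only the local file system of this node.
    """
    problems = []
    config_path = Path(config_dir)
    for name in REQUIRED_CONFIG_FILES:
        if not (config_path / name).is_file():
            problems.append(f"missing {name} in {config_path}")
    # Assumption: the client JAR uses the standard hbase-client-<version>.jar name.
    if not any(Path(jar_dir).glob("hbase-client-*.jar")):
        problems.append(f"no hbase-client-*.jar found in {jar_dir}")
    return problems


if __name__ == "__main__":
    # Demonstration on an empty temporary directory: every file is reported missing.
    with tempfile.TemporaryDirectory() as tmp:
        for problem in check_connector_files(tmp, tmp):
            print(problem)
```

Run the check on each node that hosts a player process if you want to cover the whole grid; a single run confirms only the local copy.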
Procedure
-
On the job design canvas, double-click the HBase connector stage icon
to open the editor.
-
On the Properties page, specify values for the connection
properties.
-
Specify the path to the local directory that contains the core-site.xml and hbase-site.xml
files that you copied from the target cluster. All the information that is required to connect to
the target HBase database (such as the ZooKeeper quorum, port, and parent znode) is read from those files.
-
In the HBase Namespace and Target table properties, specify
the name of the table to which you want to connect and the namespace in which the table was
created (if it is not the default namespace).
-
In the Write mode property, specify the type of operation to perform
on the input data:
-
Put - creates the row if it does not already exist, and creates any qualifiers that do not
exist with the specified values. If the row and a qualifier already exist, the qualifier is updated
with the new value; other qualifiers are not modified.
-
Delete row - deletes all qualifiers of the row with the specified row key, which effectively
removes the row
-
Delete qualifiers - deletes the specified qualifiers from the target table
-
Delete row then put - deletes all qualifiers of the row, and then creates new qualifiers with
the provided values
-
Append qualifiers values - appends the new values to the existing values of the specified
qualifiers
-
Determine the write behavior by setting the Autoflush enabled property. If you set
it to No, modifications are cached on the player node until the write
buffer size that is defined in the HBase configuration file is reached; the cached data is then sent
to the HBase database for processing. In this case, if a player node fails for any reason, the cached
updates that were not yet sent are lost and are not stored in the database. To ensure data
consistency, set Autoflush enabled to
Yes; each processed row is then sent to the database immediately.
-
Click OK, and then save the job.
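The connection step reads the ZooKeeper settings from the copied hbase-site.xml file. To see which values a copied file supplies before you run the job, you can parse it yourself. The following Python sketch is an illustration, not the connector's own logic: the helper names are hypothetical, it reads only the hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort, and zookeeper.znode.parent properties, and it falls back to the standard HBase defaults (2181 and /hbase) only for the last two. The sample file content is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard HBase defaults, applied by this sketch when a property is absent.
DEFAULTS = {
    "hbase.zookeeper.property.clientPort": "2181",
    "zookeeper.znode.parent": "/hbase",
}


def read_site_properties(xml_text: str) -> dict[str, str]:
    """Parse a Hadoop-style *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def zookeeper_settings(xml_text: str) -> dict[str, Optional[str]]:
    """Return the quorum, client port, and parent znode (quorum is None if missing)."""
    props = read_site_properties(xml_text)
    settings: dict[str, Optional[str]] = {
        "hbase.zookeeper.quorum": props.get("hbase.zookeeper.quorum")
    }
    for key, default in DEFAULTS.items():
        settings[key] = props.get(key, default)
    return settings


# Invented sample content for illustration.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property><name>hbase.zookeeper.quorum</name><value>zk1.example.com,zk2.example.com</value></property>
  <property><name>zookeeper.znode.parent</name><value>/hbase-unsecure</value></property>
</configuration>"""
```

Calling zookeeper_settings(SAMPLE) yields the two-host quorum, port 2181 from the default, and /hbase-unsecure as the parent znode. A missing quorum is reported as None rather than guessed, because a wrong quorum silently points the client at the wrong ensemble.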
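In HBase, a table outside the default namespace is addressed as namespace:table, and tables created without a namespace belong to the namespace named default. The following Python sketch shows how the HBase Namespace and Target table values from the step above map onto that form. It is an illustration under that assumption only; the helper name qualified_table_name is hypothetical, and the connector may combine the two properties differently internally.

```python
def qualified_table_name(table: str, namespace: str = "") -> str:
    """Build the HBase 'namespace:table' form from separate namespace and table values.

    Hypothetical helper: an empty namespace or 'default' means the default namespace,
    whose tables need no prefix.
    """
    if not table or ":" in table:
        raise ValueError(f"expected a bare table name, got {table!r}")
    if namespace in ("", "default"):
        return table
    return f"{namespace}:{table}"
```

For example, qualified_table_name("orders", "sales") gives sales:orders, while qualified_table_name("orders") gives orders.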
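The differences between the Write mode options described above are easiest to see on a small example. The following Python sketch is a toy in-memory model, not the connector's implementation: a table is a dictionary of row keys to {"family:qualifier": value} cells, string values stand in for HBase byte arrays, and timestamps and cell versions are ignored. The names Table, WriteMode, and apply are hypothetical.

```python
from enum import Enum

Table = dict[str, dict[str, str]]  # row key -> {"family:qualifier": value}


class WriteMode(Enum):
    PUT = "Put"
    DELETE_ROW = "Delete row"
    DELETE_QUALIFIERS = "Delete qualifiers"
    DELETE_ROW_THEN_PUT = "Delete row then put"
    APPEND = "Append qualifiers values"


def apply(table: Table, mode: WriteMode, row_key: str, cells: dict[str, str]) -> None:
    """Apply one input row to the toy table (ignores timestamps and versions)."""
    if mode is WriteMode.PUT:
        # Create the row if needed; overwrite only the given qualifiers.
        table.setdefault(row_key, {}).update(cells)
    elif mode is WriteMode.DELETE_ROW:
        # Removing every qualifier removes the row.
        table.pop(row_key, None)
    elif mode is WriteMode.DELETE_QUALIFIERS:
        row = table.get(row_key)
        if row is not None:
            for qualifier in cells:
                row.pop(qualifier, None)
            if not row:
                del table[row_key]  # a row with no cells no longer exists
    elif mode is WriteMode.DELETE_ROW_THEN_PUT:
        # Drop all existing qualifiers, then write only the provided ones.
        table.pop(row_key, None)
        if cells:
            table[row_key] = dict(cells)
    elif mode is WriteMode.APPEND:
        # Concatenate onto existing values; missing qualifiers are created.
        row = table.setdefault(row_key, {})
        for qualifier, value in cells.items():
            row[qualifier] = row.get(qualifier, "") + value
```

Applying Put with {"cf:a": "1", "cf:b": "2"} and then Put with {"cf:a": "9"} to row r1 leaves cf:a = "9" and cf:b = "2", whereas Delete row then put with {"cf:c": "3"} leaves only cf:c.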
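The trade-off of the Autoflush enabled property can be illustrated with a small buffering model. The following Python sketch is not the connector's code: the class name BufferedWriter and its methods are hypothetical, buffer size is approximated as row-key length plus value length (HBase itself measures the heap size of the buffered mutations, against the limit set by hbase.client.write.buffer), and a list stands in for the HBase table. It shows why rows that are still buffered when a player node fails never reach the database.

```python
class BufferedWriter:
    """Toy model of client-side write buffering controlled by an autoflush flag."""

    def __init__(self, autoflush: bool, buffer_limit_bytes: int):
        self.autoflush = autoflush
        self.buffer_limit_bytes = buffer_limit_bytes
        self.pending: list[tuple[str, bytes]] = []   # rows cached on the player node
        self.pending_bytes = 0
        self.stored: list[tuple[str, bytes]] = []    # stands in for the HBase table

    def write(self, row_key: str, value: bytes) -> None:
        if self.autoflush:
            # Autoflush enabled = Yes: every row is sent immediately.
            self.stored.append((row_key, value))
            return
        # Autoflush enabled = No: cache the row until the buffer limit is reached.
        self.pending.append((row_key, value))
        self.pending_bytes += len(row_key.encode()) + len(value)
        if self.pending_bytes >= self.buffer_limit_bytes:
            self.flush()

    def flush(self) -> None:
        """Send all cached rows to the table at once."""
        self.stored.extend(self.pending)
        self.pending.clear()
        self.pending_bytes = 0

    def crash(self) -> int:
        """Simulate a player node failure: unflushed rows are lost. Returns how many."""
        lost = len(self.pending)
        self.pending.clear()
        self.pending_bytes = 0
        return lost
```

With autoflush=False and a 10-byte limit, writing ("r1", b"abc") and ("r2", b"abcd") triggers a flush, but a third row written just before crash() is lost; with autoflush=True, crash() loses nothing because each row was already sent. Buffering reduces the number of round trips to HBase, which is why disabling autoflush usually improves throughput at the cost of this exposure.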