To access HBase data sources, you must define a connection by using the properties in the
Connection section on the Properties page. One instance of the HBase connector is always linked
to one table (that is, with a single connector instance you can read data from, or write data to, a single
HBase table).
Before you begin
- From the cluster that hosts the HBase database to which you want to connect, copy the
core-site.xml and hbase-site.xml files and distribute them
to all player nodes.
- You must also copy the HBase client JAR files that the HBase connector uses to
connect to the target database, and distribute them to all player nodes.
Ensure that you always use HBase client libraries that are compatible with the version of the target
database.
- If you use BigIntegrate, you do not need to copy the core-site.xml and
hbase-site.xml files or the HBase client JAR files if they are available on all nodes in
uniform locations. However, if you want to connect to HBase in a different Hadoop cluster, you
must copy all the files from that cluster and distribute them to all player nodes.
- Define a job that contains the HBase Connector stage.
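As a sketch, the copy steps above might look like the following; the host names and target directories are illustrative assumptions, not values that the product mandates:

```shell
# Sketch: distribute the cluster configuration files and HBase client JAR
# files to each player node. Host names and paths are assumptions.
PLAYER_NODES="player1 player2"
CONF_DIR="/opt/IBM/hbase/conf"
JAR_DIR="/opt/IBM/hbase/jars"
for node in $PLAYER_NODES; do
  # Dry-run sketch: prints the copy commands; in practice run scp directly.
  echo "scp core-site.xml hbase-site.xml $node:$CONF_DIR/"
  echo "scp hbase-client-*.jar $node:$JAR_DIR/"
done
```

Use the same target directories on every player node so that the connector finds the files in uniform locations.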
Procedure
-
In the job design canvas, double-click the HBase Connector stage icon to
open the stage editor.
-
On the Properties page, specify values for the connection properties.
-
Provide the Hadoop Identity. It should be a human-readable name for your cluster. This property is
optional in Designer and mandatory in IMAM. It appears as the name of the HostSystem that hosts the
database after you import metadata in IMAM.
-
Provide the HBase Identity. It should be a human-readable name for your database. This property is
optional in Designer and mandatory in IMAM. It appears as the name of the database that you
import metadata from in IMAM.
-
Specify the path to the core-site.xml and hbase-site.xml
files that you copied from the target cluster into a local directory. All the information
that is required to connect to the target HBase database (such as the ZooKeeper quorum, port, and parent znode) is
read from those files.
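For reference, the connection details that the connector reads correspond to entries such as the following in hbase-site.xml; the host names and values are illustrative:

```xml
<!-- Illustrative hbase-site.xml entries that the connector reads -->
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
```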
-
Specify the HBase client JAR files. Provide a semicolon-separated list of the locations of hbase-client.jar and its
dependencies. Each entry can be a directory or a single JAR file. Each directory is
traversed recursively, so all child directories are included. The connector requires those JAR files to be
available at the specified locations on each node where it runs. The list is platform dependent.
Cloudera – use the JAR files from
/opt/cloudera/parcels/your-active-parcel/lib/hbase/lib, or alternatively download the shaded
JAR from the Cloudera Maven repository at
https://mvnrepository.com/artifact/org.apache.hbase/hbase-client. Additionally, use the
JAR files from /opt/cloudera/parcels/your-active-parcel/lib/hadoop/client.
Hortonworks – use the JAR files in /usr/hdp/current/hbase-client/lib/ and
/usr/hdp/current/hadoop-client/.
MapR – use the JAR files from /opt/mapr/hbase/hbase-1.x.x/lib, which, among others,
contains the HBase client libraries. In addition,
/opt/mapr/hadoop/hadoop-2.x/share/hadoop/common contains the required Hadoop
libraries.
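A minimal sketch of assembling the semicolon-separated list, assuming the Hortonworks layout described above:

```shell
# Sketch: build the semicolon-separated JAR list for the connector
# (Hortonworks layout assumed; substitute the Cloudera or MapR
# directories as needed).
HBASE_LIB="/usr/hdp/current/hbase-client/lib"
HADOOP_LIB="/usr/hdp/current/hadoop-client"
# Directory entries are traversed recursively, so listing the two
# directories is enough; individual JAR files may also be listed.
JAR_LIST="${HBASE_LIB};${HADOOP_LIB}"
echo "$JAR_LIST"
```

Paste the resulting value into the HBase client jars property; the same paths must exist on every node where the connector runs.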
-
Select Authentication method. The options are:
- None - Works only with clusters that are not secured. Specify the user that is used to access
HBase in the Simple authentication user name property.
- Kerberos using password
- Specify the krb5.conf location that is accessible on each node. This file
contains the Kerberos configuration information and is typically found in the /etc directory. If you are connecting
from outside the cluster, copy it from the cluster just as you copied the site.xml files.
- Provide Principal in the name@REALM format.
- Provide Password.
- Set Use ticket cache to Yes if you want to use an
existing ticket that is stored in the credential cache. Provide a ticket cache location that is
accessible on each node. You must run kinit on each node instead of copying
the cache. If you leave the location empty, the connector uses the default location that is specified in krb5.conf.
If login from the cache fails, the connector falls back to login with the password.
- Kerberos using keytab
- Specify the krb5.conf location that is accessible on each node. This file
contains the Kerberos configuration information and is typically found in the /etc directory. If you are connecting
from outside the cluster, copy it from the cluster just as you copied the site.xml files.
- Provide Principal in the name@REALM format.
- Provide a Keytab location that is accessible on each node. Keytabs can be
distributed by the engine: copy the keytab to any location on the edge node and set the job environment
variable APT_YARN_HBASE_DEFAULT_KEYTAB_PATH to that exact location. Ensure that
you leave the Keytab property empty, because it takes priority over the variable. The variable has job
scope, so it is the same for every HBase stage in the job; therefore, the keytabs must be
merged by using the ktutil tool. If that is not possible for any reason, you can
always distribute the keytabs under separate paths.
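The Kerberos steps above can be sketched as follows; the node names, principal, and keytab paths are illustrative assumptions:

```shell
# Sketch: Kerberos setup for the HBase connector. Node names, the
# principal, and keytab paths are assumptions for illustration.

# 1. Ticket-cache login: run kinit on every player node, because
#    copying the credential cache between nodes is not supported.
NODES="player1 player2"
PRINCIPAL="dsadm@EXAMPLE.REALM"
for node in $NODES; do
  # Dry-run sketch: prints the command to run on each node.
  echo "ssh $node kinit $PRINCIPAL"
done

# 2. Keytab login: merge per-stage keytabs with ktutil so that one
#    APT_YARN_HBASE_DEFAULT_KEYTAB_PATH serves every HBase stage in the job.
if command -v ktutil >/dev/null 2>&1; then
  ktutil <<'EOF'
read_kt /etc/security/keytabs/stage1.keytab
read_kt /etc/security/keytabs/stage2.keytab
write_kt /etc/security/keytabs/merged.keytab
EOF
fi
export APT_YARN_HBASE_DEFAULT_KEYTAB_PATH=/etc/security/keytabs/merged.keytab
```

Remember to leave the Keytab property empty in the stage so that the environment variable takes effect.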
-
In the HBase Namespace and Target table properties, specify the
name of the table to which you want to connect and the namespace in which it was created (if different from the
default namespace).
Note: In MapR, only the default namespace can be used.
-
Click OK to save.