Apache HBase connection
To access your data in Apache HBase, create a connection asset for it.
Apache HBase is a column-oriented nonrelational database management system that runs on top of Hadoop Distributed File System (HDFS). This connection is available for the DataStage service only.
Supported release
Apache HBase 2.4.13
Prerequisites
The Apache HBase connection requires information from two XML files. If you use Kerberos authentication, you must configure Kerberos in advance.
XML files requirement
Two XML files are required to connect to the Apache HBase server: hbase-site.xml and core-site.xml. These files contain the information that is required to connect to the target HBase database, such as the ZooKeeper quorum, port, and parent znode. You can use one of two methods to provide the files:
- File content: Enter the files' values into the connection form.
- File path: Copy the files to the Cloud Pak for Data cluster, and then enter the paths to the files in the connection form.
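To check what a file contains before you provide it, the following minimal sketch reads the ZooKeeper quorum, port, and parent znode from an hbase-site.xml file. It assumes the standard HBase property names (hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort, zookeeper.znode.parent) and that each name and value element sits on a single line. The get_property helper and the sample values are hypothetical and are not part of the product.

```shell
#!/bin/sh
# Print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumption: each <name> and <value> element sits on a single line.
get_property() {
  awk -v key="$1" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> tag and the 8-character </value> tag.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}

# Sample file with illustrative values only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
EOF

for key in hbase.zookeeper.quorum hbase.zookeeper.property.clientPort zookeeper.znode.parent; do
  printf '%s = %s\n' "$key" "$(get_property "$key" "$sample")"
done
rm -f "$sample"
```

A line-based helper like this is only a quick check. For files with unusual formatting, a real XML parser is the safer choice.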
If you choose File path, sign in to Cloud Pak for Data as an administrator, and then use the oc cp command to copy the two files to a mounted volume where all the runtime pods can access them.
Syntax:
oc cp {SRC_XML_FILE_PATH} {POD_NAME}:{TGT_XML_FILE_PATH}
Example:
oc cp /tmp/core-site.xml my-pod-32:/tmp/core-site.xml
Save the paths to each XML file. You will enter them in the hbase-site.xml path field and the core-site.xml path field of the Create connection form.
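Because both files must be copied, it can help to generate the two oc cp commands together and review them before you run them. The following sketch only prints the commands and does not contact the cluster. The print_copy_commands helper is hypothetical, and the source directory, pod name, and target directory are placeholder values taken from the examples in this topic. Substitute the values for your own cluster.

```shell
#!/bin/sh
# Print one "oc cp" command per XML file, following the syntax
#   oc cp {SRC_XML_FILE_PATH} {POD_NAME}:{TGT_XML_FILE_PATH}
# Arguments: source directory, pod name, target directory on the mounted volume.
print_copy_commands() {
  src_dir=$1
  pod_name=$2
  target_dir=$3
  for file in hbase-site.xml core-site.xml; do
    printf 'oc cp %s/%s %s:%s/%s\n' \
      "$src_dir" "$file" "$pod_name" "$target_dir" "$file"
  done
}

# Placeholder values: review the printed commands, then run them yourself.
print_copy_commands /tmp my-pod-32 /px-storage/hbase
```

The target paths in the printed commands are the values to enter in the hbase-site.xml path field and the core-site.xml path field.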
Kerberos configuration
If you plan to use Kerberos authentication, you must copy the krb5.conf file from the Kerberos cluster to the Cloud Pak for Data cluster. In the Kerberos cluster, the krb5.conf file is usually located at /etc/krb5.conf. For credentials, you can use either a password or a keytab file. If you plan to use a keytab file, you must also copy the keytab file from the Kerberos server to the Cloud Pak for Data cluster.
Sign in to Cloud Pak for Data as an administrator, and then use the oc cp command to copy the files to a mounted volume where all the runtime pods can access them.
Save the paths to each file. You will enter them in the krb5.conf location field and the Keytab field of the Create connection form.
Create a connection to Apache HBase
To create the connection asset, you need these connection details:
- Hadoop identity (optional): A unique identifier for the Hadoop cluster.
- HBase identity (optional): A unique identifier for the HBase cluster.
- XML file mode: Method to provide the core-site.xml file and the hbase-site.xml file. Select one of the following methods to provide the information:
  - File content: Enter the content of the core-site.xml file and the hbase-site.xml file.
  - File path: Specify the location of the core-site.xml file and the hbase-site.xml file in the nodes. For example, /px-storage/hbase/hbase-site.xml and /px-storage/hbase/core-site.xml.
For Credentials, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.
Authentication methods
You can authenticate with Simple authentication or with Kerberos.
- Simple authentication user name: Simple authentication is Simple Authentication and Security Layer (SASL). Enter a username that will access Apache HBase on a cluster that is not secured.
Select Kerberos if the Apache HBase cluster is configured for Kerberos.
- krb5.conf location: Specify the location of the krb5.conf file that is accessible on each node in the Cloud Pak for Data instance. For example, /px-storage/username/hbase/krb5.conf.
- Principal: The user principal that is configured to access the Apache HBase server that is configured for Kerberos. The Kerberos administrator creates user principals in the Kerberos server. The user principal name has three components: Primary, Instance, and Realm. The Instance component is optional. A valid user principal name is user@example.com.
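To illustrate the three components of a principal name, the following sketch splits a name of the form primary/instance@REALM into its parts. The parse_principal helper is hypothetical, and the split is deliberately simple: it does not handle escaped characters or check the name against a Kerberos server.

```shell
#!/bin/sh
# Split a Kerberos principal name into Primary, Instance, and Realm.
# Returns 1 if the name has no realm; the Instance part may be empty.
parse_principal() {
  principal=$1
  case $principal in
    *@*) ;;
    *) echo "missing realm: $principal" >&2; return 1 ;;
  esac
  realm=${principal##*@}   # text after the last "@"
  name=${principal%@*}     # text before the last "@"
  case $name in
    */*) primary=${name%%/*}; instance=${name#*/} ;;
    *)   primary=$name; instance= ;;
  esac
  printf 'primary=%s instance=%s realm=%s\n' "$primary" "$instance" "$realm"
}

# The example principal from this topic has no Instance component.
parse_principal user@example.com
```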
Select whether to use a password or a keytab file for Kerberos authentication.
- Kerberos with password
  - Enter a value for the Password.
  - Select Use ticket cache if you want to use the existing ticket that is stored in the credential cache. Provide a ticket cache location that is accessible on each node. You must run the kinit command on each node instead of copying the cache. If you do not use ticket cache, the connection will use the default location that is specified in the krb5.conf file. If the login from the cache fails, the connection will log in with the password.
- Kerberos with keytab
  - Keytab: Provide a keytab location that is accessible on each node. For example, /px-storage/username/hbase.keytab.
Choose the method for creating a connection based on where you are in the platform
In a project
Click Assets > New asset > Data access tools > Connection. See Adding a connection to a project.
In a deployment space
Click Add to space > Connection. See Adding connections to a deployment space.
In the Platform assets catalog
Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use the Apache HBase connection in the following workspaces and tools:
Projects
- DataStage (DataStage service). See Connecting to a data source in DataStage.
Catalogs
- Platform assets catalog
Federal Information Processing Standards (FIPS) compliance
The Apache HBase connection cannot be created in a FIPS environment.
Apache HBase setup
Test connection
If you use the File path input method and you want to be able to test the Apache HBase connection, copy the hbase-site.xml and core-site.xml files to a directory under /ds-storage.
For example:
/ds-storage/hbase/hbase-site.xml
/ds-storage/hbase/core-site.xml
Limitations
- If you export a project or download a flow that includes the Apache HBase connection as a ZIP file, the hbase-site.xml and core-site.xml files will not be included. You must enter the files' values or their paths again for the connection. See Kerberos configuration prerequisite.
- Previewing data and using the Asset browser to browse metadata do not work for the Apache HBase connection.
Learn more
Parent topic: Supported connections