Apache HBase connection

To access your data in Apache HBase, create a connection asset for it.

Apache HBase is a column-oriented nonrelational database management system that runs on top of Hadoop Distributed File System (HDFS). This connection is available for the DataStage service only.

Supported release

Apache HBase 2.4.13

Prerequisites

The Apache HBase connection requires that you install the HBase client. The connection also requires information from two XML files. If you use Kerberos authentication, you must configure Kerberos in advance.

Install the HBase client

  1. Log in to the Cloud Pak for Data cluster with the oc login command.

  2. Get the HBase client from https://hbase.apache.org/downloads.html. Be sure to follow the verification instructions on the download page.

    wget https://dlcdn.apache.org/hbase/2.5.8/hbase-2.5.8-bin.tar.gz
    
    
  3. Extract the compressed file.

    tar -zxvf hbase-2.5.8-bin.tar.gz
    
  4. If another HBase client was previously added to the instance manually, clean up that client. Skip this step if no client was previously added.

    #!/bin/bash
    
    # Update to the desired instance.
    INSTANCE=ds-px-default
    
    POD=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1 |head -n 1)
    echo "Cleaning up HbaseClient for instance ${INSTANCE} via Pod $POD"
    
    # rm client files
    oc exec ${POD} -- rm -fr /px-runtime/HbaseClient
    
    # rm sym link
    oc exec ${POD} -- rm -fr /opt/ibm/PXService/HbaseClient
    
  5. Copy the contents of the extracted client's lib directory to the engine.

    #!/bin/bash
    
    # Update to the desired instance.
    INSTANCE=ds-px-default
    
    PODS=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1)
    POD=$(echo "${PODS}" |head -n 1)
    
    SOURCE=<absolute path to the extracted client>/hbase-2.5.8/lib
    
    oc cp ${SOURCE} ${POD}:/px-runtime/HbaseClient/
    
    oc delete pod ${PODS}
    
    
  6. Wait for the pods to restart, and then check that the folder /opt/ibm/PXService/HbaseClient contains the .jar files.

    #!/bin/bash
    # Update to the desired instance.
    INSTANCE=ds-px-default
    POD=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1 |head -n 1)
    oc exec ${POD} -- ls /opt/ibm/PXService/HbaseClient |head -n 5
    

XML files requirement

Two XML files are required to connect to the Apache HBase server: hbase-site.xml and core-site.xml. These files contain the information that is required to connect to the target HBase database, such as the ZooKeeper quorum, port, and parent znode. You can use one of two methods to provide the files:

  • File content: Enter the files' values into the connection form.
  • File path: Copy the files to the Cloud Pak for Data cluster, and then enter the paths to the files in the connection form.
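
For reference, a minimal hbase-site.xml has the following shape. The host names and znode below are placeholders; take the real values from your HBase cluster:

```xml
<configuration>
  <!-- Comma-separated ZooKeeper hosts (placeholder names) -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <!-- ZooKeeper client port -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <!-- Parent znode for this HBase deployment -->
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
```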

If you choose File path, sign in to Cloud Pak for Data as an administrator, and then use the oc cp command to copy the two files to a mounted volume where all the runtime pods can access them.

Syntax:

oc cp {SRC_XML_FILE_PATH} {POD_NAME}:{TGT_XML_FILE_PATH}

Example:

oc cp /tmp/core-site.xml my-pod-32:/tmp/core-site.xml
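
To copy both files in one pass, you can generate the commands first and review them before running. A sketch, where the pod name and directories are assumptions to replace with your own:

```shell
#!/bin/bash
# Hypothetical values: replace with a runtime pod name and the
# mounted volume path that all the runtime pods can access.
POD=my-pod-32
SRC_DIR=/tmp
TGT_DIR=/tmp

# Build the oc cp commands so that they can be reviewed before running.
CMDS=()
for f in core-site.xml hbase-site.xml; do
  CMDS+=("oc cp ${SRC_DIR}/${f} ${POD}:${TGT_DIR}/${f}")
done
printf '%s\n' "${CMDS[@]}"
```

Run each printed command after you verify that the paths and pod name are correct.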

Save the paths to each XML file. You will enter them in the hbase-site.xml path field and the core-site.xml path field of the Create connection form.

Kerberos configuration

If you plan to use Kerberos authentication, you must copy the krb5.conf file from the Kerberos cluster to the Cloud Pak for Data cluster. In the Kerberos cluster, the krb5.conf file is usually located at /etc/krb5.conf. For credentials, you can use either a password or a keytab file. If you plan to use a keytab file, you must also copy the keytab file from the Kerberos server to the Cloud Pak for Data cluster.
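
For orientation, a minimal krb5.conf has this shape. The realm and KDC host below are placeholders; copy the real file from your Kerberos cluster rather than writing one by hand:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
```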

Sign in to Cloud Pak for Data as an administrator, and then use the oc cp command to copy the files to a mounted volume where all the runtime pods can access them.

Save the paths to each file. You will enter them in the krb5.conf location field and the Keytab field of the Create connection form.

Create a connection to Apache HBase

To create the connection asset, you need these connection details:

  • Hadoop identity (optional): A unique identifier for the Hadoop cluster.

  • HBase identity (optional): A unique identifier for the HBase cluster.

  • XML file mode: Method to provide the core-site.xml file and the hbase-site.xml file. Select one of the following methods to provide the information:

    • File content: Enter the content of the core-site.xml file and the hbase-site.xml file.
    • File path: Specify the location of the core-site.xml file and the hbase-site.xml file in the nodes. For example, /px-storage/hbase/hbase-site.xml and /px-storage/hbase/core-site.xml.

For Credentials, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.

Authentication methods

You can authenticate with Simple authentication or with Kerberos.

  • Simple authentication user name: Simple authentication uses Simple Authentication and Security Layer (SASL). Enter a user name that can access Apache HBase on a cluster that is not secured.

Select Kerberos if the Apache HBase cluster is configured for Kerberos.

  • krb5.conf location: Specify the location of the krb5.conf file that is accessible on each node in the Cloud Pak for Data instance. For example, /px-storage/username/hbase/krb5.conf.

  • Principal: The user principal that is authorized to access the Apache HBase server that is configured for Kerberos. The Kerberos administrator creates user principals in the Kerberos server. The user principal name has three components in the form primary/instance@REALM, where the instance component is optional. An example of a valid user principal name is user@EXAMPLE.COM.
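
The principal structure described above can be illustrated with a small shell function (illustrative only, not part of the product) that splits a principal of the form primary/instance@REALM into its components:

```shell
#!/bin/bash
# Split a Kerberos principal (primary[/instance]@REALM) into its
# components. Illustrative only.
parse_principal() {
  local principal=$1
  local realm=${principal##*@}   # text after the last '@'
  local rest=${principal%@*}     # primary, optionally followed by /instance
  local primary=${rest%%/*}
  local instance=""
  if [ "$rest" != "$primary" ]; then
    instance=${rest#*/}
  fi
  echo "primary=$primary instance=$instance realm=$realm"
}

parse_principal user@EXAMPLE.COM          # primary=user instance= realm=EXAMPLE.COM
parse_principal hbase/host1@EXAMPLE.COM   # primary=hbase instance=host1 realm=EXAMPLE.COM
```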

Select whether to use a password or a keytab file for Kerberos authentication.

  • Kerberos with password

    • Enter a value for the Password.
    • Select Use ticket cache to use an existing ticket that is stored in the credential cache. Provide a ticket cache location that is accessible on each node. You must run the kinit command on each node instead of copying the cache. If you do not use a ticket cache, the connection uses the default location that is specified in the krb5.conf file. If the login from the cache fails, the connection logs in with the password.
  • Kerberos with keytab

    • Keytab: Provide a keytab location that is accessible on each node. For example, /px-storage/username/hbase.keytab.

Choose the method for creating a connection based on where you are in the platform

In a project
Click Assets > New asset > Prepare data > Connect to a data source. See Adding a connection to a project.

In a deployment space
Click Import assets > Data access > Connection. See Adding data assets to a deployment space.

In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Federal Information Processing Standards (FIPS) compliance

The Apache HBase connection cannot be created in a FIPS environment.

Apache HBase setup

For information about setting up Apache HBase, see the Quick Start - Standalone HBase guide in the Apache HBase documentation.

Test connection

If you use the File path input method and you want to be able to test the Apache HBase connection, copy the hbase-site.xml and core-site.xml files to a directory under /ds-storage. For example:

/ds-storage/hbase/hbase-site.xml
/ds-storage/hbase/core-site.xml
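
Before you test the connection, you can confirm that the files are in place and that hbase-site.xml names a ZooKeeper quorum. A sketch of such a check; the function and paths are illustrative, not part of the product:

```shell
#!/bin/bash
# Illustrative pre-flight check for the File path input method:
# both XML files must exist, and hbase-site.xml should name a
# ZooKeeper quorum.
check_hbase_xml() {
  local site_xml=$1 core_xml=$2
  [ -f "$site_xml" ] || { echo "missing: $site_xml"; return 1; }
  [ -f "$core_xml" ] || { echo "missing: $core_xml"; return 1; }
  grep -q "hbase.zookeeper.quorum" "$site_xml" ||
    { echo "no hbase.zookeeper.quorum in $site_xml"; return 1; }
  echo "ok"
}

# Example: check_hbase_xml /ds-storage/hbase/hbase-site.xml /ds-storage/hbase/core-site.xml
```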

Limitations

  • If you export project assets or download a flow that includes the Apache HBase connection as a ZIP file, the hbase-site.xml and core-site.xml files will not be included. You must enter the files' values or their paths again for the connection. See Kerberos configuration prerequisite.
  • Previewing data and using the Asset browser to browse metadata do not work for the Apache HBase connection.

Parent topic: Supported connections