Apache HBase connection

To access your data in Apache HBase, create a connection asset for it.

Apache HBase is a column-oriented, nonrelational database management system that runs on top of the Hadoop Distributed File System (HDFS). This connection is available only for the DataStage service.

Supported release

Apache HBase 2.4.13

Prerequisites

The Apache HBase connection requires information from two XML files. If you use Kerberos authentication, you must configure Kerberos in advance.

XML files requirement

Two XML files are required to connect to the Apache HBase server: hbase-site.xml and core-site.xml. These files contain the information that is required to connect to the target HBase database, such as the ZooKeeper quorum, port, and parent znode. You can use one of two methods to provide the files:

  • File content: Paste the contents of the files into the connection form.
  • File path: Copy the files to the Cloud Pak for Data cluster, and then enter the paths to the files in the connection form.
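For reference, the ZooKeeper quorum, port, and parent znode mentioned above are ordinary Hadoop-style property entries. The following shell sketch writes a minimal hbase-site.xml and core-site.xml into a local directory, for example to try the File content or File path method against a lab cluster. The function name write_hbase_xml and all host names, ports, and paths are hypothetical. The files from your own HBase cluster remain the authoritative source; they usually contain more properties, especially on a Kerberos-enabled cluster (for example, the server principals).

```shell
# Sketch: write minimal hbase-site.xml and core-site.xml files.
# write_hbase_xml is a hypothetical helper; all argument values are examples.
#   $1 output directory
#   $2 ZooKeeper quorum (comma-separated host names)
#   $3 ZooKeeper client port
#   $4 parent znode
#   $5 default file system URI
#   $6 authentication mode: simple (default) or kerberos
write_hbase_xml() {
  local out_dir="$1" zk_quorum="$2" zk_port="$3" znode_parent="$4"
  local default_fs="$5" auth="${6:-simple}"
  mkdir -p "$out_dir"

  # HBase client settings: where ZooKeeper runs and which znode holds HBase data.
  cat > "$out_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>${zk_quorum}</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>${zk_port}</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>${znode_parent}</value>
  </property>
  <property>
    <name>hbase.security.authentication</name>
    <value>${auth}</value>
  </property>
</configuration>
EOF

  # Hadoop core settings: default file system and authentication mode.
  cat > "$out_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>${default_fs}</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>${auth}</value>
  </property>
</configuration>
EOF
}
```

For example, write_hbase_xml /tmp/hbase-conf zk1.example.com,zk2.example.com 2181 /hbase hdfs://namenode.example.com:8020 writes both files to /tmp/hbase-conf. You could paste their contents into the form (File content) or copy them to the cluster (File path).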

If you choose File path, sign in to Cloud Pak for Data as an administrator, and then use the oc cp command to copy the two files to a mounted volume where all the runtime pods can access them.

Syntax:

oc cp {SRC_XML_FILE_PATH} {POD_NAME}:{TGT_XML_FILE_PATH}

Example:

oc cp /tmp/core-site.xml my-pod-32:/tmp/core-site.xml
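Because both files are needed, the copy step is usually run once per file. The following sketch wraps the oc cp syntax above in a hypothetical copy_hbase_xml function that copies both files to the same target directory in a pod. The pod name and directories are placeholders. Setting the OC variable to echo prints the commands instead of running them, which gives a quick dry run.

```shell
# Sketch: copy hbase-site.xml and core-site.xml to a pod's mounted volume.
# copy_hbase_xml is a hypothetical helper; pod and paths are placeholders.
# Set OC=echo to print the oc cp commands instead of executing them.
copy_hbase_xml() {
  local pod="$1" src_dir="$2" tgt_dir="$3"
  local oc="${OC:-oc}"
  local f
  for f in hbase-site.xml core-site.xml; do
    # Same form as: oc cp {SRC_XML_FILE_PATH} {POD_NAME}:{TGT_XML_FILE_PATH}
    "$oc" cp "$src_dir/$f" "$pod:$tgt_dir/$f" || return 1
  done
}
```

For example, OC=echo copy_hbase_xml my-pod-32 /tmp /px-storage/hbase prints the two commands; without OC=echo they run. You can then list the target directory with oc exec my-pod-32 -- ls /px-storage/hbase.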

Record the target path of each XML file. You enter these paths in the hbase-site.xml path and core-site.xml path fields of the Create connection form.

Kerberos configuration

If you plan to use Kerberos authentication, you must copy the krb5.conf file from the Kerberos cluster to the Cloud Pak for Data cluster. On the Kerberos cluster, the krb5.conf file is usually located at /etc/krb5.conf. For credentials, you can use either a password or a keytab file. If you plan to use a keytab file, you must also copy the keytab file from the Kerberos server to the Cloud Pak for Data cluster.

Sign in to Cloud Pak for Data as an administrator, and then use the oc cp command to copy the files to a mounted volume where all the runtime pods can access them.

Record the target path of each file. You enter these paths in the krb5.conf location field and the Keytab field of the Create connection form.
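The same oc cp pattern applies to the Kerberos files. In this sketch, the hypothetical copy_kerberos_files function copies krb5.conf and, when a fourth argument is given, the keytab file into one target directory in a pod. The pod name and paths are placeholders, and OC=echo again turns the calls into a dry run.

```shell
# Sketch: copy krb5.conf and, optionally, a keytab to a pod's mounted volume.
# copy_kerberos_files is a hypothetical helper; pod and paths are placeholders.
# Set OC=echo to print the oc cp commands instead of executing them.
copy_kerberos_files() {
  local pod="$1" tgt_dir="$2" krb5_src="$3" keytab_src="${4:-}"
  local oc="${OC:-oc}"
  # krb5.conf is needed for any Kerberos authentication.
  "$oc" cp "$krb5_src" "$pod:$tgt_dir/krb5.conf" || return 1
  # The keytab is needed only when you authenticate with a keytab file.
  if [[ -n "$keytab_src" ]]; then
    "$oc" cp "$keytab_src" "$pod:$tgt_dir/$(basename "$keytab_src")" || return 1
  fi
}
```

For example, copy_kerberos_files my-pod-32 /px-storage/username/hbase /etc/krb5.conf /tmp/hbase.keytab places both files in /px-storage/username/hbase; those two paths then go in the krb5.conf location and Keytab fields.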

Create a connection to Apache HBase

To create the connection asset, you need these connection details:

  • Hadoop identity (optional): A unique identifier for the Hadoop cluster.

  • HBase identity (optional): A unique identifier for the HBase cluster.

  • XML file mode: Method to provide the core-site.xml file and the hbase-site.xml file. Select one of the following methods to provide the information:

    • File content: Enter the content of the core-site.xml file and the hbase-site.xml file.
    • File path: Specify the location of the core-site.xml file and the hbase-site.xml file on the nodes. For example, /px-storage/hbase/hbase-site.xml and /px-storage/hbase/core-site.xml.

For Credentials, you can use secrets if a vault is configured for the platform and the service supports vaults. For more information, see Using secrets from vaults in connections.

Authentication methods

You can authenticate with Simple authentication or with Kerberos.

  • Simple authentication user name: Simple authentication refers to Simple Authentication and Security Layer (SASL). Enter the user name that accesses Apache HBase on a cluster that is not secured.

Select Kerberos if the Apache HBase cluster is configured for Kerberos.

  • krb5.conf location: Specify the location of the krb5.conf file that is accessible on each node in the Cloud Pak for Data instance. For example, /px-storage/username/hbase/krb5.conf.

  • Principal: The user principal that is used to access the Kerberos-enabled Apache HBase server. The Kerberos administrator creates user principals on the Kerberos server. A user principal name has three components, primary, instance, and realm, in the form primary/instance@REALM. The instance component is optional. For example, user@EXAMPLE.COM is a valid user principal name.
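To double-check a principal before you enter it, the following sketch splits a name of the form primary/instance@REALM into its parts and reads default_realm from a krb5.conf file, so you can compare the two realms. Both function names are hypothetical, and the krb5_default_realm parser handles only simple "default_realm = VALUE" lines.

```shell
# Sketch: split a Kerberos principal into primary, instance, and realm.
# split_principal is a hypothetical helper. It prints three lines; the
# instance line is empty when the principal has no instance component.
split_principal() {
  local principal="$1"
  [[ "$principal" == *@* ]] || return 1   # a realm is required
  local name="${principal%@*}"            # text before the last '@'
  local realm="${principal##*@}"          # text after the last '@'
  local primary="${name%%/*}"             # text before the first '/'
  local instance=""
  if [[ "$name" == */* ]]; then
    instance="${name#*/}"                 # text after the first '/'
  fi
  [[ -n "$primary" && -n "$realm" ]] || return 1
  printf '%s\n%s\n%s\n' "$primary" "$instance" "$realm"
}

# Sketch: print the default_realm value from a krb5.conf file.
# krb5_default_realm is a hypothetical helper for simple
# "default_realm = VALUE" lines only.
krb5_default_realm() {
  awk -F'=' '/^[ \t]*default_realm[ \t]*=/ {
    gsub(/[ \t]/, "", $2)   # trim whitespace around the realm
    print $2
    exit
  }' "$1"
}
```

Kerberos realm names are case-sensitive, so a principal realm that differs from default_realm only in case still points to a different realm.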

Select whether to use a password or a keytab file for Kerberos authentication.

  • Kerberos with password

    • Enter a value for the Password.
    • Select Use ticket cache if you want to use an existing ticket that is stored in a credential cache. Provide a ticket cache location that is accessible on each node. You must run the kinit command on each node instead of copying the cache. If you do not specify a ticket cache location, the connection uses the default location that is specified in the krb5.conf file. If the login from the cache fails, the connection logs in with the password.
  • Kerberos with keytab

    • Keytab: Provide a keytab location that is accessible on each node. For example, /px-storage/username/hbase.keytab.
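On each node, the ticket cache for the password option can be created with kinit, and a keytab can be inspected with klist before you enter its path. The sketch below assumes the MIT Kerberos client tools. The function names, principal, and paths are hypothetical, and KINIT=echo or KLIST=echo turns each call into a dry run.

```shell
# Sketch: populate a ticket cache on one node (repeat on every node).
# init_ticket_cache is a hypothetical helper; paths are placeholders.
init_ticket_cache() {
  local principal="$1" ccache="$2" krb5_conf="$3"
  local kinit="${KINIT:-kinit}"
  # KRB5_CONFIG points the MIT Kerberos tools at a non-default krb5.conf.
  # kinit -c writes the ticket to the given cache and prompts for the password.
  KRB5_CONFIG="$krb5_conf" "$kinit" -c "FILE:$ccache" "$principal"
}

# Sketch: list the principals stored in a keytab file.
# check_keytab is a hypothetical helper; klist -k reads a keytab, -t adds timestamps.
check_keytab() {
  local keytab="$1"
  local klist="${KLIST:-klist}"
  "$klist" -k -t "$keytab"
}
```

For example, init_ticket_cache user@EXAMPLE.COM /px-storage/username/hbase/krb5cc /px-storage/username/hbase/krb5.conf prompts for the password and stores the ticket in that file, whose path you would then enter as the ticket cache location (assuming the form accepts a plain file path). check_keytab /px-storage/username/hbase.keytab should list the principal that you enter in the Principal field.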

Choose the method for creating a connection based on where you are in the platform

In a project
Click Assets > New asset > Data access tools > Connection. See Adding a connection to a project.

In a deployment space
Click Add to space > Connection. See Adding connections to a deployment space.

In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use the Apache HBase connection in the following workspaces and tools:

Projects

Catalogs

  • Platform assets catalog

Federal Information Processing Standards (FIPS) compliance

The Apache HBase connection cannot be created in a FIPS environment.

Apache HBase setup

Quick Start - Standalone HBase

Test connection

If you use the File path input method and you want to be able to test the Apache HBase connection, copy the hbase-site.xml and core-site.xml files to a directory under /ds-storage. For example:

/ds-storage/hbase/hbase-site.xml
/ds-storage/hbase/core-site.xml
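To check that the files are in place before you test the connection, you can look for them from inside a pod. The hypothetical verify_test_paths function below rejects any path that is not in a directory under /ds-storage, and then runs test -r through oc exec to confirm that each file is readable in the given pod. The pod name is a placeholder.

```shell
# Sketch: confirm that the XML files for Test connection are in place.
# verify_test_paths is a hypothetical helper; the pod name is a placeholder.
# Set OC=true to skip the in-pod check and validate only the path form.
verify_test_paths() {
  local pod="$1"; shift
  local oc="${OC:-oc}"
  local path
  for path in "$@"; do
    # Testing the connection expects the files in a directory under /ds-storage.
    case "$path" in
      /ds-storage/*/*) ;;
      *) echo "not in a directory under /ds-storage: $path" >&2; return 1 ;;
    esac
    # oc exec runs "test -r" inside the pod to check that the file is readable.
    if ! "$oc" exec "$pod" -- test -r "$path"; then
      echo "missing or unreadable in $pod: $path" >&2
      return 1
    fi
  done
}
```

For example, verify_test_paths my-pod-32 /ds-storage/hbase/hbase-site.xml /ds-storage/hbase/core-site.xml returns 0 only when both files are readable in that pod.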

Limitations

  • If you export a project or download a flow that includes the Apache HBase connection as a ZIP file, the hbase-site.xml and core-site.xml files are not included. You must enter the file contents or the file paths for the connection again. See Prerequisites.
  • Previewing data and using the Asset browser to browse metadata do not work for the Apache HBase connection.
