Enabling the automatic syncing of the Db2 Big SQL catalog and the Hive metastore

You can keep the Db2® Big SQL catalog up to date with the Hive metastore server of the cluster to which it is connected. In that way, tables that are created by Hive or other services within it are automatically imported into Db2 Big SQL and can be queried from there.

A subdirectory in HDFS must first be created. For more information, see Preparing to install Db2 Big SQL.

This task applies only when you are connecting to a remote Hadoop cluster. To enable the automatic import of Hive tables into Db2 Big SQL, complete the following steps:
  1. First, complete the steps that are described in Giving Db2 Big SQL access to the Hive warehouse directory, for the case in which Db2 Big SQL impersonation is not enabled.
  2. Log in to your OpenShift® cluster as a project administrator:
    oc login <OpenShift_URL>:<port>
  3. Change to the project where the Cloud Pak for Data control plane is installed:
    oc project ${PROJECT_CPD_INST_OPERANDS}
    Note: This command uses an environment variable so that you can run the command exactly as written. For information about sourcing environment variables, see Setting up installation environment variables.
  4. Identify the Db2 Big SQL instance ID:
    oc get cm -l component=db2bigsql -o custom-columns="Instance Id:{.data.instance_id},Instance Name:{.data.instance_name},Created:{.metadata.creationTimestamp}"
  5. Get the name of the Db2 Big SQL head pod:
    head_pod=$(oc get pod -l app=bigsql-<instance_id>,name=dashmpp-head-0 --no-headers=true -o=custom-columns=NAME:.metadata.name)
  6. Copy the bigsql-sync.jar file from a running Db2 Big SQL container to your remote cluster. In the following example, there is a single Hive metastore on a remote CDP cluster, on host <server>.com. If there is more than one Hive metastore, you must copy the bigsql-sync.jar file to each of the Hive metastore hosts.
    oc rsh $head_pod
    su - db2inst1
    cd /usr/ibmpacks/current/bigsql/bigsql/lib/java
    scp bigsql-sync.jar root@<server>.com:/opt/cloudera/parcels/CDH/lib/hive/lib/
    ssh root@<server>.com
    chmod 644 /opt/cloudera/parcels/CDH/lib/hive/lib/bigsql-sync.jar
  7. Configure the following properties in Cloudera Manager, under Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
    hive.metastore.event.listeners=com.ibm.biginsights.bigsql.sync.BIEventListener
    bigsql.catalog.sync.sleep=30
    bigsql.catalog.sync.events=/user/db2inst1/sync
  8. Restart the Hive service.
  9. Create a table in Hive, in a database other than the default (which is not synced), and confirm that it automatically synchronizes with the Db2 Big SQL service. Transactional tables are not synchronized to Db2 Big SQL.