Automatic Hive catalog syncing to the Db2 Big SQL catalog

Tables that are created, altered, or dropped by Hive clients can have the associated catalog changes automatically applied to Db2® Big SQL.

Enabling automatic syncing

Automatic syncing of the Db2 Big SQL catalog and the Hive metastore is enabled by default. If you have disabled automatic syncing (see Disabling automatic syncing), you can enable this feature again . You can use the bigsql-admin Db2 Big SQL cluster administration utility to enable automatic syncing of the Db2 Big SQL catalog and the Hive metastore.

When you enable this feature, you also create a scheduled event for the following synchronization call:
db2 "CALL SYSPROC.ADMIN_TASK_ADD ('Synchronise MetaData Changes from Hive',
  NULL, NULL, NULL, NULL, 'SYSHADOOP', 'HCAT_AUTOSYNC_OBJECTS', NULL, NULL, NULL)
You can invoke the HCAT_SYNC_OBJECTS stored procedure to add all current Hive tables from all schemas to the Db2 Big SQL catalog:
db2 "CALL SYSHADOOP.HCAT_SYNC_OBJECTS('.*', '.*', 'a', 'REPLACE', 'CONTINUE', 'TRANSFER OWNERSHIP TO HIVEOWNER')"
Important:
  • Non-transactional bucketed tables are not supported in Db2 Big SQL and therefore are not synchronized.
  • Tables that are created under the Hive default schema are not automatically synced; you must synchronize these tables manually if you want them in Db2 Big SQL.

Monitoring the synchronization

The bigsql.log file receives log information directly from the HCAT_AUTOSYNC_OBJECTS process.

Automatic syncing uses a scheduled DB2® administrative task scheduler (ATS) sync procedure. The description of the sync procedure is Synchronise MetaData Changes from Hive. You can monitor this procedure like any other ATS scheduled procedures. For information about ATS and how to monitor such procedures, see Administrative task scheduler.

By default, event files are placed in the HDFS under /user/bigsql/sync. The file pattern for those files is fixed, and only files that match this pattern are processed. The general file pattern is DDL-timestamp-UUID.json. For example: DDL-[0-9]{17}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.json. If files are left in this directory for some time, it is good practice to investigate. If you disable automatic syncing, files collect in this directory but are not processed. You must determine whether or not such files should be retained.

Disabling automatic syncing

Disabling automatic syncing removes the scheduled process. Events are still gathered and placed in the sync directory, but are never processed. The following example shows how to disable automatic syncing:
db2 "CALL SYSPROC.ADMIN_TASK_REMOVE ('Synchronise MetaData Changes from Hive',NULL)"

Synchronization errors

In the event of a synchronization failure, the associated event-file is moved from the events-directory (by default: /biginsights/bigsql/sync) to an "errors" sub-directory (by default: /biginsights/bigsql/sync/errors). It is good practice to monitor this directory for any synchronization errors and address any issues where necessary.

When any issues have been addressed, and where synchronization for the associated object(s) is still required, the relevant event-files can then be moved back into the events-directory (by default: /biginsights/bigsql/sync) where automatic synchronization will take place.