HCAT_SYNC_OBJECTS stored procedure

The HCAT_SYNC_OBJECTS stored procedure imports the current definition of Hive objects into the local database catalogs, and can also assign ownership to a user. This action makes the objects available for use within queries.

Subsequent changes to Hive tables or views are (by default) automatically synced by the HCAT_AUTOSYNC_OBJECTS procedure, and there is no need for ongoing syncing of database objects.

Note: If automatic syncing has been disabled for whatever reason, tables in the Db2® Big SQL catalog might no longer match the table definitions in the Hive metastore. This can result in I/O errors when you attempt to access the updated Hive data or the Hive catalog for table information. In particular, DROP table events that have not been synced will result in I/O errors when you attempt to access table data.

Automatic syncing of the Db2 Big SQL catalog and the Hive metastore is enabled by default. It can take a few seconds for Hive table updates to be seen in Db2 Big SQL. To disable this capability, use the bigsql-admin Db2 Big SQL cluster administration utility.Use the HCAT_SYNC_OBJECTS stored procedure to sync the Db2 Big SQL catalog and the Hive metastore manually.

Syntax

exists-action

error-action

options

Authorization

Only the bigsql user, or a user with Db2 Big SQL administrative privileges, can run this Hadoop procedure. However, the bigsql user can grant execute permission on the HCAT_SYNC_OBJECTS procedure to any user, group, or role. When using Impersonation in Db2 Big SQL it is strongly recommended to grant execute permission to the user connecting to Db2 Big SQL.

You can run the GRANT statement to grant execute permission to a specific user:

GRANT EXECUTE ON PROCEDURE SYSHADOOP.HCAT_SYNC_OBJECTS TO USER <user>

Important: If you are syncing tables that were created in Hive to Db2 Big SQL, and impersonation is not enabled, the bigsql service user requires write permission on the table directory in the HDFS, even if the intention is to use Db2 Big SQL only to perform read-only queries against the table.

Description

schema

The name of the schema that contains objects to be imported. You can use regular expressions to match multiple schemas. For schema that you previously defined in the Hive catalogs by using the CREATE TABLE (HADOOP) statement, the schema name is matched on the name as it was originally specified when the schema was created. For schema that you created outside of a CREATE TABLE (HADOOP) statement, the schema name is matched on the actual name as defined in the Hive metastore.

object-name

The name of object to be imported from within the schema. You can use regular expressions to match multiple objects. For objects that you previously defined in the Hive catalogs by using the CREATE TABLE (HADOOP) statement, the object name is matched on the name as it was originally specified when the object was created. For objects that you created outside of a CREATE TABLE (HADOOP) statement, the object name is matched on the actual name as defined in the Hive metastore.

object-types

A string of characters that indicates the types of objects to be imported. The following list contains the valid types:

t: Indicates that table objects are to be imported without associated constraints.
T: Indicates that table objects, and all the associated constraints are to be imported.
v: Indicates that view objects are to be imported.
a: Indicates that all supported objects are to be imported. This value is equivalent to specifying the string Tv. If the object-types argument is not specified, a is the default.

exists-action

Indicates the action that the process takes if an object that is being imported exists within the Db2 Big SQL catalogs. The following actions are valid:

SKIP

Indicates that objects that are already defined should be skipped. This value is the default action.

REPLACE

Indicates that the existing objects should be removed from the Db2 Big SQL catalogs and replaced with the definition that is stored in the Hive metastore.

When an object is replaced, only the metadata that resides in the Db2 Big SQL catalogs is lost. This metadata includes permissions on the object and referential constraints (if constraints are not being imported from the Hive catalogs).

The REPLACE process consists of dropping the original object and re-creating the object as defined by the Hive metastore.

Important: All objects that are created by this HCAT_SYNC_OBJECTS procedure are owned by the user that runs the procedure.

ERROR

Indicates that the presence of an existing object should be considered an error. If error-action is CONTINUE, this action is the equivalent of specifying SKIP.

MODIFY

Indicates that the procedure will attempt to modify the existing objects without removing them from the Db2 Big SQL catalog. When the HCAT_SYNC_OBJECTS procedure is invoked directly, this is the default.

There are two types of actions that can be performed: changing a column type and dropping or appending columns. The actions are implemented as ALTER HADOOP TABLE statements.

If an object cannot be modified in terms of ALTER TABLE, the procedure returns an ERROR, and the user is expected to update it in a different way, such as by using the REPLACE action.

When the MODIFY changes a column data type, there are some limitations that can result in an ERROR:

Db2 Big SQL does not support the change from original type to new type.
The type in Hive metastore is not a valid Db2 Big SQL type. For more details about valid Db2 Big SQL types, see ALTER HADOOP TABLE, Data types that are supported by Db2 Big SQL, Data types migrated from Hive applications, and Understanding data types.

Note: Several Hive types are mapped into the same Db2 Big SQL type, and in these cases no alteration of the object is needed. Therefore, the result is a SKIP.

Db2 Big SQL uses Hive comments to store information about the data type. As a consequence, modifying the Hive type of a column with comments is unsafe and leads to undefined behavior. As a general rule, if the type of a column is changed in Hive, the existing comment should be dropped.

For complex type columns it is not possible to change the column type, so an ERROR is returned and the user is expected to REPLACE the table.

There are limitations about the way columns are removed or inserted in the existing table. Changing the column name is not supported for safety reasons. A table can be safely modified if columns are either removed or appended at the end of the object definition.

The following table summarizes the most common cases:

Columns in Db2 Big SQL catalog	Columns in Hive metastore	Notes®
A B C D	A C D	Column B is removed by DROP COLUMN.
A B C D	A B C D E	Column E is appended by ADD COLUMN.
A B C D	A B C E	The alteration is performed by dropping column D and appending column E. This is not a renaming action, which is not supported. It is possible to drop the old column and add the new one because the column is at the end of the object definition, which makes it an append.
A B C D	A X B C D	Column X is inserted inside the object definition. As this operation is not supported by MODIFY, the procedure returns an ERROR. The solution is to use the REPLACE action.
A B C D	A X C D	Column B is apparently renamed to X. Since renaming is not supported, and columns cannot be added inside an object definition, the procedure returns an ERROR. The solution is to use the REPLACE action.
A B C D	B A C D	Columns A and B are swapped, so the procedure returns an ERROR. The solution is to use the REPLACE action.

error-action

Specifies the action that the process should take if there is an error during the import. The value is a string that contains one of the following actions:

STOP: Stops processing immediately and reports the nature of the error. All import activity is rolled back.
CONTINUE: Reports the error in the final results, but continues the import process.

'options'

A string that contains a comma delimited list of options:

IMPORT HDFS AUTHORIZATIONS: If specified, the HDFS authorizations on the tables are imported automatically.
See Importing HDFS authorization to Db2 Big SQL authorization for details and examples.
IMPORT HIVE AUTHORIZATIONS: If specified, any existing permissions that are set in Hive are imported to Db2 Big SQL.
See Importing Hive authorization to Db2 Big SQL authorization for details and examples.
TRANSFER OWNERSHIP TO username: Indicates that the ownership of the table is transferred to the user specified by username. That value can alternatively be HIVEOWNER, in which case, the ownership is transferred to the original HIVE owner. If you omit the TRANSFER OWNERSHIP TO clause, no transfer action is taken.

Usage notes

The procedure returns the following results:

Table 1. Results of HCAT_SYNC_OBJECTS
Column	Type	Description
OBJSCHEMA	VARCHAR(128)	The name of the schema from which to attempt to import objects.
OBJNAME	VARCHAR(128)	The name of the object that is potentially imported.
OBJATTRIB	VARCHAR(128)	For constraints, this column indicates the name of a constraint that was imported. It is NULL for all other object types.
TYPE	VARCHAR(1)	The type, which is designated by one of the following characters: T table V view C constraint
STATUS	VARCHAR(10)	The status can be one of the following values: OK The object was imported. REPLACE The object was imported as a replacement for an existing object. SKIP An object was not imported because an object with the same name already exists. SKIP_WARN An object was not imported because an object with the same name already exists. ERROR The object could not be imported. WARN_ANALYZE The table was imported into Db2 Big SQL successfully, but ANALYZE failed. WARN Warning to the user. See the DETAILS column, below.
DETAILS	VARCHAR(4000)	Contains more details about the status. If there are no more details, then the value is NULL.

After you import a table into Db2 Big SQL, you can control whether an ANALYZE command runs. To change the behavior of ANALYZE after a HCAT_SYNC_OBJECTS statement is run, modify the automatic analyze process following a HCAT_SYNC_OBJECTS statement by setting the biginsights.stats.auto.analyze.post.syncobj property to one of the following values:

DEFERRED: The default value. Hive statistics are copied after HCAT_SYNC_OBJECTS completes and an ANLYZE is queued to run later if an ANALYZE command has never been run for the specified table.
NEVER: ANALYZE is never run after the HCAT_SYNC_OBJECTS completes.
COPYHIVE: ANALYZE copies the statistics that are gathered from Hive after HCAT_SYNC_OBJECTS completes.

You can set the value as a session variable or as a system-wide property within the configurations.

Session variable

Run the following command within the Db2 Big SQL shell or interface:

SET HADOOP PROPERTY biginsights.stats.auto.analyze.post.syncobj=NEVER;

System-wide property

Update the configuration properties:

Open the bigsql-conf.xml configuration file at $BIGSQL_HOME/conf/bigsql-conf.xml on the head node only.

Add the following property:


<property>
 <name>biginsights.stats.auto.analyze.post.syncobj</name>
 <value>NEVER</value> 
</property>

Restart the Db2 Big SQL service.

The HCAT_SYNC_OBJECTS routine uses the maximum default STRING length that is defined in the bigsql.string.size property, when that property contains a value. To ensure that you do not exceed any row limits defined by your database manager, set the bigsql.string.size property to a value smaller than the current default of VARCHAR(32672) before you run the HCAT_SYNC_OBJECTS routine. The HCAT_SYNC_OBJECTS routine can estimate the string size to best fit all the columns within the row limit. The following statement is an example of setting the bigsql.string.size property before you run the routine:


SET HADOOP PROPERTY bigsql.string.size=4096;
CALL SYSHADOOP.HCAT_SYNC_OBJECTS ...

For more information about the STRING data type, see Data types that are supported by Db2 Big SQL

When you import tables with their constraints by using the T or a object-types, the matching tables are first imported, then a second pass attempts to import all constraints that are associated with those tables. A constraint that cannot be imported because of a missing reference produces a SKIP_WARN message to warn that it was not possible to import the constraint, but that the procedure was not interrupted. A constraint that cannot be imported for any other reason is considered an ERROR.

Views in the Hive catalog are generally defined in HiveQL. Views are imported only if they meet the following criteria:

The SQL in the view is fully supported by Db2 Big SQL.
All objects that the view references exist in Db2 Big SQL.
All objects that need to be delimited for use in Db2 Big SQL are properly delimited.

Any view that does not meet the criteria is an ERROR.

A successfully imported view might not have the same behavior as the original view. For example, if the view contains a Hive function that also exists in Db2 Big SQL with a different behavior, the view might not be usable.

If you are using Db2 Big SQL on IBM® Cloud Pak for Data 3.5 or later, and an HCAT_SYNC_OBJECTS procedure call results in a modification to or replacement of a table, object permissions are retained, subject to the following limitations:

A column whose name has changed will not have existing permissions reapplied.
If an option to import other authorizations is enabled, permissions that are defined in the Db2 Big SQL catalog will not be reapplied.

Examples

Import all objects within a schema. In this case, the schema name is EXAMPLES. If an object exists, replace the current object with the definition in the Hive metastore. If there is an error, report the error and continue the import operation.


CALL SYSHADOOP.HCAT_SYNC_OBJECTS('EXAMPLES', '.*', 'a', 'REPLACE', 'CONTINUE');

The following output shows that the two existing objects in the EXAMPLES schema were processed:


+-----------+------------+-----------+------+-----------+--------------------------+
|OBJSCHEMA  | OBJNAME    | OBJATTRIB | TYPE | STATUS    | DETAILS                  |
+-----------+------------+-----------+------+-----------+--------------------------+
| EXAMPLES  | HIVE_TABLE | [NULL]    | T    | SKIP_WARN | Column "C1", type        |
|           |            |           |      |           |"decfloat" is not         | 
|           |            |           |      |           | supported                |
+-----------+------------+-----------+------+-----------+--------------------------+
| EXAMPLES  | My Table   | [NULL]    | T    | OK        | [NULL]                   |
+-----------+------------+-----------+------+-----------+--------------------------+

Import objects and transfer ownership of the table to user1:


CALL SYSHADOOP.HCAT_SYNC_OBJECTS(
  'EXAMPLES', '.*', 'a', 'REPLACE', 'CONTINUE', 'TRANSFER OWNERSHIP TO user1');

Synchronize objects by attempting to modify the Big SQL table definition to match the Hive definition:
```
CALL SYSHADOOP.HCAT_SYNC_OBJECTS(
  'EXAMPLES', '.*', 'a', 'MODIFY', 'CONTINUE', 'TRANSFER OWNERSHIP TO user1');
```
It is not always possible to modify the table definition to match the Hive definition; in that case, the procedure automatically runs again, but with the REPLACE action instead. The MODIFY action is preferable to the REPLACE action because object statistics, the grants history, and so on, are retained.