HCAT_SYNC_OBJECTS stored procedure
The HCAT_SYNC_OBJECTS stored procedure imports the current definition of Hive objects into the local database catalogs, and can also assign ownership to a user. This action makes the objects available for use within queries.
Automatic syncing of the Db2 Big SQL catalog and the Hive metastore is enabled by default. It can take a few seconds for Hive table updates to be seen in Db2 Big SQL. To disable this capability, use the bigsql-admin Db2 Big SQL cluster administration utility.Use the HCAT_SYNC_OBJECTS stored procedure to sync the Db2 Big SQL catalog and the Hive metastore manually.
Only the bigsql user, or a user with Db2 Big SQL administrative privileges, can run this Hadoop procedure. However, the bigsql user can grant execute permission on the HCAT_SYNC_OBJECTS procedure to any user, group, or role. When using Impersonation in Db2 Big SQL it is strongly recommended to grant execute permission to the user connecting to Db2 Big SQL.
GRANT EXECUTE ON PROCEDURE SYSHADOOP.HCAT_SYNC_OBJECTS TO USER <user>
- The name of the schema that contains objects to be imported. You can use regular expressions to match multiple schemas. For schema that you previously defined in the Hive catalogs by using the CREATE TABLE (HADOOP) statement, the schema name is matched on the name as it was originally specified when the schema was created. For schema that you created outside of a CREATE TABLE (HADOOP) statement, the schema name is matched on the actual name as defined in the Hive metastore.
- The name of object to be imported from within the schema. You can use regular expressions to match multiple objects. For objects that you previously defined in the Hive catalogs by using the CREATE TABLE (HADOOP) statement, the object name is matched on the name as it was originally specified when the object was created. For objects that you created outside of a CREATE TABLE (HADOOP) statement, the object name is matched on the actual name as defined in the Hive metastore.
- A string of characters that indicates the types of objects to be imported.
The following list contains the valid types:
- Indicates that table objects are to be imported without associated constraints.
- Indicates that table objects, and all the associated constraints are to be imported.
- Indicates that view objects are to be imported.
- Indicates that all supported objects are to be imported. This value is equivalent to specifying the string Tv. If the object-types argument is not specified, a is the default.
- Indicates the action that the process takes if an object that is being imported exists within the Db2 Big SQL catalogs. The following actions are valid:
- Indicates that objects that are already defined should be skipped. This value is the default action.
- Indicates that the existing objects should be removed from the Db2 Big SQL catalogs and replaced with the definition that is stored in the Hive
When an object is replaced, only the metadata that resides in the Db2 Big SQL catalogs is lost. This metadata includes permissions on the object and referential constraints (if constraints are not being imported from the Hive catalogs).The REPLACE process consists of dropping the original object and re-creating the object as defined by the Hive metastore.Important: All objects that are created by this HCAT_SYNC_OBJECTS procedure are owned by the user that runs the procedure.
- Indicates that the presence of an existing object should be considered an error. If error-action is CONTINUE, this action is the equivalent of specifying SKIP.
Indicates that the procedure will attempt to modify the existing objects without removing them from the Db2 Big SQL catalog. When the HCAT_SYNC_OBJECTS procedure is invoked directly, this is the default.
There are two types of actions that can be performed: changing a column type and dropping or appending columns. The actions are implemented as ALTER HADOOP TABLE statements.
If an object cannot be modified in terms of ALTER TABLE, the procedure returns an ERROR, and the user is expected to update it in a different way, such as by using the REPLACE action.When the MODIFY changes a column data type, there are some limitations that can result in an ERROR:
Note: Several Hive types are mapped into the same Db2 Big SQL type, and in these cases no alteration of the object is needed. Therefore, the result is a SKIP.
- Db2 Big SQL does not support the change from original type to new type.
- The type in Hive metastore is not a valid Db2 Big SQL type. For more details about valid Db2 Big SQL types, see ALTER HADOOP TABLE, Data types that are supported by Db2 Big SQL, Data types migrated from Hive applications, and Understanding data types.
Db2 Big SQL uses Hive comments to store information about the data type. As a consequence, modifying the Hive type of a column with comments is unsafe and leads to undefined behavior. As a general rule, if the type of a column is changed in Hive, the existing comment should be dropped.
For complex type columns it is not possible to change the column type, so an ERROR is returned and the user is expected to REPLACE the table.
There are limitations about the way columns are removed or inserted in the existing table. Changing the column name is not supported for safety reasons. A table can be safely modified if columns are either removed or appended at the end of the object definition.The following table summarizes the most common cases:
Columns in Db2 Big SQL catalog Columns in Hive metastore Notes®A B C DA C D Column B is removed by DROP COLUMN.A B C DA B C D E Column E is appended by ADD COLUMN.A B C DA B C E The alteration is performed by dropping column D and appending column E. This is not a renaming action, which is not supported. It is possible to drop the old column and add the new one because the column is at the end of the object definition, which makes it an append.A B C DA X B C D Column X is inserted inside the object definition. As this operation is not supported by MODIFY, the procedure returns an ERROR. The solution is to use the REPLACE action.A B C DA X C D Column B is apparently renamed to X. Since renaming is not supported, and columns cannot be added inside an object definition, the procedure returns an ERROR. The solution is to use the REPLACE action.A B C DB A C D Columns A and B are swapped, so the procedure returns an ERROR. The solution is to use the REPLACE action.
- Specifies the action that the process should take if there is an error
during the import. The value is a string that contains one of the following actions:
- Stops processing immediately and reports the nature of the error. All import activity is rolled back.
- Reports the error in the final results, but continues the import process.
- A string that contains a comma delimited list of options:
- IMPORT HDFS AUTHORIZATIONS
- If specified, the HDFS authorizations on the tables are imported
See Importing HDFS authorization to Db2 Big SQL authorization for details and examples.
- IMPORT HIVE AUTHORIZATIONS
- If specified, any existing permissions that are set in Hive are imported to Db2 Big SQL.
See Importing Hive authorization to Db2 Big SQL authorization for details and examples.
- TRANSFER OWNERSHIP TO username
- Indicates that the ownership of the table is transferred to the user specified by username. That value can alternatively be HIVEOWNER, in which case, the ownership is transferred to the original HIVE owner. If you omit the TRANSFER OWNERSHIP TO clause, no transfer action is taken.
|OBJSCHEMA||VARCHAR(128)||The name of the schema from which to attempt to import objects.|
|OBJNAME||VARCHAR(128)||The name of the object that is potentially imported.|
|OBJATTRIB||VARCHAR(128)||For constraints, this column indicates the name of a constraint that was imported. It is NULL for all other object types.|
|TYPE||VARCHAR(1)||The type, which is designated by one of the following characters:
|STATUS||VARCHAR(10)||The status can be one of the following values:
|DETAILS||VARCHAR(4000)||Contains more details about the status. If there are no more details, then the value is NULL.|
- The default value. Hive statistics are copied after HCAT_SYNC_OBJECTS completes and an ANLYZE is queued to run later if an ANALYZE command has never been run for the specified table.
- ANALYZE is never run after the HCAT_SYNC_OBJECTS completes.
- ANALYZE copies the statistics that are gathered from Hive after HCAT_SYNC_OBJECTS completes.
- Session variable
- Run the following command within the Db2 Big SQL shell or interface:
SET HADOOP PROPERTY biginsights.stats.auto.analyze.post.syncobj=NEVER;
- System-wide property
- Update the configuration properties:
- Open the bigsql-conf.xml configuration file at $BIGSQL_HOME/conf/bigsql-conf.xml on the head node only.
- Add the following property:
<property> <name>biginsights.stats.auto.analyze.post.syncobj</name> <value>NEVER</value> </property>
- Restart the Db2 Big SQL service.
For more information about the STRING data type, see Data types that are supported by Db2 Big SQL
SET HADOOP PROPERTY bigsql.string.size=4096; CALL SYSHADOOP.HCAT_SYNC_OBJECTS ...
When you import tables with their constraints by using the T or a object-types, the matching tables are first imported, then a second pass attempts to import all constraints that are associated with those tables. A constraint that cannot be imported because of a missing reference produces a SKIP_WARN message to warn that it was not possible to import the constraint, but that the procedure was not interrupted. A constraint that cannot be imported for any other reason is considered an ERROR.
- The SQL in the view is fully supported by Db2 Big SQL.
- All objects that the view references exist in Db2 Big SQL.
- All objects that need to be delimited for use in Db2 Big SQL are properly delimited.
A successfully imported view might not have the same behavior as the original view. For example, if the view contains a Hive function that also exists in Db2 Big SQL with a different behavior, the view might not be usable.
- A column whose name has changed will not have existing permissions reapplied.
- If an option to import other authorizations is enabled, permissions that are defined in the Db2 Big SQL catalog will not be reapplied.
- Import all objects within a schema. In this case, the schema name is EXAMPLES. If an object
exists, replace the current object with the definition in the Hive metastore. If there is an error,
report the error and continue the import
The following output shows that the two existing objects in the EXAMPLES schema were processed:
CALL SYSHADOOP.HCAT_SYNC_OBJECTS('EXAMPLES', '.*', 'a', 'REPLACE', 'CONTINUE');
+-----------+------------+-----------+------+-----------+--------------------------+ |OBJSCHEMA | OBJNAME | OBJATTRIB | TYPE | STATUS | DETAILS | +-----------+------------+-----------+------+-----------+--------------------------+ | EXAMPLES | HIVE_TABLE | [NULL] | T | SKIP_WARN | Column "C1", type | | | | | | |"decfloat" is not | | | | | | | supported | +-----------+------------+-----------+------+-----------+--------------------------+ | EXAMPLES | My Table | [NULL] | T | OK | [NULL] | +-----------+------------+-----------+------+-----------+--------------------------+
- Import objects and transfer ownership of the table to
CALL SYSHADOOP.HCAT_SYNC_OBJECTS( 'EXAMPLES', '.*', 'a', 'REPLACE', 'CONTINUE', 'TRANSFER OWNERSHIP TO user1');
- Synchronize objects by attempting to modify the Big SQL table definition to match the Hive
It is not always possible to modify the table definition to match the Hive definition; in that case, the procedure automatically runs again, but with the REPLACE action instead. The MODIFY action is preferable to the REPLACE action because object statistics, the grants history, and so on, are retained.
CALL SYSHADOOP.HCAT_SYNC_OBJECTS( 'EXAMPLES', '.*', 'a', 'MODIFY', 'CONTINUE', 'TRANSFER OWNERSHIP TO user1');