EXT_METASTORE_SYNC stored procedure
The EXT_METASTORE_SYNC stored procedure imports Hive and Iceberg tables from watsonx.data into Db2. The table is then accessible from both systems.
The output of this procedure is two result sets. The first result set indicates if the invocation of the procedure was successful or not and returns a status code and a status message. If the invocation of the procedure is successful, a second result set is returned which has a row for each object imported from the external metastore and a status with details of the operation performed. The STATUS column indicates whether the table was synced or not and the DETAILS columns will provide additional information.
Authorization
EXECUTE is granted to the DASHDB_ENTERPRISE_ADMIN role.
Syntax
The schema is SYSHADOOP.
Description
-
metastore_name
- An input argument of type VARCHAR(256) containing the name under which the metastore was registered. This is a required parameter which references an external metastore which has been registered using the SYSHADOOP.REGISTER_EXT_METASTORE procedure. If the value does not match a registered external metastore, an error is returned. schema_name
- An input argument of type VARCHAR(128) containing the name of the schema as stored in the Db2 catalog that contains the objects to be imported. You can use regular expressions to match multiple schemas. To specify all schemas use the following pattern: '.*'. Passing in a NULL value or an empty string will return an error. object_name
- An input argument of type VARCHAR(128) containing the name of the object as stored in the Db2 catalog to import. It is possible to use a regular expression to import multiple tables in one invocation. To specify all tables use the pattern '.*'. Passing in a NULL value or an empty string will return an error. object_types
- A string of characters that indicates the types of objects to be imported. The following list
contains the valid types:
- a
Indicates that all supported objects are to be imported, which is currently limited to just tables, and is equivalent to t. If the object-types argument is not specified, a is the default.
- t
Indicates that table objects are to be imported without associated constraints.
exist_action
- a
- An input argument of type VARCHAR(20) containing the action to take if a table with the same
name already exists in the schema in the Db2 catalog. If the table does not yet exist in the Db2
catalog, this option is ignored and an attempt will be made to create the table is created in Db2.
If a NULL value is used, the default exist_action is used. Passing an empty string value will result
in an error. Supported values are:
- ERROR
- Indicates that the presence of an existing object should be considered an error. This is the default. If error-action is CONTINUE, this action is the equivalent of specifying SKIP.
- MODIFY
- Indicates that the procedure will attempt to modify the existing object without removing the object from the Db2 catalog. If an object cannot be modified using ALTER DATALAKE TABLE, the procedure returns an ERROR, and the user is expected to update it in a different way, such as by using the MODIFY_REPLACE or REPLACE action.
- MODIFY_REPLACE
- Indicates that the procedure will attempt to modify the existing object without removing the object from the Db2 catalog. If an object cannot be modified using ALTER DATALAKE TABLE, the import will be done using the REPLACE behaviour. Note: This is a very disruptive operation. See, Dropping Tables.
- REPLACE
- Indicates that the existing objects should be removed from the Db2 Big SQL catalogs and replaced. The existing table is DROPPED and replaced with the table being imported. NOTE: This is a very disruptive operation. This is a very disruptive operation. See, Dropping Tables.
- SKIP
- Indicates that objects that are already defined should be skipped.
error_action
- An input argument of type VARCHAR(20) containing the action to take if a table with the same
name already exists in the schema in the Db2 catalog. If the table does not yet exist in the Db2
catalog, this option is ignored and an attempt will be made to create the table is created in Db2.
If a NULL value is used, the default exist_action is used. Passing an empty string value will result
in an error. Supported values are:
- STOP
- The processing stops and no further tables are synchronised. This is the default value.
- CONTINUE
- The error is logged, but processing continues if multiple tables are to be imported.
options
- An input parameter of type VARCHAR(256) that is reserved for future use. The default value is NULL
Usage notes
- Regular expressions can be used to specify the schema_name and object_name parameters. Since the Hive metastore (HMS) and the metastore used in watsonx.data only currently support lowercase object names, the regular expressions are treated as case insensitive. For more information on Datalake table identifier handling see, Identifier Handling.
- Java style regular expressions are used for the schema_name and object_name parameters not SQL style LIKE expressions.
- When the exists_action is set to REPLACE, the table will be dropped and recreated in Db2. When a table is dropped, the row in the SYSCAT.TABLES system catalog view that contains information about that table is dropped, and any other objects that depend on the table are affected. See Dropping Tables, for more implications on dropping the table in Db2.
- It is recommended to run HCAT_CACHE_SYNC after running EXT_METASTORE_SYNC procedure to ensure that the Big SQL scheduler cache is refreshed when there are changes to the tables. For more information about the scheduler cache, see Db2 Big SQL Scheduler.
- It is recommended to run the MSCK REPAIR TABLE statement after importing a table from watsonx.data to ensure partitioning information is properly populated into the local HMS.
Examples
- The external metastores have been defined as lower case using the REGISTER_EXT_METASTORE stored procedure.
- The schema and table names used have been defined as upper case in Db2, so upper case will be used in the examples.
Example 1
CALL SYSHADOOP.EXT_METASTORE_SYNC('', null , null)Result set:
-------------- STATUS_CODE STATUS_MESSAGE ----------- ------------------------------------------------------------------------------ -1 metastore_name argument '' must specify a previously registered metastore name 1 record(s) selected. Return Status = 0This allows the user to check the result set and STATUS_CODE for a non-zero value indicating an error has occurred with the invocation of the procedure.
Example 2
Synchronize tables existing in a schema named ICEBERG_IMPORT_MODIFY in a watsonx.data metastore registered in Db2 as watsonxdev and which do not currently exist locally.
CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBERG_IMPORT_MODIFY', '.*')Result
set 1: STATUS_CODE STATUS_MESSAGE
----------- ---------------------------
0 Sync Procedure completed OK
1 record(s) selected.
Result set
2:OBJSCHEMA OBJNAME OBJATTRIB TYPE STATUS DETAILS --------------------------------------------------------------------------------------------------- ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_1 T OK Synced from external metastore, Local sync ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_2 T OK Synced from external metastore, Local sync 2 record(s) selected. Return Status = 0
Example 3
Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name is ICEBERG_IMPORT_MODIFY. If any table exists then stop at that point with an error.
CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBERG_IMPORT_MODIFY', '.*', exists_action => 'ERROR')
STATUS_CODE STATUS_MESSAGE
----------- ---------------------------
0 Sync Procedure completed OK
1 record(s) selected.
OBJSCHEMA OBJNAME OBJATTRIB TYPE STATUS DETAILS --------------------- ---------------------- --------- ---- ------ ------------------------------ ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_1 T ERROR External Object already Exists 1 record(s) selected. Return Status = 0
Example 4
Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name name is ICEBERG_IMPORT_MODIFY. In this example, the optional object_types parameter is omitted, and named parameters are used to specify exists_action and error_action parameters. Since the exists_action of ERROR is used, if any table already exists, then an error will be returned. Since the error_action of CONTINUE is used, even if an error is returned for some tables, the processing will continue for all the tables.
CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBERG_IMPORT_MODIFY', '.*', exists_action => 'ERROR', error_action => 'CONTINUE')
STATUS_CODE STATUS_MESSAGE
----------- ---------------------------
0 Sync Procedure completed OK
1 record(s) selected.
OBJSCHEMA OBJNAME OBJATTRIB TYPE STATUS DETAILS --------------------------------------------------------------------------------------------------- ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_1 T OK Synced from external metastore, Object exists ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_2 T OK Synced from external metastore, Object exists 2 record(s) selected. Return Status = 0
Example 5
Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name would match the regular expression ‘ICEBER[G123]+_IMPORT_MODIFY’. If any table already exists then it is skipped since the exists_action is set to SKIP, but the processing will continue with the rest of the tables.
CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBER[G123]+_IMPORT_MODIFY', '.*', exists_action => 'SKIP')
STATUS_CODE STATUS_MESSAGE
----------- ---------------------------
0 Sync Procedure completed OK
1 record(s) selected.
OBJSCHEMA OBJNAME OBJATTRIB TYPE STATUS DETAILS --------------------------------------------------------------------------------------------------- ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_1 T OK Synced from external metastore, Object exists ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_2 T OK Synced from external metastore, Object exists 2 record(s) selected. Return Status = 0
Example 6
Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name would match the regular expression 'I.*_MODIFY.*' and whose table names match the expression ‘.*MODIFY.*’. In this example, if any table already exists, it will be replaced since the exists_action is set to REPLACE.
CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'I.*_MODIFY.*', '.*MODIFY.*', exists_action => 'REPLACE')
STATUS_CODE STATUS_MESSAGE
----------- ---------------------------
0 Sync Procedure completed OK
1 record(s) selected.
OBJSCHEMA OBJNAME OBJATTRIB TYPE STATUS DETAILS --------------------------------------------------------------------------------------------------- ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_1 T OK Synced from external metastore, Local sync ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_2 T OK Synced from external metastore, Local sync 2 record(s) selected. Return Status = 0
Example 7
Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name would match the regular expression ‘I.*_MODIFY.*' and whose table names match the expression ‘.*[12]’. In this example, if any table already exists, an attempt will be made to modify the table in the Db2 catalog since the exists_action is set to MODIFY. If the table cannot be successfully modified, an error will be returned for this table.
CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'I.*_MODIFY.*', '.*[12]', object_types => 't', exists_action => 'MODIFY')
STATUS_CODE STATUS_MESSAGE
----------- ---------------------------
0 Sync Procedure completed OK
1 record(s) selected.
OBJSCHEMA OBJNAME OBJATTRIB TYPE STATUS DETAILS --------------------------------------------------------------------------------------------------- ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_1 T OK Synced from external metastore, No action needed ICEBERG_IMPORT_MODIFY ICEBERG_MODIFY_TABLE_2 T OK Synced from external metastore, No action needed 2 record(s) selected. Return Status = 0
