EXT_METASTORE_SYNC stored procedure

The EXT_METASTORE_SYNC stored procedure imports Hive and Iceberg tables from watsonx.data into Db2. The table is then accessible from both systems.

The output of this procedure is two result sets. The first result set indicates if the invocation of the procedure was successful or not and returns a status code and a status message. If the invocation of the procedure is successful, a second result set is returned which has a row for each object imported from the external metastore and a status with details of the operation performed. The STATUS column indicates whether the table was synced or not and the DETAILS columns will provide additional information.

Authorization

EXECUTE is granted to the DASHDB_ENTERPRISE_ADMIN role.

Syntax

Read syntax diagramSkip visual syntax diagramEXT_METASTORE_SYNC(metastore_name ,schema_name,object_name,object_types ,exists_action,error_action,options)
object_types
Read syntax diagramSkip visual syntax diagram'a''t'
exists_action
Read syntax diagramSkip visual syntax diagram'ERROR''MODIFY''MODIFY_REPLACE''REPLACE''SKIP'
error_action
Read syntax diagramSkip visual syntax diagram'CONTINUE''STOP'

The schema is SYSHADOOP.

Description

metastore_name
An input argument of type VARCHAR(256) containing the name under which the metastore was registered. This is a required parameter which references an external metastore which has been registered using the SYSHADOOP.REGISTER_EXT_METASTORE procedure. If the value does not match a registered external metastore, an error is returned.
schema_name
An input argument of type VARCHAR(128) containing the name of the schema as stored in the Db2 catalog that contains the objects to be imported. You can use regular expressions to match multiple schemas. To specify all schemas use the following pattern: '.*'. Passing in a NULL value or an empty string will return an error.
object_name
An input argument of type VARCHAR(128) containing the name of the object as stored in the Db2 catalog to import. It is possible to use a regular expression to import multiple tables in one invocation. To specify all tables use the pattern '.*'. Passing in a NULL value or an empty string will return an error.
object_types
A string of characters that indicates the types of objects to be imported. The following list contains the valid types:
  • a

    Indicates that all supported objects are to be imported, which is currently limited to just tables, and is equivalent to t. If the object-types argument is not specified, a is the default.

  • t

    Indicates that table objects are to be imported without associated constraints.

exist_action
An input argument of type VARCHAR(20) containing the action to take if a table with the same name already exists in the schema in the Db2 catalog. If the table does not yet exist in the Db2 catalog, this option is ignored and an attempt will be made to create the table is created in Db2. If a NULL value is used, the default exist_action is used. Passing an empty string value will result in an error. Supported values are:
ERROR
Indicates that the presence of an existing object should be considered an error. This is the default. If error-action is CONTINUE, this action is the equivalent of specifying SKIP.
MODIFY
Indicates that the procedure will attempt to modify the existing object without removing the object from the Db2 catalog. If an object cannot be modified using ALTER DATALAKE TABLE, the procedure returns an ERROR, and the user is expected to update it in a different way, such as by using the MODIFY_REPLACE or REPLACE action.
MODIFY_REPLACE
Indicates that the procedure will attempt to modify the existing object without removing the object from the Db2 catalog. If an object cannot be modified using ALTER DATALAKE TABLE, the import will be done using the REPLACE behaviour. Note: This is a very disruptive operation. See, Dropping Tables.
REPLACE
Indicates that the existing objects should be removed from the Db2 Big SQL catalogs and replaced. The existing table is DROPPED and replaced with the table being imported. NOTE: This is a very disruptive operation. This is a very disruptive operation. See, Dropping Tables.
SKIP
Indicates that objects that are already defined should be skipped.
error_action
An input argument of type VARCHAR(20) containing the action to take if a table with the same name already exists in the schema in the Db2 catalog. If the table does not yet exist in the Db2 catalog, this option is ignored and an attempt will be made to create the table is created in Db2. If a NULL value is used, the default exist_action is used. Passing an empty string value will result in an error. Supported values are:
STOP
The processing stops and no further tables are synchronised. This is the default value.
CONTINUE
The error is logged, but processing continues if multiple tables are to be imported.
options
An input parameter of type VARCHAR(256) that is reserved for future use. The default value is NULL

Usage notes

  • Regular expressions can be used to specify the schema_name and object_name parameters. Since the Hive metastore (HMS) and the metastore used in watsonx.data only currently support lowercase object names, the regular expressions are treated as case insensitive. For more information on Datalake table identifier handling see, Identifier Handling.
  • Java style regular expressions are used for the schema_name and object_name parameters not SQL style LIKE expressions.
  • When the exists_action is set to REPLACE, the table will be dropped and recreated in Db2. When a table is dropped, the row in the SYSCAT.TABLES system catalog view that contains information about that table is dropped, and any other objects that depend on the table are affected. See Dropping Tables, for more implications on dropping the table in Db2.
  • It is recommended to run HCAT_CACHE_SYNC after running EXT_METASTORE_SYNC procedure to ensure that the Big SQL scheduler cache is refreshed when there are changes to the tables. For more information about the scheduler cache, see Db2 Big SQL Scheduler.
  • It is recommended to run the MSCK REPAIR TABLE statement after importing a table from watsonx.data to ensure partitioning information is properly populated into the local HMS.

Examples

Note:
The following assumptions apply to these examples:
  • The external metastores have been defined as lower case using the REGISTER_EXT_METASTORE stored procedure.
  • The schema and table names used have been defined as upper case in Db2, so upper case will be used in the examples.

Example 1

Attempt to synchronize a table, but with invalid parameters.
CALL SYSHADOOP.EXT_METASTORE_SYNC('', null , null)
Result set:
--------------

  STATUS_CODE STATUS_MESSAGE

  ----------- ------------------------------------------------------------------------------

  -1          metastore_name argument '' must specify a previously registered metastore name

  1 record(s) selected.

  Return Status = 0
This allows the user to check the result set and STATUS_CODE for a non-zero value indicating an error has occurred with the invocation of the procedure.

Example 2

Synchronize tables existing in a schema named ICEBERG_IMPORT_MODIFY in a watsonx.data metastore registered in Db2 as watsonxdev and which do not currently exist locally.

CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBERG_IMPORT_MODIFY', '.*')
Result set 1:
STATUS_CODE    STATUS_MESSAGE

  ----------- ---------------------------

         0              Sync Procedure completed OK

  1 record(s) selected.

Result set 2:
  OBJSCHEMA                     OBJNAME              OBJATTRIB TYPE    STATUS   DETAILS

  ---------------------------------------------------------------------------------------------------

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_1     T                 OK      Synced from external metastore, Local sync

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_2     T                 OK      Synced from external metastore, Local sync

  2 record(s) selected.

  Return Status = 0

Example 3

Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name is ICEBERG_IMPORT_MODIFY. If any table exists then stop at that point with an error.

CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBERG_IMPORT_MODIFY', '.*', exists_action => 'ERROR')
Result set 1:
STATUS_CODE    STATUS_MESSAGE

  ----------- ---------------------------

         0              Sync Procedure completed OK

  1 record(s) selected.
Result set 2:
   OBJSCHEMA                 OBJNAME                            OBJATTRIB   TYPE      STATUS DETAILS

  --------------------- ---------------------- --------- ---- ------ ------------------------------

  ICEBERG_IMPORT_MODIFY        ICEBERG_MODIFY_TABLE_1          T          ERROR     External Object already Exists

  1 record(s) selected.

  Return Status = 0

Example 4

Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name name is ICEBERG_IMPORT_MODIFY. In this example, the optional object_types parameter is omitted, and named parameters are used to specify exists_action and error_action parameters. Since the exists_action of ERROR is used, if any table already exists, then an error will be returned. Since the error_action of CONTINUE is used, even if an error is returned for some tables, the processing will continue for all the tables.

CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBERG_IMPORT_MODIFY', '.*', exists_action => 'ERROR', error_action => 'CONTINUE')
Result set 1:
STATUS_CODE   STATUS_MESSAGE

  ----------- ---------------------------

         0          Sync Procedure completed OK

  1 record(s) selected.
Result set 2:
  OBJSCHEMA                     OBJNAME              OBJATTRIB TYPE    STATUS   DETAILS

  ---------------------------------------------------------------------------------------------------

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_1     T                 OK      Synced from external metastore, Object exists

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_2     T                 OK      Synced from external metastore, Object exists

  2 record(s) selected.

  Return Status = 0

Example 5

Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name would match the regular expression ‘ICEBER[G123]+_IMPORT_MODIFY’. If any table already exists then it is skipped since the exists_action is set to SKIP, but the processing will continue with the rest of the tables.

CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'ICEBER[G123]+_IMPORT_MODIFY', '.*', exists_action => 'SKIP')
Result set 1:
STATUS_CODE STATUS_MESSAGE

  ----------- ---------------------------

            0 Sync Procedure completed OK

  1 record(s) selected.

Result set 2:
  OBJSCHEMA                     OBJNAME              OBJATTRIB TYPE    STATUS   DETAILS

  ---------------------------------------------------------------------------------------------------

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_1     T                 OK      Synced from external metastore, Object exists

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_2     T                 OK      Synced from external metastore, Object exists

  2 record(s) selected.

  Return Status = 0

Example 6

Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name would match the regular expression 'I.*_MODIFY.*' and whose table names match the expression ‘.*MODIFY.*’. In this example, if any table already exists, it will be replaced since the exists_action is set to REPLACE.

CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'I.*_MODIFY.*', '.*MODIFY.*', exists_action => 'REPLACE')

Result set 1:
 STATUS_CODE         STATUS_MESSAGE

  ----------- ---------------------------

            0                   Sync Procedure completed OK

  1 record(s) selected.
Result set 2:
  OBJSCHEMA                     OBJNAME              OBJATTRIB TYPE    STATUS   DETAILS

  ---------------------------------------------------------------------------------------------------

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_1     T                 OK      Synced from external metastore, Local sync

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_2     T                 OK      Synced from external metastore, Local sync

  2 record(s) selected.

  Return Status = 0

Example 7

Synchronize all tables in the external metastore registered in Db2 as watsonxdev which exist in a schema whose name would match the regular expression ‘I.*_MODIFY.*' and whose table names match the expression ‘.*[12]’. In this example, if any table already exists, an attempt will be made to modify the table in the Db2 catalog since the exists_action is set to MODIFY. If the table cannot be successfully modified, an error will be returned for this table.

CALL SYSHADOOP.EXT_METASTORE_SYNC('watsonxdev', 'I.*_MODIFY.*', '.*[12]', object_types => 't', exists_action => 'MODIFY')
Result set 1:
 STATUS_CODE         STATUS_MESSAGE

  ----------- ---------------------------

            0                   Sync Procedure completed OK

  1 record(s) selected.
Result set 2:
  OBJSCHEMA                     OBJNAME              OBJATTRIB TYPE    STATUS   DETAILS

  ---------------------------------------------------------------------------------------------------

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_1     T                 OK      Synced from external metastore, No action needed

  ICEBERG_IMPORT_MODIFY    ICEBERG_MODIFY_TABLE_2     T                 OK      Synced from external metastore, No action needed

  2 record(s) selected.

  Return Status = 0