SQL0601N error when creating a Datalake table after a drop schema from a non-catalog node

A CREATE DATALAKE TABLE statement may fail with an SQL0601N error if a DROP SCHEMA statement is run from a non-catalog node and that schema has been used as part of a previous CREATE DATALAKE TABLE statement.

Symptoms

Executing a DROP SCHEMA statement from a non-catalog node prior to creating a Datalake table may result in an SQL0601N error. This occurs because dropping a schema from a non-catalog node does not properly clean up all schema-related information in the local Hive metastore (HMS) used for Datalake table management.

Causes

There are two scenarios where an SQL0601N error may be returned when a CREATE DATALAKE TABLE statement is run:

  1. When multiple tables are created using identifiers that differ only in case. Hive identifiers are case-insensitive, even when they are delimited identifiers; in Hive, identifiers are always treated as if written in lower case. See Identifier handling for Datalake tables for more information on the handling of Datalake table identifiers.
  2. When a DROP SCHEMA statement is run on a non-catalog node and the associated schema information in the local Hive metastore (HMS) is not cleaned up. Subsequent CREATE DATALAKE TABLE statements can then fail, depending on the identifiers used for the schemas.
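For the first scenario, a minimal sketch of the collision (the schema name, table names, and location below are illustrative, not from the original scenario): because Hive folds identifiers to lower case, both delimited identifiers map to the same Hive table name, so the second statement can fail with SQL0601N.

```sql
-- Both "MyTable" and "mytable" become mytable in the local HMS,
-- so the second CREATE DATALAKE TABLE statement may fail with SQL0601N.
CREATE DATALAKE TABLE myschema."MyTable"(I INT)
  LOCATION 'DB2REMOTE://mybucket//myschema/MyTable/'
CREATE DATALAKE TABLE myschema."mytable"(I INT)
  LOCATION 'DB2REMOTE://mybucket//myschema/mytable/'
```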

Consider the following scenario:

When the following statement is run, both the table and the schema are created in the Db2 catalog and in the local HMS.
CREATE DATALAKE TABLE eschemafoo.schemaTestTable(I INT) PARTITIONED BY (SS VARCHAR(10)) LOCATION 'DB2REMOTE://mybucket//eschemafooschemaTestTable/' TBLPROPERTIES('external.table.purge'='true')
The following query shows schema and table names as stored in Db2 catalog and the local HMS.
select varchar(tabschema,25) as db2_schema, varchar(hiveschema,25) as hive_schema, varchar(tabname,25) as db2_tabname, varchar(hivetab,25) as hive_tabname from syshadoop.hcat_tables  where tabschema='ESCHEMAFOO' or tabschema='eschemafoo'

As shown below, the identifiers in Hive are lower case and the identifiers in Db2 are upper case.

DB2_SCHEMA                HIVE_SCHEMA               DB2_TABNAME               HIVE_TABNAME
------------------------- ------------------------- ------------------------- -------------------------
ESCHEMAFOO                eschemafoo                SCHEMATESTTABLE           schematesttable

Continue and drop the table and the schema.
DROP datalake TABLE eschemafoo.schemaTestTable DELETE DATA PURGE
DROP SCHEMA eschemafoo RESTRICT

Although the schema information has been removed from the Db2 catalog, the schema information in the local HMS is not cleaned up.
db2 "select DB_ID, varchar(NAME,32) as NAME,  varchar(db_location_uri,100)  as db_loc_uri from SYSHIVE.DBS"
DB_ID                NAME                             DB_LOC_URI
-------------------- -------------------------------- ----------------------------------------------------------------------------------------------------
                   1 default                          file:/mnt/blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive
                   2 db2inst1                         file:/mnt/blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive/db2inst1.db
                   3 eschemafoo                       file:/mnt/blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive/eschemafoo.db


ls -l ./blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive
drwxr-sr-x. 2 db2inst1 db2iadm1 6 Feb 28 18:12 eschemafoo.db   <- leftover schema

If a subsequent CREATE DATALAKE TABLE statement is run using a delimited identifier, an error will occur.
CREATE DATALAKE TABLE "eschemafoo".schemaTestTable(I INT) PARTITIONED BY (SS VARCHAR(10)) LOCATION 'DB2REMOTE://mybucket//eschemafooschemaTestTable/' TBLPROPERTIES('external.table.purge'='true')
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0601N  The name of the object to be created is identical to the existing
name "eschemafoo" of type "SCHEMA".  SQLSTATE=42710

Resolving the problem

To mitigate the potential mishandling of Hive schema information, it is highly recommended to execute Datalake table DDL statements, specifically DROP SCHEMA, on the catalog node.

If you encounter this issue after dropping a schema from a non-catalog node, first re-create the schema on the catalog node and then execute the DROP SCHEMA statement from there, which ensures that the HMS schema data is removed.
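Using the schema from the scenario above, the recovery steps can be sketched as follows (run both statements from the catalog node):

```sql
-- Run from the catalog node.
-- Re-create the schema so the Db2 catalog and the local HMS are back in sync.
CREATE SCHEMA eschemafoo

-- Dropping the schema from the catalog node removes it from both the
-- Db2 catalog and the local HMS, clearing the leftover eschemafoo.db entry.
DROP SCHEMA eschemafoo RESTRICT
```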