SQL0601N error when creating a Datalake table after a drop schema from a non-catalog node
A CREATE DATALAKE TABLE statement may fail with an SQL0601N error if a DROP SCHEMA statement is run from a non-catalog node and that schema has been used as part of a previous CREATE DATALAKE TABLE statement.
Symptoms
Executing a DROP SCHEMA statement from a non-catalog node prior to creating a Datalake table may result in an SQL0601N error. This occurs because dropping a schema from a non-catalog node does not properly clean up all schema-related information in the local Hive metastore (HMS) used for Datalake table management.
Causes
There are two scenarios where an SQL0601N error may be returned by a CREATE DATALAKE TABLE statement:
- When multiple tables are created using identifiers that differ only in case. This is because Hive identifiers are case-insensitive, even when they are delimited identifiers. In Hive, identifiers are always treated as if written in lowercase. See Identifier handling for Datalake tables for more information on the handling of Datalake table identifiers.
- When a DROP SCHEMA statement runs on a non-catalog node and the associated schema information in the local Hive metastore (HMS) is not cleaned up. This can cause issues when subsequent CREATE DATALAKE TABLE statements are issued, depending on the identifiers used for the schemas.
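The first scenario can be illustrated with a minimal sketch. The schema name, table names, and location below are hypothetical; because Hive folds both delimited identifiers to the same lowercase name, the second statement collides with the first:

```sql
-- Hypothetical example: in Hive, "MYTAB" and "mytab" both map to mytab,
-- so the second CREATE DATALAKE TABLE statement fails with SQL0601N.
CREATE DATALAKE TABLE myschema."MYTAB"(c1 INT)
  LOCATION 'DB2REMOTE://mybucket//myschemamytab/';
CREATE DATALAKE TABLE myschema."mytab"(c1 INT)
  LOCATION 'DB2REMOTE://mybucket//myschemamytab2/';
```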
Consider the following scenario:
CREATE DATALAKE TABLE eschemafoo.schemaTestTable(I INT) PARTITIONED BY (SS VARCHAR(10)) LOCATION 'DB2REMOTE://mybucket//eschemafooschemaTestTable/' TBLPROPERTIES('external.table.purge'='true')

Query the catalog to compare the Db2 and Hive identifiers:

select varchar(tabschema,25) as db2_schema, varchar(hiveschema,25) as hive_schema, varchar(tabname,25) as db2_tabname, varchar(hivetab,25) as hive_tabname from syshadoop.hcat_tables where tabschema='ESCHEMAFOO' or tabschema='eschemafoo'

As shown below, the identifiers in Hive are lowercase and the identifiers in Db2 are uppercase.
DB2_SCHEMA                HIVE_SCHEMA               DB2_TABNAME               HIVE_TABNAME
------------------------- ------------------------- ------------------------- -------------------------
ESCHEMAFOO                eschemafoo                SCHEMATESTTABLE           schematesttable
DROP DATALAKE TABLE eschemafoo.schemaTestTable DELETE DATA PURGE
DROP SCHEMA eschemafoo RESTRICT

db2 "select DB_ID, varchar(NAME,32) as NAME, varchar(db_location_uri,100) as db_loc_uri from SYSHIVE.DBS"
DB_ID NAME DB_LOC_URI
-------------------- -------------------------------- ----------------------------------------------------------------------------------------------------
1 default file:/mnt/blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive
2 db2inst1 file:/mnt/blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive/db2inst1.db
3 eschemafoo file:/mnt/blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive/eschemafoo.db
ls -l ./blumeta0/home/db2inst1/sqllib/bigsql/warehouse/tablespace/external/hive
drwxr-sr-x. 2 db2inst1 db2iadm1 6 Feb 28 18:12 eschemafoo.db   <- leftover schema
CREATE DATALAKE TABLE "eschemafoo".schemaTestTable(I INT) PARTITIONED BY (SS VARCHAR(10)) LOCATION 'DB2REMOTE://mybucket//eschemafooschemaTestTable/' TBLPROPERTIES('external.table.purge'='true')

DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0601N The name of the object to be created is identical to the existing
name "eschemafoo" of type "SCHEMA". SQLSTATE=42710
Resolving the problem
To avoid the mishandling of Hive schema information, it is highly recommended to execute Datalake table DDL statements, in particular DROP SCHEMA, on the catalog node.
If a schema has already been dropped from a non-catalog node, re-create the schema on the catalog node and then execute the DROP SCHEMA statement from there. This ensures that the schema data is successfully removed from the HMS.
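A minimal sketch of that recovery, assuming the schema name from the example above and a session connected to the catalog node:

```sql
-- Run from a session connected to the catalog node.
-- Re-create the schema so that Db2 and the local HMS agree it exists...
CREATE SCHEMA eschemafoo;
-- ...then drop it from the catalog node, which also removes the
-- leftover schema entry from the local HMS.
DROP SCHEMA eschemafoo RESTRICT;
```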