Incorrect results when querying recently refreshed or loaded partitioned tables

SELECT might return incorrect results when run concurrently with ADD or DROP PARTITION.

Symptoms

The correct or expected results are seen if the SELECT is run again, generally within a half hour.

Causes

When multiple operations that evict or load table metadata into the scheduler cache are run concurrently against the same table, it is possible to hit a race condition. This race condition results in the refreshed cache not reflecting the latest changes to the table. As a result, a SELECT from that table may not return the expected data.

ADD and DROP PARTITION statements are passed through to Hive, which uses its own internal locking, these statements do not block ongoing DML, like SELECT, which might depend on the affected data.

By default, ALTER Datalake TABLE ADD (or DROP) PARTITION commands will not evict the entire table from the cache, but will load (or evict) the specified partition metadata only.

The entire table is evicted when:

  • Dropping a partial partition.
  • Altering a table when the parameter scheduler.cache.delta.updates.for.partitions is disabled (set to FALSE in bigsql_conf.xml).

Resolving the problem

There are a few ways to address this rare condition:

  • Do not run concurrent DDL and DML on the same table.
  • Enable scheduler.cache.delta.updates.for.partitions (if it has been disabled).
  • Stop the concurrent queries (and any other connections concurrently querying the table), and rerun the DROP PARTITION and ADD PARTITION statements. This will forcibly evict the stale cache reference from the scheduler.
  • To force a refresh on the table metadata, run the following command:.
    db2 MSCK REPAIR TABLE <TABLENAME>
    For more information, see MSCK REPAIR TABLE (Datalake) statement.
  • Run the HCAT_CACHE_SYNC stored procedure on the affected table. For more information, see HCAT_CACHE_SYNC stored procedure.