APAR status
Closed as fixed if next.
Error description
When an ALTER TABLE ADD IF NOT EXISTS PARTITION statement is executed, specifying a partition that does exist, the scheduler cache is incorrectly incremented as if the partition were added, even though no additional rows are added to the table. This results in SELECT statements against the table returning incorrect row counts until the cache is refreshed (by default, after 20 minutes). Running SYSHADOOP.HCAT_CACHE_SYNC will clear the cache. Similarly, ALTER TABLE DROP IF EXISTS PARTITION can result in the cached row count being incorrectly decremented. For example: 1. Create directories on hdfs and copy data files. 2. Run CREATE TABLE, then ALTER ADD to add 5 partitions (each partition has 10 rows): CREATE EXTERNAL HADOOP TABLE cachetest(id int) PARTITIONED BY (snapshot_dt DATE, partcol VARCHAR (10)) LOCATION '/tmp/data/cachetest'; ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION (snapshot_dt='2016-07-18',partcol='AAA'); 0 rows affected (total: 0.351s) select count(*) from cachetest; +----+ | 1 | +----+ | 10 | +----+ 1 row in results(first row: 0.307s; total: 0.309s) ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION (snapshot_dt='2016-09-30',partcol_code='BBB'); 0 rows affected (total: 0.586s) ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION (snapshot_dt='2016-11-30',partcol_code='CCC'); 0 rows affected (total: 0.491s) ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION (snapshot_dt='2016-11-30',partcol_code='DDD'); 0 rows affected (total: 0.326s) ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION (snapshot_dt='2016-12-31',partcol_code='EEE'); 0 rows affected (total: 0.413s) select count(*) from cachetest; +----+ | 1 | +----+ | 50 | +----+ 1 row in results(first row: 0.427s; total: 0.433s) 3. Alter the table to add an existing partition conditionally. ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION (snapshot_dt='2016-07-18',partcol_code='AAA'); 0 rows affected (total: 0.761s) select count(*) from cachetest; +----+ | 1 | +----+ | 60 | +----+ 1 row in results(first row: 0.464s; total: 0.468s) The cached row count should have stayed at 50, not increased to 60. There are now 2 duplicate members in the cache. ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION (snapshot_dt='2016-07-18',partcol_code='AAA'); 0 rows affected (total: 0.316s) select count(*) from cachetest; +----+ | 1 | +----+ | 70 | +----+ 1 row in results(first row: 0.428s; total: 0.431s) There are now 3 duplicate members in the cache.
Local fix
There are two ways to work around this problem. (1) Run SYSHADOOP.HCAT_CACHE_SYNC after running the ALTER TABLE command, regardless of whether a partition was added or dropped. Or wait for the cache to clear automatically. (2) Add the following property to the bigsql-conf.xml file on the head node: <property> <name>scheduler.cache.delta.updates.for.partitions</name> <value>false</value> <description>scheduler delta update partitions option</description> </property>
Problem summary
Problem is fixed in Big SQL version 4.2.5 and later versions.
Problem conclusion
Temporary fix
See local fix
Comments
APAR Information
APAR number
PI85451
Reported component name
INFO BIGINSIGHT
Reported component ID
5725C0900
Reported release
420
Status
CLOSED FIN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-08-03
Closed date
2017-08-03
Last modified date
2017-08-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
R420 PSN
UP
[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"420","Line of Business":{"code":"LOB10","label":"Data and AI"}}]
Document Information
Modified date:
25 August 2020