IBM Support

PI85451: ALTER TABLE ADD IF NOT EXISTS PARTITION RESULTS IN WRONG RESULTS

Subscribe

You can track all active APARs for this component.

 

APAR status

  • Closed as fixed if next.

Error description

  • When an ALTER TABLE ADD IF NOT EXISTS PARTITION statement is
    executed, specifying a partition that does exist, the scheduler
    cache is incorrectly incremented as if the partition were
    added, even though no additional rows are added to the table.
    This results in SELECT statements against the table returning
    incorrect row counts until the cache is refreshed (by default,
    after 20 minutes).  Running SYSHADOOP.HCAT_CACHE_SYNC will
    clear the cache.
    
    Similarly, ALTER TABLE DROP IF EXISTS PARTITION can result in
    the cached row count being incorrectly decremented.
    
    For example:
    
    1. Create directories on hdfs and copy data files.
    
    2. Run CREATE TABLE, then ALTER ADD to add 5 partitions (each
    partition has 10 rows):
    
    CREATE EXTERNAL HADOOP TABLE cachetest(id int) PARTITIONED BY
    (snapshot_dt DATE, partcol VARCHAR (10)) LOCATION
    '/tmp/data/cachetest';
    ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION
    (snapshot_dt='2016-07-18',partcol='AAA');
    0 rows affected (total: 0.351s)
    select count(*) from cachetest;
    +----+
    |  1 |
    +----+
    | 10 |
    +----+
    1 row in results(first row: 0.307s; total: 0.309s)
    ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION
    (snapshot_dt='2016-09-30',partcol_code='BBB');
    0 rows affected (total: 0.586s)
    ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION
    (snapshot_dt='2016-11-30',partcol_code='CCC');
    0 rows affected (total: 0.491s)
    ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION
    (snapshot_dt='2016-11-30',partcol_code='DDD');
    0 rows affected (total: 0.326s)
    ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION
    (snapshot_dt='2016-12-31',partcol_code='EEE');
    0 rows affected (total: 0.413s)
    
    select count(*) from cachetest;
    +----+
    |  1 |
    +----+
    | 50 |
    +----+
    1 row in results(first row: 0.427s; total: 0.433s)
    
    3. Alter the table to add an existing partition conditionally.
    ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION
    (snapshot_dt='2016-07-18',partcol_code='AAA');
    0 rows affected (total: 0.761s)
    
    select count(*) from cachetest;
    +----+
    |  1 |
    +----+
    | 60 |
    +----+
    1 row in results(first row: 0.464s; total: 0.468s)
    
    The cached row count should have stayed at 50, not increased to
    60. There are now 2 duplicate members in the cache.
    
    ALTER TABLE cachetest ADD IF NOT EXISTS PARTITION
    (snapshot_dt='2016-07-18',partcol_code='AAA');
    0 rows affected (total: 0.316s)
    
    select count(*) from cachetest;
    +----+
    |  1 |
    +----+
    | 70 |
    +----+
    1 row in results(first row: 0.428s; total: 0.431s)
    
    There are now 3 duplicate members in the cache.
    

Local fix

  • There are two ways to work around this problem.
    (1) Run SYSHADOOP.HCAT_CACHE_SYNC after running the ALTER TABLE
    command, regardless of whether a partition was added or
    dropped.  Or wait for the cache to clear automatically.
    (2) Add the following property to the bigsql-conf.xml file on
    the head node:
    
      <property>
        <name>scheduler.cache.delta.updates.for.partitions</name>
        <value>false</value>
        <description>scheduler delta update partitions
    option</description>
      </property>
    

Problem summary

  • Problem is fixed in Big SQL version 4.2.5 and
    later versions.
    

Problem conclusion

Temporary fix

  • See local fix
    

Comments

APAR Information

  • APAR number

    PI85451

  • Reported component name

    INFO BIGINSIGHT

  • Reported component ID

    5725C0900

  • Reported release

    420

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-08-03

  • Closed date

    2017-08-03

  • Last modified date

    2017-08-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

  • R420 PSN

       UP

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"420","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
25 August 2020