IBM Support

Performance considerations for using Declared Global Temporary Tables.

Technical Blog Post


Abstract

Performance considerations for using Declared Global Temporary Tables.

Body

    DGTTs (Declared Global Temporary Tables) are frequently used in stored procedure code to store and manipulate an intermediate result set.
    There are, however, a few performance considerations I want to illustrate and share,
    in particular issues related to highly concurrent execution of procedures that use DGTTs.

    Let me start with the baseline example: a simple stored procedure creating five DGTTs:

 

    create or replace procedure dgtt1 ( out level int )
    language sql
    begin atomic
      declare  a integer;
      declare b integer;
      declare c integer;
      declare d integer;
      declare e integer;

      set level = 1;
      declare global temporary table t1 like sysibm.syscoldist on commit preserve rows not logged with replace;
      create index session.idx1 on session.t1( name, tbname )   collect detailed statistics ;

      declare global temporary table t2 like sysibm.syscoldist on commit preserve rows not logged with replace;
      create index session.idx2 on session.t2( name, tbname )   collect detailed statistics ;

      declare global temporary table t3 like sysibm.syscoldist on commit preserve rows not logged with replace;
      create index session.idx3 on session.t3( name, tbname )   collect detailed statistics ;

      declare global temporary table t4 like sysibm.syscoldist on commit preserve rows not logged with replace;
      create index session.idx4 on session.t4( name, tbname )   collect detailed statistics ;

      declare global temporary table t5 like sysibm.syscoldist on commit preserve rows not logged with replace;
      create index session.idx5 on session.t5( name, tbname )   collect detailed statistics ;
    
      insert into session.t5  select * from sysibm.syscoldist fetch first 5 rows only;
      insert into session.t4  select * from sysibm.syscoldist fetch first 5 rows only;
      insert into session.t3  select * from sysibm.syscoldist fetch first 5 rows only;
      insert into session.t2  select * from sysibm.syscoldist fetch first 5 rows only;
      insert into session.t1  select * from sysibm.syscoldist fetch first 5 rows only;

      select max( seqno ) into a from session.t5;
      select max( seqno ) into b from session.t4;
      select max( seqno ) into c from session.t3;
      select max( seqno ) into d from session.t2;
      select max( seqno ) into e from session.t1;

    end

 

  The test I executed runs 50 simultaneous threads, and each of these threads calls the procedure above 100 times.
  To get an indication of performance, I used these columns from the SYSPROC.MON_GET_DATABASE monitoring table function:
  TOTAL_CPU_TIME, TOTAL_WAIT_TIME, TOTAL_SECTION_TIME, TOTAL_IMPLICIT_COMPILE_PROC_TIME, TOTAL_APP_SECTION_EXECUTIONS

  This gives me CPU usage per second, the time spent executing sections, the time spent waiting, the time spent compiling statements, and the number of section executions

   ( a stored procedure has multiple statements, hence multiple sections )
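
   As a sketch, these metrics can be sampled with a query like the following. The original monitoring script is not shown, so treat this as an illustrative query; the counters are cumulative, and the per-second figures in the output below are deltas between samples:

   ```sql
   -- Sample database-wide execution metrics.
   -- MON_GET_DATABASE takes a member argument; -2 means all members.
   SELECT TOTAL_CPU_TIME,
          TOTAL_SECTION_TIME,
          TOTAL_WAIT_TIME,
          TOTAL_IMPLICIT_COMPILE_PROC_TIME,
          TOTAL_APP_SECTION_EXECUTIONS
   FROM TABLE(MON_GET_DATABASE(-2)) AS T;
   ```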

 

The first run gave me these numbers:

     Monitoring starting at 12:08:56 PM
     ( all times reported in milliseconds )

  
           cpu   section           wait  comp time          execs

           661      3946          20391        863           1211
           715      1076          30518        922           1294
           828      1087          44350        994           1398

           ...
           834      1125           1030       1056           1486
           775      1041            980        996           1392
           839      1175           1093       1060           1488

     Monitoring ends at 12:10:10 PM
     Summary:

         58311    104786         201410       74729         104589

 

  ( The output has been abbreviated on purpose. )


    We can see that about 1.3k individual section invocations are executed per second.
    The number of section executions is typically larger than the number of statements, since a single statement can involve several section invocations,
    e.g. for open / fetch / close processing. But it is a good metric for the throughput of the database server.

    This is our baseline performance: the test completes in 74 seconds.

    You will note that I created indexes on the DGTTs in this example.
    By default, indexes on DGTTs are compressed. This makes sense, as it can save space and I/O for larger DGTTs.
    However, here we created indexes on very small tables and are running the procedure with high concurrency.
    This adds a lot of overhead, because each index creation performs a license check to see whether compression is allowed, and that check is serialized.
    To illustrate the difference, I altered the index definition to avoid compression by using the "compress no" option:
        
    create index session.idx1 on session.t1(  seqno )  compress no collect detailed statistics ;

 

     Monitoring starting at 12:11:34 PM
    
           cpu   section           wait    comp time          execs
                                           
          1513     10832          54656         4107           5760
          1494     12562          68547         3813           5363
          1729     13772          33338         3919           5482
    ...                                    
          1766     12395          33236         3814           5343
          1732     12913          32958         3880           5445
          1683     12720          30881         4084           5702
          1679     10428          22089         4056           5659
           879      4386           5333         2633           3651

     Monitoring ends at 12:11:54 PM
    Summary:

         29886    239664         652748        71665         100296

 

    The effect is enormous: execution time has gone from 74 seconds to 20 seconds, and throughput has increased from about 1.3k to about 5.4k section executions per second.

    The lesson to be learned is to understand how the procedure is used and how much data goes into the DGTT:
    avoid creating compressed indexes on small temporary tables in procedures that are executed with high concurrency.
        
    The next test involves the lifetime of the DGTT. A DGTT exists for the duration of the connection, or until it is explicitly dropped.
    This is important for a couple of reasons.
    1. If a DGTT is created using the "on commit preserve rows" option, it keeps its rows until the connection is dropped.
       With connection pools used in application servers, this can mean the data hangs around for quite some time. This is likely not desirable.
    2. If a DGTT is not explicitly dropped, it is implicitly dropped when the next "declare global temporary table ..." statement for the same table is executed.
       This means that before the new DGTT is created, the old one must be dropped first.

       Dropping the DGTTs explicitly does not change the stored procedure execution time much, as the following metrics show:

     Monitoring starting at 12:13:28 PM
    
           cpu   section           wait  comp time          execs

          2078     13961          94592       6951           9055
          1587     10482          47026       4791           6257
          1645     10479          36429       4731           6142
          1645     10789          35693       4748           6180
          1562     10729          36903       4780           6227
    ..
          1786     10314          32615       5170           6706
          1507      8849          22985       5051           6532
           655      3019           4060       2550           3275
     Monitoring ends at 12:13:50 PM
     Summary:
         31718    209994         751718      94785         123168


        
    We get about the same runtime ( 2 seconds more, but that is within normal variation ).
    The number of section executions per second increased though, from about 5.5k to over 6k,
    because we added explicit "drop table session.t<n>" statements to the procedure.
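
    A minimal sketch of that change, assuming the same table names as in the baseline procedure, is to append explicit drops at the end of the procedure body:

    ```sql
    -- Explicitly drop the DGTTs once they are no longer needed,
    -- so their rows and cached statements do not outlive this call.
    drop table session.t1;
    drop table session.t2;
    drop table session.t3;
    drop table session.t4;
    drop table session.t5;
    ```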

 

    An important reason to explicitly drop a DGTT is that otherwise the statements referencing it stay in the package cache until either regular package cache maintenance
    removes them, or the DGTT is declared again. Not dropping the DGTT explicitly can therefore lead to an inflated package cache.

    This can actually lead to more contention in a highly concurrent environment. In the initial version of the procedure, I declared all DGTTs at the start.
    Upon re-execution of the procedure, the database server needs to:
    1. drop the old DGTT
    2. remove all statements referencing the old DGTT from the package cache
    3. create the new DGTT

    It is not hard to imagine that with more concurrency this can lead to bursty activity on the package cache and more contention.
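
    To get a feel for how many DGTT-related entries are sitting in the package cache, you can query it directly. This is an illustrative query; the filter on SESSION.T% is an assumption based on the table names used in this test:

    ```sql
    -- Count dynamic package cache entries referencing the session tables.
    -- MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2): dynamic statements, all members.
    SELECT COUNT(*)            AS NUM_ENTRIES,
           SUM(NUM_EXECUTIONS) AS TOTAL_EXECS
    FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS T
    WHERE STMT_TEXT LIKE '%SESSION.T%';
    ```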

    My next test was hence to declare each DGTT only when it is needed, and to drop it as soon as it is no longer needed.
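
    A sketch of that declare-use-drop pattern for one of the five tables (the others follow the same shape; e is the procedure's local variable from the baseline example):

    ```sql
    -- Declare the DGTT just before it is used ...
    declare global temporary table t1 like sysibm.syscoldist
        on commit preserve rows not logged with replace;
    create index session.idx1 on session.t1( seqno ) compress no collect detailed statistics;

    insert into session.t1 select * from sysibm.syscoldist fetch first 5 rows only;
    select max( seqno ) into e from session.t1;

    -- ... and drop it as soon as it is no longer needed.
    drop table session.t1;
    ```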

     Monitoring starting at 12:15:31
    
           cpu   section           wait  comp time          execs

          1448     12106          36868       5508           7150
          1546     19874          55385       5267           6833
          1483     28233          50287       5316           6908
          1742     28723          25332       5754           7482
    ..
          1711     23567          20822       5920           7700
          1609     18775          16236       6024           7840
           927      6304           4862       3580           4674

    Monitoring ends at 12:15:51 PM
    Summary:

         28056    461845         508047       97852         127214


        
    The net result is about the same: roughly a 20 second execution time, slightly lower CPU usage ( less contention ), and slightly improved throughput.

    Now, one thing I have not commented on yet is the compilation time.
    You might find it odd that there is any compilation time at all, given that these are stored procedures, which should be using static SQL.
    The key thing to understand is that Db2 can only compile statements that reference a DGTT once the DGTT has been created,
    so it is forced to compile that SQL when the procedure executes.
    Moreover, each invocation redeclares the DGTT, so the cached statements cannot be reused.

    In order to break this 20 second barrier, we need to get rid of these compilations and benefit from static SQL.
    One way to do this is to avoid using DGTTs completely.

    In my simple test, this can easily be achieved :

 

    insert into session.t5  select * from sysibm.syscoldist fetch first 5 rows only;
    select max( seqno ) into e from session.t1;

    ==> replaced by

    select max(seqno) into e from sysibm.syscoldist fetch first 5 rows only;

 

    But if this is not easily done, then consider using Created Global Temporary Tables (CGTTs) instead of DGTTs.

    A CGTT is created just once, up front, outside of the procedure code:

      create global temporary table t1 like sysibm.syscoldist on commit preserve rows not logged ;
      create global temporary table t2 like sysibm.syscoldist on commit preserve rows not logged ;
      create global temporary table t3 like sysibm.syscoldist on commit preserve rows not logged ;
      create global temporary table t4 like sysibm.syscoldist on commit preserve rows not logged ;
      create global temporary table t5 like sysibm.syscoldist on commit preserve rows not logged ;

 

      The procedure body now becomes :

 

      set level = 1;
      insert into t1  select * from sysibm.syscoldist fetch first 5 rows only;
      select max( seqno ) into e from t1;
      insert into t2  select * from sysibm.syscoldist fetch first 5 rows only;
      select max( seqno ) into e from t2;
      insert into t3  select * from sysibm.syscoldist fetch first 5 rows only;
      select max( seqno ) into e from t3;
      insert into t4  select * from sysibm.syscoldist fetch first 5 rows only;
      select max( seqno ) into e from t4;
      insert into t5  select * from sysibm.syscoldist fetch first 5 rows only;
      select max( seqno ) into e from t5;

 

    Note that I do not need to reference the CGTTs with the "SESSION" schema.

    If I run the test now, the result is phenomenal: it completes in less than 2 seconds
    ( and this includes the time to create the 50 connections ).

     Monitoring starting at 12:17:14

 
       cpu   section           wait   comp time          execs

       570       444          43875         141          15708

 Monitoring ends at 12:17:16 PM
Summary:


       570       444          43875         141          15708

       
CPU usage is very low, and compilation time is very low as well.

 

To summarize, when using DGTTs in a highly concurrent environment, be aware of:
* compressed indexes
* package cache contention
* retention of data in the temporary tables
* a high number of compilations

My advice:
* use no indexes, or uncompressed indexes, for low data volumes
* use CGTTs instead of DGTTs where possible
* drop DGTTs explicitly in procedure code
