IBM Support

Identifying Subsection Bottlenecks in DB2 DPF Queries

Technical Blog Post


Abstract

Identifying Subsection Bottlenecks in DB2 DPF Queries

Body

  Queries in partitioned (DPF) environments generally have multiple subsections executing concurrently,  with each subsection executing on one or more database partitions.  As a result when a DPF query is performing poorly it can be important to identify which subsection or subsections may be holding up the progress of the query.   For long running DPF queries,  explain (db2exfmt) information and global application snapshots provide very useful information which can be used for this purpose.   These specific techniques do not necessarily apply to SMP parallelized or BLU queries.

To use this method,  it is important to take global application snapshots,  as this is necessary in order to get complete information on the status of the subsections that are executing on each partition.   To use the snapshot and db2exfmt information,  it is necessary to identify which subsections in the snapshot output are incurring significant cost,  and then associate those subsections with parts of the explain output.

 

1) Identifying subsections:  

 

In the plan graph shown in db2exfmt output,  TQ operators (BTQ, DTQ,  MDTQ, etc) are the operators which separate different subsections.    For example in the following plan graph,  

NLJOIN 17 is part of one subsection

HSJOIN 19,  HSJOIN 20, HSJOIN 21, TBSCAN 22, TBSCAN 23,  TBSCAN 24,  HSJOIN 25, TBSCAN 26,  SORT 27,  HSJOIN 28,  FETCH 33,  RIDSCN 34,  SORT 35,  IXSCAN 36 are all part of the same subsection

TBSCAN 30 is part of its own subsection
TBSCAN 32 is part of its own subsection.  


                                                                                   3815.07
                                                                                   NLJOIN
                                                                                   (  17)
                                                                                   46682.4
                                                                                   14289.9
                                                /------------------------------------+--------
                                            53.5666            
                                            BTQ               
                                            (  18)                                                               
                                            2062.51                                                              
                                            1198.06                                                              
                                              |                                                                  
                                            1.11597   
                                            ^HSJOIN
                                            (  19)   
                                            2062.28  
                                            1198.06  
                            /-----------------+-----------------\                                                
                      1.12514e+06                               21.9646
                        HSJOIN                                  NLJOIN
                        (  20)                                  (  25)  
                        416.302                                 1562.11
                          295                                   903.061
                 /--------+--------\                 /------------+-------------\      
              3136                  80680       0.104167                        210.86
             HSJOIN                TBSCAN        TBSCAN                         FETCH  
             (  21)                (  24)        (  26)                         (  33)  
             60.9797               335.592       45.6803                        1516.43
                8                    287            6                           897.061  
         /-----+------\              |             |                          /---+----\
       56               56          80680       0.104167                  131769       423251
     TBSCAN           TBSCAN   TABLE: S1         SORT                     RIDSCN   TABLE: S1   
     (  22)           (  23)     Q35_TBL         (  27)                   (  34)       Q41_TBL  
     30.4679          30.4746        Q35         45.6798                  583.104        Q41
        4                4                          6                     186.552                                
       |                |                          |                        |                                    
       264              264                     0.104167                  131769                                 
 TABLE: S1        TABLE: S1                      ^HSJOIN                  SORT        
     Q42_TBL          Q36_TBL                    (  28)                   (  35)
       Q42              Q36                      45.6791                  577.502
                                                    6                     186.552                                
                                            /------+------\                 |                                    
                                         3.25            0.104167         131769       
                                        DTQ               DTQ             IXSCAN       
                                        (  29)            (  31)          (  36)                                 
                                        45.5818          0.0966973        476.408                                
                                           6                 0            186.552                                
                                          |                 |               |                                    
                                          156            0.104167         423251                                 
                                        TBSCAN            TBSCAN      INDEX: S1       
                                        (  30)            (  32)        IX_Q41_03  
                                        45.4095         2.63922e-05         Q41   
                                           6                 0   
                                          |                 |   
                                          156                5                                                   
                                    TABLE: S1        TABFNC: SYSIBM                                              
                                      Q40_TBL                GENROW                                                 
                                          Q40               Q29

 

2) Global application snapshot information:

 

Global application snapshots provide useful information about each subsection,  and the activity corresponding to that subsection on each partition:  


  Subsection number                                  = 6
  Subsection database member number                  = 3
  Subsection status                                  = Executing
  Execution elapsed time (seconds)                   = 3
  Total user CPU time (sec.microsec)                 = 0.597753
  Total system CPU time (sec.microsec)               = 0.003474
  Current number of tablequeue buffers overflowed    = 3583
  Total number of tablequeue buffers overflowed      = 4235
  Maximum number of tablequeue buffers overflowed    = 3583
  Rows received on tablequeues                       = 3
  Rows sent on tablequeues                           = 95363
  Rows read                                          = 101836
  Rows written                                       = 4235      

                                           

The subsection breakdown contains various metrics on the execution of the corresponding subsection on a specific node (in this case subsection 6 on partition 3).    In order to relate the subsection numbers to operators in the db2exfmt graph,  this information is contained in the operator details for the TQ:

 

        18) TQ    : (Table Queue)

                TQSECNFM: (Runtime Table Queue Receives From Section #)
                        7
                TQSECNTO: (Runtime Table Queue Sends to Section #)
                        6

The TQSECNFM indicates that the operator that is below the TQ is part of subsection 7
The TQSECNTO indicates that the operator that is above the TQ is part of subsection 6

 

                                                                                   3815.07
                                                                                   NLJOIN
                                                                                   (  17)   subsection 6
                                                                                   46682.4
                                                                                   14289.9
                                                /------------------------------------+--------
                                            53.5666            
                                            BTQ               
                                            (  18)                                                               
                                            2062.51                                                              
                                            1198.06                                                              
                                              |                                                                  
                                            1.11597   
                                            ^HSJOIN
                                            (  19)   subsection 7
                                            2062.28  
                                            1198.06  

 

By looking at the application snapshot information and observing which subsections are busy (i.e. accumulating significant CPU time, rows read/written, tablequeue buffers overflowed, rows read/sent on tablequeue, etc)  and using the db2exfmt information to pinpoint those subsections in the db2exfmt graph,  the costly parts of the query can be identified.  Some general hints/tips are:  

 

1)  If a subsection is not accumulating many rows received on tablequeues for long periods of time, it is often the case that is not in that subsection,  but in subsection(s) below that point in the db2exfmt graph.  

2) If a subsection is consistently busy on all its partitions, then that subsection is likely to be significant to the overall performance.  If such a subsection is only busy on one or a small number of partitions on which it executes, then it may be indicative of data skew which is distributing work unevenly to the partitions.  

3) If a subsection is showing up frequently in successive snapshots in "Waiting to send on tablequeue" state,  this can indicate that the subsection above it in the db2exfmt graph is doing expensive processing and not processing rows sent to it in a timely manner.

4)  If a subsection status shows as Completed on all nodes for which there is an entry,  then that subsection has already done all the required processing, and any ongoing processing must be occurring in subsections above it in the db2exfmt graph. 

 

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm11140748