Monitoring parallel operations

You can monitor the use of parallelism by Db2.

About this task

The number of parallel tasks that Db2 uses to access data is determined at bind time, and is adjusted again when the query is executed.

Bind time

At bind time, Db2 collects partition statistics from the catalog, estimates the processor cycles for the costs of processing the partitions, and determines the optimal number of parallel tasks to achieve minimum elapsed time.

When a planned degree exceeds the number of online CPs, the query might not be completely processor-bound. Instead it might be approaching the number of partitions because it is I/O-bound. In general, the more I/O-bound a query is, the closer the degree of parallelism is to the number of partitions.

In general, the more processor-bound a query is, the more degree of parallelism is related to the to the number of online CPs. However, the degree of parallelism can exceed the number of CPs. The default degree of parallelism is twice the number of CPs. A degree of parallelism that is greater than the number of CPs is beneficial in cases where works is distributed unevenly among parallel child tasks.

To help Db2 determine the optimal degree of parallelism, use the RUNSTATS utility to keep your statistics current.

You can find the expected degree of parallelism in the ACCESS_DEGREE and JOIN_DEGREE columns of the PLAN_TABLE.

Execution time

For each parallel group, parallelism can execute at a reduced degree or degrade to sequential operations for the following reasons:

Amount of virtual buffer pool space available
Host variable values
Availability of the hardware sort assist facility
Ambiguous cursors
A change in the number or configuration of online processors

Procedure

To monitor parallel operations, use one of the following approaches:

Use the DISPLAY BUFFERPOOL report to see how well the buffer pool is able to satisfy parallel operations.
```
DSNB440I = PARALLEL ACTIVITY -
            PARALLEL REQUEST =      282  DEGRADED PARALLEL=        5
```
The PARALLEL REQUEST field in this example shows that Db2 was negotiating buffer pool resource for 282 parallel groups. Of those 282 groups, only 5 were degraded because of a lack of buffer pool resource. A large number in the DEGRADED PARALLEL field could indicate that your subsystem does not have enough buffers that can be used for parallel processing.
Use the DISPLAY THREAD command.
The status field contains PT for parallel tasks. All parallel tasks are displayed immediately after their corresponding originating thread.
Use Db2 trace:
- The statistics trace indicates The statistics trace indicates when parallel groups do not run with the planned degree of parallelism or run sequentially. Either situation might indicate that some queries do not achieve the best possible response times.
- You can use the accounting trace to ensure that your parallel queries are meeting their response time goals. Db2 rolls task accounting into an accounting record for the originating task. IBM® OMEGAMON® for Db2 Performance Expert on z/OS® also summarizes all accounting records generated for a parallel query and presents them as one logical accounting record.
  IBM OMEGAMON for Db2 Performance Expert on z/OS presents the times for the originating tasks separately from the accumulated times for all the parallel tasks.
  
  As shown in the following accounting trace record excerpt, CP CPU TIME-AGENT is the time for the originating tasks, while CP CPU TIME-PAR.TASKS is the accumulated processing time for the parallel tasks.
  Figure 1. A partial sample that shows parallelism fields in the IBM OMEGAMON for Db2 Performance Expert on z/OS accounting trace report
```
TIMES/EVENTS  APPL (CLASS 1)   DB2 (CLASS 2)    CLASS 3 SUSP.   ELAPSED TIME
------------  --------------  --------------    --------------  ------------
ELAPSED TIME       32.578741       32.312218    LOCK/LATCH         25.461371
 NONNESTED         28.820003       30.225885    SYNCHRON. I/O       0.142382
 STORED PROC        3.758738        2.086333     DATABASE I/O       0.116320
 UDF                0.000000        0.000000     LOG WRTE I/O       0.026062
 TRIGGER            0.000000        0.000000    OTHER READ I/O   3:00.404769
                                                OTHER WRTE I/O      0.000000
CPU CP TIME      1:29.695300     1:29.644026    SER.TASK SWTCH      0.000000
 AGENT              0.225153        0.178128     UPDATE COMMIT      0.000000
  NONNESTED         0.132351        0.088834     OPEN/CLOSE         0.000000
  STORED PRC        0.092802        0.089294     SYSLGRNG REC       0.000000
  UDF               0.000000        0.000000     EXT/DEL/DEF        0.000000
  TRIGGER           0.000000        0.000000     OTHER SERVICE      0.000000
 PAR.TASKS       1:29.470147     1:29.465898    ARC.LOG(QUIES)      0.000000
  ⋮
…            QUERY PARALLEL.     TOTAL
               ---------------  --------
               MAXIMUM MEMBERS         1
               MAXIMUM DEGREE         10
               GROUPS EXECUTED         1
                RAN AS PLANNED         1
                RAN REDUCED            0
                ONE  COOR=N            0
                ONE  ISOLAT            0
                ONE  DCL TTABLE        0
                SEQ - CURSOR           0
                SEQ - NO ESA           0
                SEQ - NO BUF           0
                SEQ - ENCL.SER.        0
               MEMB SKIPPED(%)         0
               DISABLED BY RLF        NO
               REFORM PARAL-CONFIG     0
               REFORM PARAL-NO BUF     0
```
  As the report shows, the values for CPU TIME and I/O WAIT TIME are larger than the elapsed time. The processor and suspension time can be greater than the elapsed time because these two times are accumulated from multiple parallel tasks. The elapsed time would be less than the processor and suspension time if these two times are accumulated sequentially.
  
  If you have baseline accounting data for the same thread run without parallelism, the elapsed times and processor times must not be larger when that query is run in parallel. If they are larger, or if response time is poor, you need to examine the accounting data for the individual tasks. Use the IBM OMEGAMON for Db2 Performance Expert on z/OS Record Trace for the IFCID 0003 records of the thread you want to examine. Use the performance trace if you need more information to determine the cause of the response time problem.
- If you discover a potential problem with a parallel query, you can use the performance trace to do further analysis. You can refer to field QW0221AD in IFCID 0221, as mapped by macro DSNDQW03. The 0221 record also gives you information about the key ranges used to partition the data.
  IFCID 0222 contains the elapsed time information for each parallel task and each parallel group in each SQL query. IBM OMEGAMON for Db2 Performance Expert on z/OS presents this information in its SQL Activity trace.
  
  If your queries are running sequentially or at a reduced degree because of a lack of buffer pool resources, the QW0221C field of IFCID 0221 indicates which buffer pool is constrained.