Monitoring parallel operations

You can monitor the use of parallelism by Db2.

About this task

The number of parallel tasks that Db2 uses to access data is determined at bind time, and is adjusted again when the query is executed.

Bind time
At bind time, Db2 collects partition statistics from the catalog, estimates the processor cycles for the costs of processing the partitions, and determines the optimal number of parallel tasks to achieve minimum elapsed time.

When a planned degree exceeds the number of online CPs, the query might not be completely processor-bound. Instead it might be approaching the number of partitions because it is I/O-bound. In general, the more I/O-bound a query is, the closer the degree of parallelism is to the number of partitions.

In general, the more processor-bound a query is, the more degree of parallelism is related to the to the number of online CPs. However, the degree of parallelism can exceed the number of CPs. The default degree of parallelism is twice the number of CPs. A degree of parallelism that is greater than the number of CPs is beneficial in cases where works is distributed unevenly among parallel child tasks.

To help Db2 determine the optimal degree of parallelism, use the RUNSTATS utility to keep your statistics current.

You can find the expected degree of parallelism in the ACCESS_DEGREE and JOIN_DEGREE columns of the PLAN_TABLE.

Execution time
For each parallel group, parallelism can execute at a reduced degree or degrade to sequential operations for the following reasons:
  • Amount of virtual buffer pool space available
  • Host variable values
  • Availability of the hardware sort assist facility
  • Ambiguous cursors
  • A change in the number or configuration of online processors

Procedure

To monitor parallel operations, use one of the following approaches:

  1. Use the DISPLAY BUFFERPOOL report to see how well the buffer pool is able to satisfy parallel operations.
    DSNB440I = PARALLEL ACTIVITY -
                PARALLEL REQUEST =      282  DEGRADED PARALLEL=        5
    The PARALLEL REQUEST field in this example shows that Db2 was negotiating buffer pool resource for 282 parallel groups. Of those 282 groups, only 5 were degraded because of a lack of buffer pool resource. A large number in the DEGRADED PARALLEL field could indicate that your subsystem does not have enough buffers that can be used for parallel processing.
  2. Use the DISPLAY THREAD command.
    The status field contains PT for parallel tasks. All parallel tasks are displayed immediately after their corresponding originating thread.
  3. Use Db2 trace:
    • The statistics trace indicates The statistics trace indicates when parallel groups do not run with the planned degree of parallelism or run sequentially. Either situation might indicate that some queries do not achieve the best possible response times.
    • Begin program-specific programming interface information.You can use the accounting trace to ensure that your parallel queries are meeting their response time goals. Db2 rolls task accounting into an accounting record for the originating task. IBM® OMEGAMON® for Db2 Performance Expert on z/OS® also summarizes all accounting records generated for a parallel query and presents them as one logical accounting record.

      IBM OMEGAMON for Db2 Performance Expert on z/OS presents the times for the originating tasks separately from the accumulated times for all the parallel tasks.

      As shown in the following accounting trace record excerpt, CP CPU TIME-AGENT is the time for the originating tasks, while CP CPU TIME-PAR.TASKS is the accumulated processing time for the parallel tasks.

      Figure 1. A partial sample that shows parallelism fields in the IBM OMEGAMON for Db2 Performance Expert on z/OS accounting trace report
      TIMES/EVENTS  APPL (CLASS 1)   DB2 (CLASS 2)    CLASS 3 SUSP.   ELAPSED TIME
      ------------  --------------  --------------    --------------  ------------
      ELAPSED TIME       32.578741       32.312218    LOCK/LATCH         25.461371
       NONNESTED         28.820003       30.225885    SYNCHRON. I/O       0.142382
       STORED PROC        3.758738        2.086333     DATABASE I/O       0.116320
       UDF                0.000000        0.000000     LOG WRTE I/O       0.026062
       TRIGGER            0.000000        0.000000    OTHER READ I/O   3:00.404769
                                                      OTHER WRTE I/O      0.000000
      CPU CP TIME      1:29.695300     1:29.644026    SER.TASK SWTCH      0.000000
       AGENT              0.225153        0.178128     UPDATE COMMIT      0.000000
        NONNESTED         0.132351        0.088834     OPEN/CLOSE         0.000000
        STORED PRC        0.092802        0.089294     SYSLGRNG REC       0.000000
        UDF               0.000000        0.000000     EXT/DEL/DEF        0.000000
        TRIGGER           0.000000        0.000000     OTHER SERVICE      0.000000
       PAR.TASKS       1:29.470147     1:29.465898    ARC.LOG(QUIES)      0.000000
        ⋮
      …            QUERY PARALLEL.     TOTAL
                     ---------------  --------
                     MAXIMUM MEMBERS         1
                     MAXIMUM DEGREE         10
                     GROUPS EXECUTED         1
                      RAN AS PLANNED         1
                      RAN REDUCED            0
                      ONE  COOR=N            0
                      ONE  ISOLAT            0
                      ONE  DCL TTABLE        0
                      SEQ - CURSOR           0
                      SEQ - NO ESA           0
                      SEQ - NO BUF           0
                      SEQ - ENCL.SER.        0
                     MEMB SKIPPED(%)         0
                     DISABLED BY RLF        NO
                     REFORM PARAL-CONFIG     0
                     REFORM PARAL-NO BUF     0

      As the report shows, the values for CPU TIME and I/O WAIT TIME are larger than the elapsed time. The processor and suspension time can be greater than the elapsed time because these two times are accumulated from multiple parallel tasks. The elapsed time would be less than the processor and suspension time if these two times are accumulated sequentially.

      If you have baseline accounting data for the same thread run without parallelism, the elapsed times and processor times must not be larger when that query is run in parallel. If they are larger, or if response time is poor, you need to examine the accounting data for the individual tasks. Use the IBM OMEGAMON for Db2 Performance Expert on z/OS Record Trace for the IFCID 0003 records of the thread you want to examine. Use the performance trace if you need more information to determine the cause of the response time problem.

    • If you discover a potential problem with a parallel query, you can use the performance trace to do further analysis. You can refer to field QW0221AD in IFCID 0221, as mapped by macro DSNDQW03. The 0221 record also gives you information about the key ranges used to partition the data.

      IFCID 0222 contains the elapsed time information for each parallel task and each parallel group in each SQL query. IBM OMEGAMON for Db2 Performance Expert on z/OS presents this information in its SQL Activity trace.

      If your queries are running sequentially or at a reduced degree because of a lack of buffer pool resources, the QW0221C field of IFCID 0221 indicates which buffer pool is constrained.

    End program-specific programming interface information.