Investigating class 3 suspension time

When you encounter high class 3 suspension time, you can use the suspension values in the accounting reports to focus your investigation. Class 3 suspension time is the amount of wait time, which includes synchronous buffer pool I/O wait time, log I/O wait time, lock and latch wait time, and other wait times.

About this task

Accounting class 3 data provides detailed information about the distribution of suspension times and related events.

Procedure

To investigate high class 3 suspension times, complete the following investigations:

  • Check the individual types of suspensions in the Class 3 Suspensions section of the IBM® OMEGAMON® for Db2 Performance Expert on z/OS® accounting report.
  • If lock/latch, drain lock, and claim release suspension times are high, focus your investigation on contention problems and on improving concurrency:
    IRLM lock/latch suspensions
    IRLM lock/latch suspension time is time spent waiting for locked resources and for latches that are used for internal serialization within IRLM. Examine the accounting records to determine whether the suspension time is caused by locks or latches. If the suspension is caused by locks, use performance trace classes 6 and 7. If the suspension is caused by latches, check for the following conditions:
    • The IRLM trace is active.
    • The WLM dispatching priority of the IRLM address space is too low. It is best to use SYSSTC dispatching priority for the IRLM address space.
    • The IRLM is queried frequently by requests such as DISPLAY DATABASE LOCKS and MODIFY irlmproc,STATUS commands.
    • The DEADLOCK TIME value is too small and locking rates are high.
    • A large number of locks are held before an operation commits. If the MAX HELD LOCKS value in the accounting report is high, commit more frequently.
    Db2 latch suspension times
    Db2 latch suspension time indicates wait time for latches that are acquired internally within Db2 for short term serialization of resources such as storage and control block changes.
  • For greater than expected wait times for synchronous I/O suspensions, complete the following investigations.
    Synchronous I/O suspension time is the total application wait time for synchronous I/Os. It is the total of database I/O and log write I/O. In the IBM OMEGAMON for Db2 Performance Expert on z/OS accounting report, check the values for SYNCHRON. I/O, DATABASE I/O, and LOG WRITE I/O. Database I/O and log I/O are not reported separately at the package level.
    • Check whether the I/O suspension time is high because of a large number of I/O suspensions, or because of long suspension times for each I/O. Long I/O suspension times probably indicate problems that require investigation outside of Db2, such as problems with the IOS component of z/OS, the channel, or the I/O subsystem.
    • Check whether the log I/O suspension times are high. If you see high values for log I/O suspension, you can try to improve log read performance and log write performance.
    • Check the getpage count to look for access path changes. If it has significantly increased, then an access path change might have occurred. However, if the getpage count remained about the same, but the number of I/Os increased significantly, the problem is not an access path change.

      If you have data from accounting trace class 8, the number of synchronous and asynchronous read I/Os is available for individual packages. Determine which package or packages have unacceptable counts for synchronous and asynchronous read I/Os. Activate performance trace classes 1, 2, and 3 so that IBM OMEGAMON for Db2 Performance Expert on z/OS SQL activity reports can identify the SQL statement or cursor that is causing the problem.

    • Check for a lower than expected buffer pool hit ratio.
      1. Look at the number of synchronous reads in the buffer pool that are associated with the plan.
      2. Look at the related buffer pool hit ratio. The buffer pool hit ratio is meaningful only for objects that are accessed randomly.
      3. If the buffer pool size and the buffer pool hit ratio for random reads are small, consider the following actions:
        • Increase the buffer pool size. By increasing the buffer pool size, you might reduce the amount of synchronous database I/O and reduce the synchronous I/O suspension time.
        • You might also reduce the value of the sequential buffer pool threshold (VPSEQT). However, this change might impact sequential processing.
    • Check for system-wide database buffer pool problems. You can also use the buffer pool analyzer feature of IBM OMEGAMON for Db2 Performance Expert on z/OS to manage and optimize the buffer pools.
    • Check the SQL ACTIVITY section of the accounting report, and compare that with previous data. Also, check the names of the packages being executed to determine if the pattern of programs being executed has changed.
    • Use the DSNACCOX stored procedure to check for data organization problems. Disorganized data might prevent the use of sequential detection. You can invoke the REORG utility to resolve data organization problems.
    • Check for RID pool failures. You can use the values of the FAIL-NO STORAGE (QXNSMIAP) and FAIL-LIMIT EXCEEDED (QXMRMIAP) fields under RID LIST TOTAL in the IBM OMEGAMON for Db2 Performance Expert on z/OS accounting report.
    • Check for system-wide problems in the EDM pool.
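
The synchronous I/O checks above can be sketched as a small comparison between a baseline interval and the current interval. This is only an illustration: the `SyncIoSample` class, the `diagnose_sync_io` function, and the `growth` and `slow_io_ms` thresholds are hypothetical names and values chosen for this sketch, not Db2 or OMEGAMON fields or defaults. The figures are assumed to be copied by hand from the accounting report.

```python
from dataclasses import dataclass


@dataclass
class SyncIoSample:
    """Hypothetical per-interval figures taken from an accounting report."""
    sync_io_seconds: float  # total synchronous I/O suspension time
    sync_io_events: int     # number of synchronous I/O suspensions
    getpages: int           # getpage count for the same interval


def diagnose_sync_io(baseline, current, growth=1.5, slow_io_ms=5.0):
    """Return investigation hints.

    growth and slow_io_ms are illustrative assumptions, not product defaults.
    """
    hints = []

    # Many suspensions versus long suspensions: compute the average wait per I/O.
    if current.sync_io_events > 0:
        avg_ms = 1000.0 * current.sync_io_seconds / current.sync_io_events
        if avg_ms > slow_io_ms:
            hints.append("long time per I/O: investigate outside Db2")

    # A jump in getpages suggests an access path change; a jump in I/Os
    # with roughly stable getpages points elsewhere.
    getpage_up = current.getpages > growth * baseline.getpages
    io_up = current.sync_io_events > growth * baseline.sync_io_events
    if getpage_up:
        hints.append("getpages rose: an access path change might have occurred")
    elif io_up:
        hints.append("I/Os rose but getpages did not: not an access path change")

    return hints
```

For example, a baseline of `SyncIoSample(2.0, 1000, 50000)` against a current interval of `SyncIoSample(4.0, 2000, 51000)` yields only the "not an access path change" hint, which would steer the investigation toward the buffer pool checks.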
  • If other read I/O time is high, check for problems with:
    • Prefetch I/O operations
    • Disk contention
    • Access path problems
    • Buffer pools that require tuning
  • If other write I/O time is high, check for problems with:
    • The I/O path
    • Disk contention
    • Buffer pools that require tuning
  • If the service task suspension time is high, check open and close activity, and commit activity.
    Wait times for the following activities are the most common contributors to service task suspensions:
    • Phase 2 commit processing for updates, inserts, and deletes (UPDATE COMMIT - QWACAWTE). This value includes wait time for Phase 2 commit log writes and database writes for LOBs with LOG NO. For data sharing environments, it includes page P-lock unlocks for updated pages and GBP writes.
    • The OPEN/CLOSE service task. You can minimize this wait time by using two strategies. If the threshold set by the value of the DSMAX subsystem parameter is frequently reached, increase the value of the DSMAX subsystem parameter. If this threshold is reached, change CLOSE YES to CLOSE NO on data sets that are used by critical applications.
    • The SYSLGRNG recording service task.
    • The Data set extend/delete/define service task (EXT/DEL/DEF). You can minimize this wait time by defining larger primary and secondary disk space allocation for the table space.
    • Other service tasks (OTHER SERVICE TASK). Contributors to the other service task suspensions are likely to include time spent on the network for outgoing allied threads over TCP/IP connections, VSAM catalog updates, and parallel query cleanup. Other contributors are possible. Performance traces of the following IFCIDs provide useful information when the OTHER SERVICE TASK value is high:
      • 0170 and 0171
      • 0046, 0047, 0048, and 0049, with 0050 when more detail is needed.
  • Check whether suspension times are elongated because of reasons that are usually associated with not accounted time.
    For example, a long wait time might be encountered because of a short wait to obtain a lock followed by a much longer wait to be re-dispatched, because of the CPU load. In such cases, the entire wait might be recorded as class 3 time.
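
The overall triage in this procedure can be sketched as ranking the suspension categories by their share of total class 3 time and mapping the largest ones to the matching investigation. This is a minimal sketch under stated assumptions: the category keys and the `rank_suspensions` function are hypothetical labels paraphrased from this procedure, not exact report field names, and the input is assumed to be seconds per category copied from the accounting report.

```python
# Hypothetical category labels, paraphrased from the procedure above.
NEXT_STEP = {
    "lock_latch": "contention problems and concurrency",
    "sync_io": "I/O counts, getpages, and buffer pool hit ratio",
    "other_read_io": "prefetch, disk contention, access paths, buffer pools",
    "other_write_io": "I/O path, disk contention, buffer pools",
    "service_task": "open and close activity, and commit activity",
}


def rank_suspensions(seconds_by_category):
    """Return (category, share of total, suggested focus), largest share first."""
    total = sum(seconds_by_category.values())
    if total <= 0:
        return []  # nothing to rank
    ranked = sorted(seconds_by_category.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (name, seconds / total, NEXT_STEP.get(name, "no mapping in this sketch"))
        for name, seconds in ranked
    ]
```

Calling `rank_suspensions({"lock_latch": 1.0, "sync_io": 6.0, "service_task": 3.0})` puts `sync_io` first, so the synchronous I/O investigation would be the natural starting point. Because of the re-dispatch effect described above, a large share does not by itself prove that the named resource is the bottleneck.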