Monitoring for performance exceptions

By maintaining a performance database and analyzing performance trends for your Db2 subsystems, you can identify potential problem conditions before they become actual problems. By detecting such problems early, you can either prevent problems or react to and troubleshoot problems more quickly when they do occur.

About this task

You can monitor for specific exceptional values or events, such as high response times or deadlocks. Exception monitoring is most appropriate for response-time and concurrency problems.

Procedure

Use any combination of the following approaches for exception performance monitoring in your Db2 subsystems:

  • Collect statistics trace classes 1, 3, 4, 5 (for data sharing), 7 (for distributed location statistics), and 8 (for dataset statistics).
  • Set the value of the STATIME subsystem parameter to 1.
    The STATIME subsystem parameter specifies the time interval, in minutes, between statistics collections.
    Starting in DB2® 10, Db2 always uses a 1-minute interval for certain statistics values, including the following IFCIDs: 0002, 0202, 0217, 0225, and 0230.
  • Copy SMF 100 records and retain them in a separate file.
    These records represent a relatively small volume of the total SMF data volume.
  • Collect accounting trace classes 1, 2, 3, 7, and 8.
  • Use the ACCUMACC subsystem parameter to create rollup accounting records for DDF and RRS workloads.
    Be aware, however, that using the rollup accounting records provides a trade-off. The rollup records reduce the volume of the accounting data. However, they also remove granularity from the data, which means that information about outlying transactions that perform poorly is likely to be lost in the rollup data.

    Another way to reduce the volume of accounting data is to set the value of the SMFCOMP subsystem parameter to YES. This value enables Db2 SMF trace data compression. You can use it instead of ACCUMACC to reduce the volume of trace data without removing the granularity of the data. You can also use SMFCOMP=YES and ACCUMACC=YES together to get even more SMF trace volume reduction.

  • Monitor and review the Db2 metrics proactively and regularly, and develop an automated process to store the performance data into Db2 tables.
    The goal is to track evolving trends for key performance indicators at the system level from the time that Db2 starts to the time that it stops. This information becomes the baseline for further analysis and enables you to establish thresholds for out-of-normal conditions and alerts when these conditions occur.
  • Enable near-term history collection in your Db2 online monitor.
    By doing so, you can review Db2 statistics and accounting records for the past several hours of Db2 processing. You can use this information to detect adverse trends. You can keep a log and history of the alerts that are generated, and analyze the trends.
  • Use automation to issue the DISPLAY commands at regular intervals. Save the output from these commands so that you can analyze what happened on the system before the problem occurred.
    You might also use automation to specify thresholds for the output of the commands to detect exception conditions. The objective is to be able to detect and correct problems quickly, thereby avoiding long-running recovery actions whenever possible.
    For example, you might use automation to issue the following DISPLAY commands at 15-minute intervals:
    -DISPLAY THREAD(*) SCOPE(GROUP) TYPE(INDOUBT)
    -DISPLAY DATABASE(*) SPACENAM(*) RESTRICT LIMIT(*)
    -DISPLAY UTIL(*)
    -DISPLAY THREAD(*) TYPE(SYSTEM)
    -DISPLAY THREAD(*) SERVICE(STORAGE)
    -DISPLAY BPOOL(*) DETAIL(INTERVAL)
    -DISPLAY GBPOOL(*) MDETAIL(INTERVAL)
    -DISPLAY GROUP
    -DISPLAY ARCHIVE
    -DISPLAY DDF
    -DISPLAY LOCATION(*) DETAIL
    F irlmproc,STATUS, ALLD
    D XCF, STR
    D GRS,C
  • Use the dynamic statement cache to monitor for exceptions for distributed applications, such as .NET, ODBC, and JDBC applications.
    For example, you might use the following process to capture the information:
    1. Create the following Db2-supplied user tables:

      You can find a sample CREATE TABLE statement for each EXPLAIN table in member DSNTESC of the prefix.SDSNSAMP library. You can call the ADMIN_EXPLAIN_MAINT stored procedure to create EXPLAIN tables, upgrade them to the format for the current Db2 release, or complete other maintenance tasks. See ADMIN_EXPLAIN_MAINT stored procedure for information about using the action input parameter to request each of these tasks.

    2. Issue the following command to start collecting performance statistics for the dynamic statement cache:
      -START TRACE(P) CLASS(30) IFCID(316, 317, 318) 
    3. Issue the following statement to extract the statistics information from the global statement cache to the DSN_STATEMENT_CACHE_TABLE.
      EXPLAIN STMTCACHE ALL;
    4. Stop the performance trace.
    5. Issue the following statement to generate individual EXPLAIN statements for each SQL statement in the cache.
      SELECT 'EXPLAIN STMTCACHE STMTID '||STRIP(CHAR(STMT_ID))||' ;'
      FROM USERID.DSN_STATEMENT_CACHE_TABLE;
    6. Import the contents of the four tables into a spreadsheet for analysis.