Investigating CPU performance regression

When you encounter a possible CPU performance regression during migration to a new release, the first challenges are to verify that a CPU regression actually occurred, and to identify the specific connection types, plans, and packages that are involved.

About this task

To do so, you need to find valid comparison points in real production environments, both before and after migration to the new release. The best approach is to exclude batch processing because it is highly variable based on the based on the operational business calendar.

You can then compare the performance data for the period on the previous Db2 release to the corresponding period the new Db2 release. As a starting point, you can use a combination of statistics trace data, accounting trace data, and workload indicators to ensure that you have a valid comparison, and to identify the nature of the problem.

Procedure

To determine whether a CPU regression occurred at migration:

  1. Find an interval of several days that has a comparable SQL profile across Db2 releases.
    For a valid comparison, you need a corresponding interval that has a similar number of total SQL requests, and a similar distribution across the different types of SQL statements, such as SELECT, INSERT, UPDATE, DELETE, and so on.

    If you find that the SQL profile is changed significantly, the application workload has changed and a valid comparison for CPU regression is not possible.

  2. Compare performance data for the identified period in the previous release to the corresponding period in the new release.
    You can use a combination of the statistics and accounting traces to check that you have the same pattern across Db2 releases.
    1. In the statistics trace data, start by comparing the CPU times for the following contexts:
      • Task control blocks (TCB) and service request blocks (SRB) for the system services address space (ssnmMSTR).
      • Task control blocks (TCB), service request blocks (SRB), and specialty engine service request blocks for the database services address space (ssnmDBM1).
      • Task control blocks (TCB) and service request blocks (SRB) for the IRLM address space.
    2. In the accounting trace data, compare the class 2 CPU times for each connection type, for central processors and for specialty engines, and check for the numbers of SQL requests, including the following workload indicators:
      • The numbers of SQL statements for data manipulation, by type of statement (SELECT, INSERT, UPDATE, FETCH, and so on.)
      • The numbers of commit operations, roll back operations, getpage operations, and buffer pool updates.
      • The amount of read and write activity in terms of I/O operations and pages.
    3. Combine the statistics trace data and accounting trace data:
      1. Normalize the values by dividing the CPU time values by the number of commit and roll back operations. The resulting values represent the average CPU milliseconds per transaction.
      2. Stack the various components of CPU resource consumption and graph them.
      For example:
      MSTR TCB cpu-time / (commits + rollbacks)
      MSTR SRB cpu-time / (commits + rollbacks)
      DBM1 TCB cpu-time / (commits + rollbacks)
      DBM1 SRB cpu-time / (commits + rollbacks)
      DBM1 IIP SRB cpu-time / (commits + rollbacks)
      IRLM TCB cpu-time / (commits + rollbacks)
      IRLM SRB cpu-time / (commits + rollbacks)
      Average Class 2 CP CPU * occurrences / (commits + rollbacks)
      Average Class 2 SE CPU * occurrences / (commits + rollbacks)
  3. Compare the number of getpage operations for the corresponding intervals.
    If you find significant changes to the numbers of getpage operations across releases (for comparable application workloads), access path changes are the most likely cause of the CPU regression.

What to do next

You can analyze the details of the accounting data to locate the particular plan or package that is the source of the problem.