When you encounter a possible CPU performance regression
during migration to a new release, the first challenges are to verify
that a CPU regression actually occurred, and to identify the specific
connection types, plans, and packages that are involved.
About this task
To do so, you need to find valid comparison points in
real production environments, both before and after migration to the
new release. The best approach is to exclude batch processing because
it is highly variable based on the based on the operational business
calendar.
You can then compare the performance data for the
period on the previous Db2 release
to the corresponding period the new Db2 release.
As a starting point, you can use a combination of statistics trace
data, accounting trace data, and workload indicators to ensure that
you have a valid comparison, and to identify the nature of the problem.
Procedure
To determine whether a CPU regression occurred at migration:
- Find an interval of several days that has a comparable SQL profile across Db2 releases.
For a valid comparison,
you need a corresponding interval that has a similar number of total
SQL requests, and a similar distribution across the different types
of SQL statements, such as SELECT, INSERT, UPDATE, DELETE, and so
on.
If you find that the SQL profile is changed significantly, the
application workload has changed and a valid comparison for CPU regression
is not possible.
- Compare performance data for the identified period in the
previous release to the corresponding period in the new release.
You can use a combination of the statistics and accounting traces
to check that you have the same pattern across Db2 releases.
- In the statistics trace data, start by comparing the
CPU times for the following contexts:
- Task control blocks (TCB) and service request blocks (SRB) for
the system services address space (ssnmMSTR).
- Task control blocks (TCB), service request blocks (SRB), and specialty
engine service request blocks for the database services address space (ssnmDBM1).
- Task control blocks (TCB) and service request blocks (SRB) for
the IRLM address space.
- In the accounting trace data, compare the class 2 CPU
times for each connection type, for central processors and for specialty
engines, and check for the numbers of SQL requests, including the
following workload indicators:
- The numbers of SQL statements for data manipulation, by type of
statement (SELECT, INSERT, UPDATE, FETCH, and so on.)
- The numbers of commit operations, roll back operations, getpage
operations, and buffer pool updates.
- The amount of read and write activity in terms of I/O operations
and pages.
- Combine the statistics trace data and accounting trace
data:
- Normalize the values by dividing the CPU time values by the number
of commit and roll back operations. The resulting values represent
the average
CPU milliseconds per transaction.
- Stack the various components of CPU resource consumption and graph
them.
For example:
MSTR TCB cpu-time / (commits + rollbacks)
MSTR SRB cpu-time / (commits + rollbacks)
DBM1 TCB cpu-time / (commits + rollbacks)
DBM1 SRB cpu-time / (commits + rollbacks)
DBM1 IIP SRB cpu-time / (commits + rollbacks)
IRLM TCB cpu-time / (commits + rollbacks)
IRLM SRB cpu-time / (commits + rollbacks)
Average Class 2 CP CPU * occurrences / (commits + rollbacks)
Average Class 2 SE CPU * occurrences / (commits + rollbacks)
- Compare the number of getpage operations for the corresponding
intervals.
If you find significant changes to the
numbers of getpage operations across releases (for comparable application
workloads), access path changes are the most likely cause of the CPU
regression.
What to do next
You can analyze the details of the accounting data to locate
the particular plan or package that is the source of the problem.