Reprinted with permission from IBM data magazine.
To help DB2 DBAs avoid performance fires and achieve high performance on their own, I've summarized a troubleshooting process for our clients, customers, and fellow DB2 professionals. The top 10 performance tips for e-business OLTP applications in DB2 UDB for UNIX, Windows, and OS/2 environments are explained in detail below -- and summarized at the end of this article.
10. Monitor switches
Make sure the monitor switches are turned on. If they aren't, you won't have access to the performance information you need. To turn the monitor switches on, issue the command:
db2 "update monitor switches using lock ON sort ON bufferpool ON uow ON table ON statement ON"
Make sure there are enough DB2 agents to handle the workload. To find out, issue the command:
db2 "get snapshot for database manager"
and look for the following lines:
High water mark for agents registered = 7 High water mark for agents waiting for a token = 0 Agents registered= 7 Agents waiting for a token= 0 Idle agents= 5 Agents assigned from pool= 158 Agents created from empty Pool = 7 Agents stolen from another application= 0 High water mark for coordinating agents= 7 Max agents overflow= 0
If you find either
Agents waiting for a token or
Agents stolen from another application,
increase the number of agents available to the database manager
MAX_COORDAGENTS as appropriate).
8. Maximum open files
DB2 tries to be a good citizen within the constraints of operating system
resources. One of its "good citizen" acts is putting a ceiling, or upper
limit, on the maximum number of files open at any one time. The
MAXFILOP database configuration parameter
stipulates the maximum number of files that DB2 can have open
concurrently. After it reaches that point, DB2 will start closing and
opening its tablespace files (including raw devices). Opening and closing
files slows SQL response times and burns CPU cycles. To find out if DB2 is
closing files, issue the command:
db2 "get snapshot for database on DBNAME"
and look for the line that reads:
Database files closed = 0
If files are being closed, increase the value of MAXFILOP until the opening and closing stops. Use the command:
db2 "update db cfg for DBNAME using MAXFILOP N"
The default value for
LOCKTIMEOUT is -1, which
means that there will be no lock timeouts - a situation that can be
catastrophic for OLTP applications. Nevertheless, I all too frequently
find many DB2 users with
LOCKTIMEOUT = -1. Set
LOCKTIMEOUT to a very short value, such as 10
or 15 seconds. Waiting on locks for extended periods of time can have an
avalanche effect on locks.
First, check the value of
LOCKTIMEOUT with this
db2 "get db cfg for DBNAME"
and look for the line containing this text:
Lock timeout (sec) (LOCKTIMEOUT) = -1
If the value is -1, consider changing it to 15 seconds by using the following command (be sure to consult with the application developers or your vendor first to make sure the application is prepared to handle lock timeouts):
db2 "update db cfg for DBNAME using LOCKTIMEOUT 15"
You should also monitor the number of lock waits, lock wait time, and amount of lock list memory in use. Issue the command:
db2 "get snapshot for database on DBNAME"
Look for the following lines:
Locks held currently= 0 Lock waits= 0 Time database waited on locks (ms)= 0 Lock list memory in use (Bytes)= 576 Deadlocks detected= 0 Lock escalations= 0 Exclusive lock escalations= 0 Agents currently waiting on locks= 0 Lock Timeouts= 0
Lock list memory in use (Bytes) exceeds
50 percent of the defined
LOCKLIST size, then
increase the number of 4K pages in the
6. Temporary tablespaces
Temporary tablespaces should have at least three containers on three
different disk drives in order to help DB2 perform parallel I/O and
improve performance for sorts, hash joins, and other database activities
To find out how many containers your temporary tablespace has, issue the command:
db2 "list tablespaces show detail"
Look for the
TEMPSPACE tablespace definition
similar to this example:
Tablespace ID= 1 Name= TEMPSPACE1 Type= System managed space Contents= Temporary data State= 0x0000 Detailed explanation:Normal Total pages= 1 Useable pages= 1 Used pages= 1 Free pages= Not applicable High water mark (pages)= Not applicable Page size (bytes)= 4096 Extent size (pages)= 32 Prefetch size (pages)= 96 Number of containers= 3
Number of containers has the value
3, and that
Prefetch size is three times
Extent size. For best parallel I/O performance,
it is important for Prefetch size to be a multiple of Extent size. The
multiple should be equal to the number of containers.
To find the definitions for the containers, issue the command:
db2 "list tablespace containers for 1 show detail"
The 1 refers to
tablespace ID #1, which is
TEMPSPACE1 in the example just given.
5. Sort memory
OLTP applications should not be performing large sorts. They are too costly
in terms of CPU, I/O, and elapsed time and will slow down any OLTP
application. Therefore, the default
size of 256 4K pages (1MB) should be more than adequate. You should also
know the number of sort overflows and the number of sorts per
Issue the command:
Db2 "get snapshot for database on DBNAME"
and look for the following lines:
Total sort heap allocated= 0 Total sorts = 1 Total sort time (ms)= 8 Sort overflows = 0 Active sorts = 0 Commit statements attempted = 3 Rollback statements attempted = 0 Let transactions = Commit statements attempted + Rollback statements attempted Let SortsPerTX= Total sorts / transactions Let PercentSortOverflows = Sort overflows * 100 / Total sorts
PercentSortOverflows ((Sort overflows * 100) / Total
is greater than 3 percent, there may be serious and unexpected sort
problems in the application SQL. Because the very presence of overflows
indicates that large sorts are occurring, finding zero sort overflows, or
at least a percentage less than one, would be ideal.
If excessive sort overflows are present, the "band aid" solution is to
increase the size of
SORTHEAP. However, doing
so only masks the real performance problem. Instead, you should identify
the SQL that is causing the sorts and change the SQL, indexes, or
clustering to avoid or reduce the sort cost.
SortsPerTX is greater than 5 (as a rule of
thumb), the number of sorts per transaction may be high. Some application
transactions perform dozens of small composite sorts (which do not
overflow and have very short durations), but consume excessive CPU. When
SortsPerTX is high, my experience indicates
that these machines are typically CPU bound. Identifying the SQL that is
causing the sorts and improving the access plans (via indexes, clustering,
or SQL changes) is paramount to improving transaction throughput
4. Table access
For each table, identify how many rows DB2 is reading for each transaction. You must issue two commands:
db2 "get snapshot for database on DBNAME"
db2 "get snapshot for tables on DBNAME"
After you issue the first command, determine how many transactions have
occurred (by taking the sum of
attempted - see Tip 3).
After issuing the second command, divide the number of rows read by the
number of transactions (
applications should typically read one to 20 rows from each table per
transaction. If you discover that hundreds or thousands of rows are being
read for each transaction, scans are taking place and indexes may need to
be created. (Sometimes simply running runstats with distribution and
detailed indexes all provides a cure.)
Sample output from
get snapshot for tables on DBNAME"
Snapshot timestamp = 09-25-2000 4:47:09.970811 Database name= DGIDB Database path= /fs/inst1/inst1/NODE0000/SQL00001/ Input database alias= DGIDB Number of accessed tables= 8 Table List Table Schema= INST1 Table Name= DGI_ SALES_ LOGS_TB Table Type= User Rows Written= 0 Rows Read= 98857 Overflows= 0 Page Reorgs= 0
A high number of Overflows probably means you need to reorganize the table. Overflows occur when DB2 must locate a row on a suboptimal page due to a change in a row's width.
3. Tablespace analysis
A tablespace snapshot can be extremely valuable to understanding what data is being accessed and how. To get one, issue the command:
db2 "get snapshot for tablespaces on DBNAME"
For each tablespace, answer the following questions:
- What is the average read time (ms)?
- What is the average write time (ms)?
- What percentage of the physical I/O is asynchronous (prefetched) vs. synchronous (random)?
- What are the buffer pool hit ratios for each tablespace?
- How many physical pages are being read each minute?
- How many physical and logical pages are being read for each transaction?
For all tablespaces, answer the following questions:
- Which tablespaces have the slowest read and write times? Why? Containers on slow disks? Are container sizes unequal?
- Are the access attributes, asynchronous versus synchronous access, consistent with expectations? Randomly read tables should have randomly read tablespaces, meaning high synchronous read percentages, usually higher buffer pool hit ratios, and lower physical I/O rates.
For each tablespace, make sure that the prefetch size is equal to the extent size multiplied by the number of containers. Issue the command:
db2 "list tablespaces show detail"
The prefetch size can be altered for a given tablespace if necessary. Container definitions can be checked by using the command:
db2 "list tablespace containers for N show detail"
in which N is the tablespace ID number.
2. Buffer pool optimization
All too often I find DB2 UDB sites where the machines have 2, 4, or 8GB of
memory, yet the DB2 database has one buffer pool,
IBMDEFAULTBP, which is only 16MB in size!
If this is the case at your site, create a buffer pool for the SYSCATSPACE
catalog tablespace, one for the
tablespace, and at least two more buffer pools:
Tablespaces that are accessed randomly should be assigned to a buffer pool
with random objectives,
that are accessed sequentially (with asynchronous prefetch I/O) should be
assigned to a buffer pool with sequential objectives,
BP_SEQ. You can create additional buffer pools
depending on performance objectives for certain transactions; for example,
you could make a buffer pool large enough to store an entire "hot," or
very frequently accessed, table. When large tables are involved, some DB2
users have great success placing the indexes for important tables into an
BP_IX, buffer pool.
Buffer pools that are too small result in excessive, unnecessary, physical I/O. Buffer pools that are too large put a system at risk for operating system paging and consume unnecessary CPU cycles managing the overallocated memory. Somewhere between "too small" and "too large" lies the size that is just right. The right size exists where the point of diminishing returns is reached. If you're not using a tool to automate the diminishing returns analysis, you should scientifically test buffer pool performance (hit ratios, I/O times, physical I/O read rates) at incremental sizes until an optimum size is reached. Because businesses constantly change and grow, the "optimum size" decision should be reevaluated periodically.
1. SQL cost analysis
One bad SQL statement can ruin your whole day. Time and time again I've seen a single, relatively simple SQL statement bring a finely tuned database and machine to its knees. For many of these statements, there isn't a DB2 UDB configuration parameter under the sun (or in the doc) that can make right the high cost of an errant SQL statement.
Making matters worse, the DBA's hands are frequently tied: You can't change the SQL can because it's provided by an application vendor (such as SAP, PeopleSoft, or Siebel). This leaves the DBA three courses of action:
1. Change or add indexes
2. Change clustering
3. Change catalog statistics
What's more, today's robust applications are made up of hundreds or thousands of different SQL statements. These statements are executed at varying rates of frequency depending on application functionality and the business needs du jour. A SQL statement's true cost is the resource cost to execute it once multiplied by the number of times it is executed.
The monumental task that confronts each DBA is the challenge of identifying the SQL statements with the highest "true cost," and working to reduce the costs of these statements.
You can find out the resource cost to execute a SQL statement once from native DB2 Explain utilities, a number of tools from third-party vendors, or the DB2 UDB SQL Event Monitor data. But the frequency of statement execution can only be learned through careful and time-consuming analysis of DB2 UDB SQL Event Monitor data.
In researching problem SQL statements, the standard procedure used by DBAs is:
1. Create an SQL Event Monitor, write to file:
$> db2 "create event monitor SQLCOST for statements write to ..."
2. Activate the event monitor (be sure ample free disk space is available):
$> db2 "set event monitor SQLCOST state = 1"
3. Let the application run.
4. Deactivate the event monitor:
$> db2 "set event monitor SQLCOST state = 0"
5. Use the DB2-supplied db2evmon tool to format the raw SQL Event Monitor data (hundreds of megabytes of free disk space may be required depending on SQL throughput rates):
$> db2evmon -db DBNAME -evm SQLCOST > sqltrace.txt
6. Browse through the formatted file scanning for unusually large cost numbers, a time-consuming process:
$> more sqltrace.txt
7. Undertake a more complete analysis of the formatted file that attempts to identify unique statements (independent of literal values), each unique statement's frequency (how many times it occurred), and the aggregate of its total CPU, sort, and other resource costs. Such a thorough analysis could take a week or more on just a 30-minute sample of application SQL activity.
To reduce the time it takes to identify SQL statements with high costs, you can consider many sources of available information:
- From Tip 4, be sure to compute the number of rows read from each table per transaction. If the resulting number seems high, the DBA may be able to identify problem statements by searching the SQL Event Monitor formatted output for the table name in question (this will narrow the search, and possibly save some time).
- From Tip 3, be sure to compute the asynchronous read percentage and physical I/O read rates for each tablespace. If a tablespace has a very high asynchronous read percentage and way above average physical I/O read rates, one or more of the tables in the tablespace is probably being scanned. Query the catalog and find out which tables are assigned to the suspect tablespaces (one table per tablespace provides the best performance instrumentation), then search the SQL Event Monitor formatted output for the tables. This, too, may help narrow the search for costly SQL statements.
- Try to look at DB2 Explain information for each of the SQL statements being executed by the application. However, I've found that high frequency, lower-cost statements often rob a machine of its capacity and ability to provide desired performance.
- If analysis time is short and maximum performance is critical, consider vendor tools that can quickly automate the process of identifying resource intensive SQL statements. The SQL-GUY tool from Database-GUYS Inc. provides an accurate, real-time, proportional, cost rank analysis of SQL statements.
Staying in tune
Optimum performance requires not only eliminating high cost SQL statements, but also making sure that appropriate physical infrastructures are in place. Peak performance results when all the tuning knobs are set just right, memory is allocated to pools and heaps effectively, and I/O is evenly balanced across disks. Although it takes time to measure and make adjustments, the DBA who performs these 10 suggestions will be very successful at satisfying internal and external DB2 customers. As e-businesses change and grow, even the best-administered database will need regular fine-tuning. The DBA's job is never done!
The Top 10 at a glance
- Use sufficient agents for the workload.
- Do not allow DB2 to needlessly close and open files.
- Do not allow extended lock waits.
- Ensure parallel I/O capabilities to the database
- Manage DB2 sort memory conservatively and don't mask sort problems
- Analyze table access activity and identify tables with unusually high rows read per transaction or overflow counts.
- Analyze the performance characteristics of each tablespace, and seek to improve the performance of the tablespaces with the slowest read times, longest write times, highest physical I/O read rates, worst hit ratios, and access attributes that are inconsistent with expectations.
- Create multiple buffer pools, and make purposeful assignments of tablespaces to buffer pools such that access attributes are shared.
- Examine DB2 UDB SQL Event Monitor information to discover which SQL statements are consuming the largest proportions of computing resources, and take corrective actions.
- Reevaluate configuration and physical design settings once high cost SQL is eliminated.
Reprinted with permission from the Spring 2001 issue of IBM data magazine.