Every few weeks or so, we get performance calls from DBAs in distress. "Our Web site is crawling," they bemoan. "We're losing customers, and the situation is dire. Can you help?" To answer these calls, I've developed an analytics process for my consulting company that lets us rapidly find the cause of performance problems and develop remedial actions and tuning suggestions. Rarely do these callers inquire about fees or costs - they're only interested in stopping the bleeding. When a DB2 or e-business application isn't performing up to expectations, the entire organization and financial bottom line measurably suffers.

Scott Hayes (scott@database-guys.com), President, Database-Guys Inc.

Scott Hayes is president of Database-Guys Inc., a performance tools and consulting company specializing in DB2 UDB on UNIX platforms. He is also a member of the IBM DB2 Gold Consultant Group and a DB2 UDB Advanced Certified Technical Expert. Scott has worked with DB2/AIX since V1, and his diverse experiences have taught him a number of DB2 UDB best practices. A frequent speaker at IDUG and DB2 technical conferences, Scott has published a number of articles in DB2-related magazines. For more information, visit www.database-guys.com or write scott@database-guys.com.



09 March 2004

Reprinted with permission from IBM data magazine.

To help DB2 DBAs avoid performance fires and achieve high performance on their own, I've summarized a troubleshooting process for our clients, customers, and fellow DB2 professionals. The top 10 performance tips for e-business OLTP applications in DB2 UDB for UNIX, Windows, and OS/2 environments are explained in detail below -- and summarized at the end of this article.

10. Monitor switches

Make sure the monitor switches are turned on. If they aren't, you won't have access to the performance information you need. To turn the monitor switches on, issue the command:

db2 "update monitor switches using    
lock ON sort ON bufferpool ON uow ON    
table ON statement ON"

9. Agents

Make sure there are enough DB2 agents to handle the workload. To find out, issue the command:

db2 "get snapshot for database manager"

and look for the following lines:

High water mark for agents registered = 7   
High water mark for agents waiting for a token = 0   
Agents registered= 7   
Agents waiting for a token= 0   
Idle agents= 5   
Agents assigned from pool= 158   
Agents created from empty Pool = 7   
Agents stolen from another application= 0   
High water mark for coordinating agents= 7   
Max agents overflow= 0

If you find either Agents waiting for a token or Agents stolen from another application, increase the number of agents available to the database manager (MAXAGENTS and/or MAX_COORDAGENTS as appropriate).


8. Maximum open files

DB2 tries to be a good citizen within the constraints of operating system resources. One of its "good citizen" acts is putting a ceiling, or upper limit, on the maximum number of files open at any one time. The MAXFILOP database configuration parameter stipulates the maximum number of files that DB2 can have open concurrently. After it reaches that point, DB2 will start closing and opening its tablespace files (including raw devices). Opening and closing files slows SQL response times and burns CPU cycles. To find out if DB2 is closing files, issue the command:

db2 "get snapshot for database on DBNAME"

and look for the line that reads:

Database files closed = 0

If files are being closed, increase the value of MAXFILOP until the opening and closing stops. Use the command:

db2 "update db cfg for DBNAME using MAXFILOP N"

7. Locks

The default value for LOCKTIMEOUT is -1, which means that there will be no lock timeouts - a situation that can be catastrophic for OLTP applications. Nevertheless, I all too frequently find many DB2 users with LOCKTIMEOUT = -1. Set LOCKTIMEOUT to a very short value, such as 10 or 15 seconds. Waiting on locks for extended periods of time can have an avalanche effect on locks.

First, check the value of LOCKTIMEOUT with this command:

db2 "get db cfg for DBNAME"

and look for the line containing this text:

Lock timeout (sec) (LOCKTIMEOUT) = -1

If the value is -1, consider changing it to 15 seconds by using the following command (be sure to consult with the application developers or your vendor first to make sure the application is prepared to handle lock timeouts):

db2 "update db cfg for DBNAME using LOCKTIMEOUT 15"

You should also monitor the number of lock waits, lock wait time, and amount of lock list memory in use. Issue the command:

db2 "get snapshot for database on DBNAME"

Look for the following lines:

Locks held currently= 0   
Lock waits= 0   
Time database waited on locks (ms)= 0   
Lock list memory in use (Bytes)= 576   
Deadlocks detected= 0   
Lock escalations= 0   
Exclusive lock escalations= 0   
Agents currently waiting on locks= 0   
Lock Timeouts= 0

If the Lock list memory in use (Bytes) exceeds 50 percent of the defined LOCKLIST size, then increase the number of 4K pages in the LOCKLIST database configuration.


6. Temporary tablespaces

Temporary tablespaces should have at least three containers on three different disk drives in order to help DB2 perform parallel I/O and improve performance for sorts, hash joins, and other database activities that use TEMPSPACE.

To find out how many containers your temporary tablespace has, issue the command:

db2 "list tablespaces show detail"

Look for the TEMPSPACE tablespace definition similar to this example:

Tablespace ID= 1   
Name= TEMPSPACE1   
Type= System managed space   
Contents= Temporary data   
State= 0x0000    
  Detailed explanation:Normal   
Total pages= 1   
Useable pages= 1   
Used pages= 1   
Free pages= Not applicable   
High water mark (pages)= Not applicable   
Page size (bytes)= 4096   
Extent size (pages)= 32   
Prefetch size (pages)= 96   
Number of containers= 3

Notice that Number of containers has the value 3, and that Prefetch size is three times Extent size. For best parallel I/O performance, it is important for Prefetch size to be a multiple of Extent size. The multiple should be equal to the number of containers.

To find the definitions for the containers, issue the command:

db2 "list tablespace containers for 1 show detail"

The 1 refers to tablespace ID #1, which is TEMPSPACE1 in the example just given.


5. Sort memory

OLTP applications should not be performing large sorts. They are too costly in terms of CPU, I/O, and elapsed time and will slow down any OLTP application. Therefore, the default SORTHEAP size of 256 4K pages (1MB) should be more than adequate. You should also know the number of sort overflows and the number of sorts per transaction.

Issue the command:

Db2 "get snapshot for database on DBNAME"

and look for the following lines:

Total sort heap allocated= 0   
Total sorts = 1   
Total sort time (ms)= 8   
Sort overflows = 0   
Active sorts = 0   
Commit statements attempted = 3   
Rollback statements attempted = 0   
Let transactions = Commit statements attempted + Rollback   
statements attempted   
Let SortsPerTX= Total sorts / transactions   
Let PercentSortOverflows = Sort overflows * 100 / Total sorts

If PercentSortOverflows ((Sort overflows * 100) / Total sorts ) is greater than 3 percent, there may be serious and unexpected sort problems in the application SQL. Because the very presence of overflows indicates that large sorts are occurring, finding zero sort overflows, or at least a percentage less than one, would be ideal.

If excessive sort overflows are present, the "band aid" solution is to increase the size of SORTHEAP. However, doing so only masks the real performance problem. Instead, you should identify the SQL that is causing the sorts and change the SQL, indexes, or clustering to avoid or reduce the sort cost.

If SortsPerTX is greater than 5 (as a rule of thumb), the number of sorts per transaction may be high. Some application transactions perform dozens of small composite sorts (which do not overflow and have very short durations), but consume excessive CPU. When SortsPerTX is high, my experience indicates that these machines are typically CPU bound. Identifying the SQL that is causing the sorts and improving the access plans (via indexes, clustering, or SQL changes) is paramount to improving transaction throughput rates.


4. Table access

For each table, identify how many rows DB2 is reading for each transaction. You must issue two commands:

db2 "get snapshot for database on DBNAME"
db2 "get snapshot for tables on DBNAME"

After you issue the first command, determine how many transactions have occurred (by taking the sum of Commit statements attempted plus Rollback statements attempted - see Tip 3).

After issuing the second command, divide the number of rows read by the number of transactions (RowsPerTX). OLTP applications should typically read one to 20 rows from each table per transaction. If you discover that hundreds or thousands of rows are being read for each transaction, scans are taking place and indexes may need to be created. (Sometimes simply running runstats with distribution and detailed indexes all provides a cure.)

Sample output from "get snapshot for tables on DBNAME" follows:

Snapshot timestamp = 09-25-2000    
4:47:09.970811   
Database name= DGIDB   
Database path= /fs/inst1/inst1/NODE0000/SQL00001/   
Input database alias= DGIDB   
Number of accessed tables= 8   
Table List    
  Table Schema= INST1    
  Table Name= DGI_    
SALES_ LOGS_TB    
  Table Type= User    
  Rows Written= 0    
  Rows Read= 98857    
  Overflows= 0      
  Page Reorgs= 0

A high number of Overflows probably means you need to reorganize the table. Overflows occur when DB2 must locate a row on a suboptimal page due to a change in a row's width.


3. Tablespace analysis

A tablespace snapshot can be extremely valuable to understanding what data is being accessed and how. To get one, issue the command:

db2 "get snapshot for tablespaces on DBNAME"

For each tablespace, answer the following questions:

  • What is the average read time (ms)?
  • What is the average write time (ms)?
  • What percentage of the physical I/O is asynchronous (prefetched) vs. synchronous (random)?
  • What are the buffer pool hit ratios for each tablespace?
  • How many physical pages are being read each minute?
  • How many physical and logical pages are being read for each transaction?

For all tablespaces, answer the following questions:

  • Which tablespaces have the slowest read and write times? Why? Containers on slow disks? Are container sizes unequal?
  • Are the access attributes, asynchronous versus synchronous access, consistent with expectations? Randomly read tables should have randomly read tablespaces, meaning high synchronous read percentages, usually higher buffer pool hit ratios, and lower physical I/O rates.

For each tablespace, make sure that the prefetch size is equal to the extent size multiplied by the number of containers. Issue the command:

db2 "list tablespaces show detail"

The prefetch size can be altered for a given tablespace if necessary. Container definitions can be checked by using the command:

db2 "list tablespace containers for N show detail"

in which N is the tablespace ID number.


2. Buffer pool optimization

All too often I find DB2 UDB sites where the machines have 2, 4, or 8GB of memory, yet the DB2 database has one buffer pool, IBMDEFAULTBP, which is only 16MB in size!

If this is the case at your site, create a buffer pool for the SYSCATSPACE catalog tablespace, one for the TEMPSPACE tablespace, and at least two more buffer pools: BP_RAND and BP_SEQ. Tablespaces that are accessed randomly should be assigned to a buffer pool with random objectives, BP_RAND. Tablespaces that are accessed sequentially (with asynchronous prefetch I/O) should be assigned to a buffer pool with sequential objectives, BP_SEQ. You can create additional buffer pools depending on performance objectives for certain transactions; for example, you could make a buffer pool large enough to store an entire "hot," or very frequently accessed, table. When large tables are involved, some DB2 users have great success placing the indexes for important tables into an index, BP_IX, buffer pool.

Buffer pools that are too small result in excessive, unnecessary, physical I/O. Buffer pools that are too large put a system at risk for operating system paging and consume unnecessary CPU cycles managing the overallocated memory. Somewhere between "too small" and "too large" lies the size that is just right. The right size exists where the point of diminishing returns is reached. If you're not using a tool to automate the diminishing returns analysis, you should scientifically test buffer pool performance (hit ratios, I/O times, physical I/O read rates) at incremental sizes until an optimum size is reached. Because businesses constantly change and grow, the "optimum size" decision should be reevaluated periodically.


1. SQL cost analysis

One bad SQL statement can ruin your whole day. Time and time again I've seen a single, relatively simple SQL statement bring a finely tuned database and machine to its knees. For many of these statements, there isn't a DB2 UDB configuration parameter under the sun (or in the doc) that can make right the high cost of an errant SQL statement.

Making matters worse, the DBA's hands are frequently tied: You can't change the SQL can because it's provided by an application vendor (such as SAP, PeopleSoft, or Siebel). This leaves the DBA three courses of action:

1. Change or add indexes

2. Change clustering

3. Change catalog statistics

What's more, today's robust applications are made up of hundreds or thousands of different SQL statements. These statements are executed at varying rates of frequency depending on application functionality and the business needs du jour. A SQL statement's true cost is the resource cost to execute it once multiplied by the number of times it is executed.

The monumental task that confronts each DBA is the challenge of identifying the SQL statements with the highest "true cost," and working to reduce the costs of these statements.

You can find out the resource cost to execute a SQL statement once from native DB2 Explain utilities, a number of tools from third-party vendors, or the DB2 UDB SQL Event Monitor data. But the frequency of statement execution can only be learned through careful and time-consuming analysis of DB2 UDB SQL Event Monitor data.

In researching problem SQL statements, the standard procedure used by DBAs is:

1. Create an SQL Event Monitor, write to file:

$> db2 "create event monitor SQLCOST for statements write to ..."

2. Activate the event monitor (be sure ample free disk space is available):

$> db2 "set event monitor SQLCOST state = 1"

3. Let the application run.

4. Deactivate the event monitor:

$> db2 "set event monitor SQLCOST state = 0"

5. Use the DB2-supplied db2evmon tool to format the raw SQL Event Monitor data (hundreds of megabytes of free disk space may be required depending on SQL throughput rates):

$> db2evmon -db DBNAME -evm SQLCOST   > sqltrace.txt

6. Browse through the formatted file scanning for unusually large cost numbers, a time-consuming process:

$> more sqltrace.txt

7. Undertake a more complete analysis of the formatted file that attempts to identify unique statements (independent of literal values), each unique statement's frequency (how many times it occurred), and the aggregate of its total CPU, sort, and other resource costs. Such a thorough analysis could take a week or more on just a 30-minute sample of application SQL activity.

To reduce the time it takes to identify SQL statements with high costs, you can consider many sources of available information:

  • From Tip 4, be sure to compute the number of rows read from each table per transaction. If the resulting number seems high, the DBA may be able to identify problem statements by searching the SQL Event Monitor formatted output for the table name in question (this will narrow the search, and possibly save some time).
  • From Tip 3, be sure to compute the asynchronous read percentage and physical I/O read rates for each tablespace. If a tablespace has a very high asynchronous read percentage and way above average physical I/O read rates, one or more of the tables in the tablespace is probably being scanned. Query the catalog and find out which tables are assigned to the suspect tablespaces (one table per tablespace provides the best performance instrumentation), then search the SQL Event Monitor formatted output for the tables. This, too, may help narrow the search for costly SQL statements.
  • Try to look at DB2 Explain information for each of the SQL statements being executed by the application. However, I've found that high frequency, lower-cost statements often rob a machine of its capacity and ability to provide desired performance.
  • If analysis time is short and maximum performance is critical, consider vendor tools that can quickly automate the process of identifying resource intensive SQL statements. The SQL-GUY tool from Database-GUYS Inc. provides an accurate, real-time, proportional, cost rank analysis of SQL statements.

Staying in tune

Optimum performance requires not only eliminating high cost SQL statements, but also making sure that appropriate physical infrastructures are in place. Peak performance results when all the tuning knobs are set just right, memory is allocated to pools and heaps effectively, and I/O is evenly balanced across disks. Although it takes time to measure and make adjustments, the DBA who performs these 10 suggestions will be very successful at satisfying internal and external DB2 customers. As e-businesses change and grow, even the best-administered database will need regular fine-tuning. The DBA's job is never done!


The Top 10 at a glance

  1. Use sufficient agents for the workload.
  2. Do not allow DB2 to needlessly close and open files.
  3. Do not allow extended lock waits.
  4. Ensure parallel I/O capabilities to the database TEMPSPACE tablespace.
  5. Manage DB2 sort memory conservatively and don't mask sort problems with large SORTHEAPs.
  6. Analyze table access activity and identify tables with unusually high rows read per transaction or overflow counts.
  7. Analyze the performance characteristics of each tablespace, and seek to improve the performance of the tablespaces with the slowest read times, longest write times, highest physical I/O read rates, worst hit ratios, and access attributes that are inconsistent with expectations.
  8. Create multiple buffer pools, and make purposeful assignments of tablespaces to buffer pools such that access attributes are shared.
  9. Examine DB2 UDB SQL Event Monitor information to discover which SQL statements are consuming the largest proportions of computing resources, and take corrective actions.
  10. Reevaluate configuration and physical design settings once high cost SQL is eliminated.

Reprinted with permission from the Spring 2001 issue of IBM data magazine.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=14294
ArticleTitle=Top 10 performance tips
publish-date=03092004