© 2002 International Business Machines Corporation. All rights reserved.
The purpose of this document is to simplify the complexity of database tuning for optimum performance down to a set of steps and guidelines for a new user of IBM DB2 ® Universal Database. Fuller discussion of these subjects can be found in the DB2 Administration Guide and in third-party books about database design and performance.
Database performance is an important factor in an application's performance. As the application and its associated data changes over time, the database has to be constantly tuned to perform at its optimal level. However, efforts spent on tuning should be kept within a reasonable limit; there is a point where additional tuning efforts will only produce negligible effects. If the application performance is still not satisfactory, then other changes should be considered, such as moving the application to a faster platform.
Database design considerations
Database tuning starts at the design phase. Assuming that the hardware is chosen based on other considerations, the first decision is the storage architecture. When DB2 uses more and faster drives, the performance potentially will be better. The placement of tablespaces and other objects (logs, backup files, etc.) should be carefully planned. It is especially important to try to keep logs and backups on separate drives from the data, not only for performance, but for recovery reasons as well.
Tablespace design is an important part of the overall database design. Creating more than one user tablespace can enhance performance. There are three ways that multiple tablespaces can be useful:
- Controlling I/O, if the tablespaces can be located on different drives
- Using different pagesizes
- Controlling bufferpools
Most frequently, separate tablespaces are created for indexes and large objects. There is no advantage in creating more than one temporary tablespace with the same pagesize.
Database-managed tablespaces (especially on raw devices) provide better performance than system managed tablespaces. In determining the pagesize, keep in mind that DB2 will not put more than 255 rows on a page and the rest of the space will be unused. For example, if the average row length is 50 bytes, a maximum of 50*255=12750 bytes of the page will be used. If this table is in a tablespace of 16K or 32K pages, some of the pages will be wasted. On the other hand, tables with longer rows, or with many columns, (see the
CREATE TABLE
statement in the SQL Reference manual for limitations) require pagesizes larger than 4K. If the data is usually accessed in a sequential fashion (for example, clustered tables) larger pagesizes give a better performance, while for random access the smallest acceptable pagesize is the best.
Each tablespace is associated with a bufferpool of the same pagesize (one bufferpool can be associated with more than one tablespace). Care must be taken when using multiple bufferpools. Since available storage is finite, overallocating one bufferpool will reduce the size of the others and thus lead to decreased overall performance. Bufferpool tuning is best done based on monitoring database performance and benchmarking. DB2 does a good job of dynamically managing the available space, so in most cases working with the minimum number of bufferpools gives a good performance.
The great importance of table design has long been normalization. Non-redundant data takes up the minimum space and has the greatest integrity. However, it does not provide the best performance. Eliminating slight redundancies by creating an extra table will make it necessary to add an additional join to queries, thus increasing query complexity. Good judgment needs to be employed balancing the two requirements. Increasing performance is usually done by generating redundant data, albeit in a controlled fashion, in the form of indices and summary tables. The latter can increase performance significantly when summary data is accessed often. Evaluate the refresh frequency needed based on how current the information must be.
Indices are some of the most important aspects of performance tuning. Tables are usually accessed based on a few criteria. Building indices based on the columns that constitute these criteria can dramatically decrease the cost associated with a query. On the other hand, each index will have to be maintained along with the table. Volatile tables maintained online should have few (one or two) indices created over them, while a large historical table queried many different ways will perform better with many indices. The number of columns in an index should be kept to a minimum unless many queries can be satisfied with an "index only" search. For this purpose, the
INCLUDE
option allows appending other fields to the index with less overhead than full indexing. One of the indices over a table can be chosen as a clustering index or alternatively specified on the
REORGANIZE
command. Table data will be kept close to the sequence specified by this index. This is useful when a substantial number of queries access a large number of rows based on this index. Indices are often put in their own tablespace with their own bufferpool to prevent a large amount of data pages pushing the index pages out.
Application design considerations
Application design also affects the database. The first step is to make sure that applications only request necessary work from the database manager. For example, requesting all columns by using
SELECT *
makes programming somewhat faster but slows down performance by requesting additional data movement and preventing "index only" scans. Including unnecessary clauses, such as
ORDER BY
or
DISTINCT
, is a request for additional work from the database manager. If the application works equally well with the rows in another order, sorting time can be saved.
Controlling the locking feature is important for increasing the throughput of the database. Commits are of paramount importance even with read-only transactions, since locks are held for them, too. Choosing the right isolation level is very important. Use the lowest isolation level your application can work. Repeatable Read is extremely expensive in terms of locks and reduces concurrency. Whenever a result set is not planned to be updated, include the
FOR READ ONLY
clause. This will guarantee that exclusive locks are not acquired. The
FOR UPDATE
clause will eliminate the need to re-acquire higher locks. In some circumstances, acquiring a table lock by the application ahead of a query can prevent acquiring many row locks and then escalating the lock.
Query optimization is another subject where significant resources can be saved. The optimization class is set by the
dft_queryopt
database configuration parameter and can be overridden on the
PREP
and
BIND
commands for static SQL and by the
SET CURRENT QUERY OPTIMIZATION
statement for dynamic SQL. Simple queries (single table
SELECTs
or simple joins) will be adequately optimized by class one optimization. For complex queries, class five or higher might be necessary. The
db2batch
tool can be used to evaluate time spent compiling and executing an SQL statement. When considering the results, remember that static SQL statements are typically executed many times per compilation; the same can be true for dynamic SQL since they are cached.
After creating the database and the tablespaces, the Performance Wizard can be used to set the initial database configuration. The Performance Wizard can be accessed from the Control Center. Select the database and the "Configure Performance Using Wizard" option. This will allow for faster loading of the data. After creating the data objects, the data should be loaded.
The first step to tuning is to gather statistics using the
RUNSTATS
command. The "WITH DISTRIBUTION AND INDEXES ALL" option should be specified to get a full set of statistics.
RUNSTATS
should be a regular part of database maintenance. It should be scheduled regularly based on the volatility of the database (daily, weekly, monthly). It should also run if there are major changes to the data (loading or deleting large number of rows). The statistics are used to decide which access plan is the most efficient for a query. After performing a RUNSTATS command affected packages should be rebound.
At this point the Performance Wizard should be executed again, this time specifying the database is populated. The Performance Wizard will change some of the database configuration parameters. If the questions are answered correctly, the values produced by the Performance Wizard are usually very good. The Performance Wizard gives you a chance to restore the previous configuration (if it exists) on the first screen. The explanations in the window should be read carefully. The last screen will display the old and the new database configurations, highlighting in bold all the changes.
If performance is not satisfactory during testing, the cause of the problem should be explored either with the Database System Monitor (see the System Monitor Guide and Reference for details) or narrowing the source of the problem down to certain transactions. For specific queries, the Explain Facility (see the Administration Guide for details) gives valuable information on the possible cause of the performance problem. Based on this information, changes can be made to the index structure or database parameters.
The preceding discussion highlights some of the main concerns of tuning. Tuning is an iterative process. As the data in the database and the application change over time, the performance has to be monitored and the database changed to adapt to the new needs. Changes made to the configuration parameters, or other aspects of the database, should be made singularly and benchmarked. Sometimes changes, which may intuitively appear to be correct, can have a detrimental effect on the database.
Gabor Wieser has been in the information technology business for nearly thirty years. Last year he joined the PartnerWorld for Developers organization to provide technical services and support for DB2 developers. Gabor has worked as a mainframe system programmer and later as an application designer. In the last ten years he has designed, maintained and tuned databases on various platforms. Currently he is supporting developers using DB2 UDB on Windows/UNIX and DB2/400 on the iSeries. His main areas of expertise are database administration, extenders and system interconnectivity. Gabor can be reached at gaborw@us.ibm.com .




