When performance problems arise, the problem is either related to a system resource issue or an application issue (or both). When it comes to system resource usage, AIX provides performance tuning parameters to achieve optimal resource usage for your business objectives. From an application perspective, DB2 is considered middle tier yet at the same time runs within application process space. Like AIX it provides the capability to monitor performance from both an application and SQL statement perspective, providing a wealth of tuning data. Some tuning parameters can be reviewed on their own and changed based on known workload requirements, and others must be considered as part of a larger picture. This article shows you how to recognize problems, and analyze and tune performance from both an AIX and DB2 perspective.
Often within a DB2 AIX environment, performance can be attributed to two areas within the system: memory and disk I/O. From an operating system perspective, excessive paging generally indicates insufficient amount of memory in the system. For most cases the easiest fix is to add more memory. As far as poor disk I/O, there are several things you can do within AIX to improve performance, which is what makes AIX unique from other operating systems. The I/O settings you are able to adjust within AIX for performance will compliment the performance gains you will be able to obtain through DB2. You can exceptionally improve database performance within a DB2 AIX environment. On the other hand, within DB2 poor disk I/O can be attributed to either poor tablespace layout or inadequate memory utilization. But, improving tablespace performance within DB2 is rare since it would mean a redesign of your database, which is not always feasible. Good memory management within DB2 would improve disk I/O performance without a redesign of your tablespace, as you will see later.
Often within DB2 poor performance can be attributed to large amounts of disk I/O. Minimizing the number of times DB2 has to retrieve data from disk increases performance. (Some disk I/O, such as logging, is unavoidable.) Consequently, DB2 uses buffer pools to improve performance. A buffer pool is a finite area in memory that DB2 uses as a staging area for data for transactional processing, and for reading and writing data to and from disk. The configuration parameters involved with tuning the buffer pool are:
- BUFFPAGE
- PREFETCHSIZE
- NUM_IOCLEANERS
- CHNGPGS_THRESH
- NUM_IOSERVERS
Before tuning the buffer pool, you must first analyze what activity is occurring in the buffer pool and record the relevant values. Then you will have an indicator to measure the performance results of any changes made to the buffer pool parameters. DB2 has the snapshot monitor to capture realtime information on buffer pool utilization. This displays all activity related to pages being written to and read from memory. Before getting snapshot data for buffer pool, you must first turn on the monitoring switch by issuing:
db2 update monitor switches using BUFFERPOOL on |
which will yield:
db2mn@:/home/db2mn> db2 get monitor switches
Monitor Recording Switches
Switch list for node 0
Buffer Pool Activity Information (BUFFERPOOL) = OFF
Lock Information (LOCK) = OFF
Sorting Information (SORT) = OFF
SQL Statement Information (STATEMENT) = OFF
Table Activity Information (TABLE) = OFF
Unit of Work Information (UOW) = OFF
db2mn@:/home/db2mn> db2 update monitor switches using buffer pool on
DB20000I The UPDATE MONITOR SWITCHES command completed successfully.
db2mn@:/home/db2mn> db2 get monitor switches
Monitor Recording Switches
Switch list for node 0
Buffer Pool Activity Information (BUFFERPOOL) = ON 10-03-2003
14:26:40.173684
Lock Information (LOCK) = OFF
Sorting Information (SORT) = OFF
SQL Statement Information (STATEMENT) = OFF
Table Activity Information (TABLE) = OFF
Unit of Work Information (UOW) = OFF
Db2mn@:/home/db2mn>
|
Notice that before running the update monitor switches command, a query was executed to get the current monitor switch settings:
db2 get monitor switches |
After issuing the update monitor switches command, the BUFFERPOOL is set to ON with the current date and timestamp. This switch will stay on until you explicitly turn it off with the update command, or start and restart the database with a db2stop and db2start respectfully.
To obtain buffer pool information you must run the following command:
db2 get snapshot for all bufferpools |
You must have a connection to the database for the
get snapshot command to work.
db2mn@:/home/db2mn> db2 'get snapshot for all bufferpools'
Bufferpool Snapshot
Bufferpool name = IBMDEFAULTBP
Database name = SAMPLE
Database path = /home/db2mn/db2mn/NODE0000/SQL00001/
Input database alias =
Buffer pool data logical reads = 22
Buffer pool data physical reads = 7
Buffer pool data writes = 0
Buffer pool index logical reads = 41
Buffer pool index physical reads = 20
Total buffer pool read time (ms) = 24
Total buffer pool write time (ms) = 0
Asynchronous pool data page reads = 0
Asynchronous pool data page writes = 0
Buffer pool index writes = 0
Asynchronous pool index page reads = 0
Asynchronous pool index page writes = 0
Total elapsed asynchronous read time = 0
Total elapsed asynchronous write time = 0
Asynchronous read requests = 0
Direct reads = 0
Direct writes = 0
Direct read requests = 0
Direct write requests = 0
Direct reads elapsed time (ms) = 0
Direct write elapsed time (ms) = 0
Database files closed = 0
Data pages copied to extended storage = 0
Index pages copied to extended storage = 0
Data pages copied from extended storage = 0
Index pages copied from extended storage = 0
|
The following data elements can be measured to evaluate how the buffer pool is being used:
- Buffer pool data logical reads: the total number of read data requests that went through the buffer pool.
- Buffer pool data physical reads: the number of read requests performed that required I/O to place data pages in the buffer pools.
- Buffer pool index logical reads: the total number of read requests for index pages that went through the buffer pool.
- Buffer pool index physical reads: the number of read requests for index pages that require I/O activity to place index pages in the buffer pool.
The key to tuning the buffer pool is to increase as much as possible the buffer pool and index pool hit ratio, which is the percentage of time that the database manager DID NOT need to load a page from disk in order to service a page request.
BufferPoolHitRatio= LogicalReads – Physical Reads x 100
-----------------------------
Logical Reads
|
The formula is similar to the Index Pool Hit Ratio:
IndexPoolHitRatio= IndexLogicalReads –IndexPhysical Reads x 100
--------------------------------
IndexLogical Reads
|
For best performance we would ideally want these values to be as close to 75% as possible. 100% would mean that the entire database is loaded into memory, which is unrealistic in most cases. To increase hit ratio in the buffer pool or index pool you must play with increasing the buffer pool size (BUFFPAGE). However, when you increase this value you would likely want to also monitor the usage of system memory to ensure no paging is occurring as a result of your changes to BUFFPAGE. To monitor system memory use vmstat.
Sometimes instead of tuning BUFFPAGE, it may be that only a reorganization of data is necessary to improve the hit ratio of index pool and buffer pool. It is advisable to first perform a reorgchk command, and then check the hit ratios before tuning buffer pool.
Prefetching is the retrieval of data (one or more extent pages) from disk in anticipation of their use. This can significantly improve performance in SELECT statements by reducing the time waiting for I/O to complete. Setting the PREFETCHSIZE tells DB2 to place that number of pages in the buffer pool in anticipation of its use. The default is 32, but performance is improved when this value is set to (EXTENTSIZE * number of containers). For aggressive fetching and better performance, this value can be increased to as much as two times the value of (EXTENTSIZE * number of containers). Whether this value is set low or high, make sure it is set to a multiple of EXTENTSIZE.
When tweaking the PREFETCHSIZE the NUM_IOSERVERS must also be changed. This value should be at least:
N+ ( PREFETCHSIZE/EXTENTSIZE) |
where N represents one or more additional servers. NUM_IOSERVERS represent the number of I/O servers that perform prefetching for the database. Having additional I/O servers will not hurt performance, as unused I/O server memory pages will get paged out. For optimum parallel I/O this value should at least be set to the number of physical disks or containers in the database.
The buffer pool consists of pages that are either in use, meaning the pages are being updated or read, or dirty, meaning the pages have not yet been written to disk. Once dirty pages are written to disk they remain in the buffer pool, but their status changes to clean for reuse or for continual use by other database transactions. This is where the page cleaner (NUM_IOCLEANERS) comes into play. It is a friend to the DB2 agents, since cleaning dirty pages from buffer pool would otherwise have to be done by the individual agents. The page cleaners write changed (dirty) pages from the buffer pool to disk. As a result, application transactions are faster because DB2 agents do not have wait idle for I/O. If your database usage is only for querying, it is safe to leave NUM_IOCLEANERS set at its default of 1. But, if your application will be doing updates, inserts, and other action transactions then it is recommended that this value be set to at least the number of physical disks in the database.
CHNGPGS_THRESH is also a configuration parameter used along with NUM_IOCLEANERS. It tells NUM_IOCLEANERS when to go and look for dirty pages in the buffer pool. When the percentage of changed pages in buffer pool exceeds the threshold (CHNGPGS_THRESH), DB2 will dispatch page cleaners as necessary up to the number specified in NUM_IOCLEANERS to write dirty pages to disk. For most applications the default percentage (60) will suffice, but for databases with a small number of large tables, increasing this value to 75% will help performance.
One of the most important decisions a database administrator has to make when designing a database is how the data will be laid out across physical partitions. Overlooking this important step can drastically affect I/O performance. From an operating system layer, DB2 is only as good as the file system on which it runs. In AIX, that file system is controlled by the Logical Volume Manager (LVM). LVM is a subsystem within AIX that controls disk resources and provides a logical mapping between storage and physical disks. To achieve the most from LVM in I/O performance, it is important that you properly plan your database layout across physical disks. The following are some recommended file system layouts and accompanying tuning parameters that will help you get maximum performance from LVM and DB2.
The following settings are recommended to allow maximum resource usage of AIX by DB2.
- maxuproc
- Set the AIX maxuproc (maximum number of processes per user ) device attribute to 4096 by entering the command:
chdev –l sys0 –a maxuproc=’4096’
The default value set by AIX (500) has been found to be too low for some large scale database environments, and causes DB2 to generate an SQL error message "SQL1402N – Unable to authenticate user due to unexpected system error." For a DB2 Partitioned Server environment this value is required to be set. To check the value run:
lsattr –l sys0 –E | grep maxuproc
- ulimit and large file enabled
- To create large files on AIX greater than 2GB, "Large file enabled" must be true for the file system, and fsize for the DB2 instance owner must be set to unlimited (or -1). To verify your settings using smit, (AS ROOT) select:
- System storage management
- File systems
- Add/change/show/delete file systems
- Journaled file systems
- Change/show characteristics of a journaled file system (select File System where DB2 data resides)
- Large file enabled TRUE
ulimit –Hf unlimited ( Hard Limit) ulimit –Sf unlimited ( Soft Limit )
You may use
ulimit –Haandulimit –Sato display the hard and soft limit, respectively. Ulimit sets the /etc/security/limits for the system.Leaving the default values set by AIX can cause the following error:
SQL0968C – The file system is full.
Striping, also known as RAID 0, is a technology that was developed to achieve maximum I/O performance. The basic concept is that data is written and read from the file system in chunks across the width of the physical disk layout in parallel. The width represents the number of disks within the layout, and striping represents the contiguous layout of data across separate disks. From a high availability perspective, striping does not provide redundancy. Failure of one physical disk would render your data useless. However, to add redundancy AIX provides a layout of striping plus mirroring. The following are some general recommendations that would increase performance in a striped layout.
max_coalesce
Max_coalesce represents the maximum number of bytes the SSA disk device driver attempts to transfer to and from an SSA logical disk in one operation. Set max_coalesce equal to or a multiple of the stripe unit size, but not smaller. This value is only available for SSA configuration. To check the current value:
lsattr –El hdiskN | grep max_coalesce …. |
where hdiskN is the physical volume number.
To change the value, use:
chdev –l hdisk# -a max_coalesce=<new value> |
For example, for a RAID 5 ( 5+P ) we set max_coalesce to 0x50000 which is 5x64kb:
chdev –l hdisk# -a max_coalesce=0x50000 |
queue_depth
Queue_depth represents the maximum number of commands that the SSA disk device driver dispatched for a single disk drive for an hdisk. For an N+P array, set this value to 2*N or even 3*N. By altering this value, we maximize the chance of reading data from each component of the array in parallel. The syntax for changing this value is the same as max_coalesce:
chdev –l hdisk# -a queue_depth=<new value> |
For example, for a RAID 5 ( 5+P ) we set queue_depth to 15 which is 3 *N = 15
chdev –l hdisk# -a queue_depth=15 |
Note that these values can be also changed through smit by executing smitty chgssardsk.
minpgahead and maxpgahead
The first access to a file causes the first page to be read in. When the second page is accessed, the minpgahead number of pages is read in. Subsequent accesses of the first page of a group of read-ahead pages results in a doubling of the pages in, up to maxpgahead. Higher values of maxpgahead can be useful in systems where sequential performance is important, as in striped logical volumes. It's best to leave minpgahead at its default value of 2. Set maxpgahead to 16 * number of disk drives, using the AIX vmtune command. (Vmtune values go back to it's default values during reboot. To make this permanent, the command must be added to the AIX startup stript.)
Whenever maxpgahead is increased, make sure that maxfree is also increased so that the difference between maxfree and minfree is at least equal to the value of maxpgahead. These settings only affect I/O that goes through the AIX LVM layer; raw device I/O is not affected.
Usage:
/usr/samples/kernel/vmtune –R <new value minpgahead> |
On AIX 5.2 and higher, /usr/samples/kernel/vmtune was replaced with vmo.
lvm_bufcnt
On AIX when using applications that issue large raw I/O reads or writes that bypass the file system, as DB2 does when using Database Managed System (DMS) raw logical volumes, we recommend that you increase the value of this parameter using the AIX vmtune –u flag. The default for this is 9, which represents the size of the LVM buffer (9*128k). In most cases, if you perform large physical I/O to fast devices (over 1.125 MB ), bottlenecking could occur at the LVM buffer layer waiting on pin memory buffers (pbufs) to be freed due to not enough pbufs being available. Increasing the lvm_bufcnt should increase the LVM buffer layer and bring performance gains. At the time of this writing, the largest possible value for lvm_bufcnt is 64.
Usage:
/usr/samples/kernel/vmtune –u <new value lvm_bufcnt' |
Intensively used container
If you create a logical volume for a DB2 container that will be intensively used and updated, try NOT to put it on a disk with your striped logical volumes. For even greater performance, depending on resource availability, each container should be placed on its own disk.
Mirroring, also known as RAID 1, is a feature on AIX that can be useful within a DB2 data layout. The objective of mirroring is redundancy and high availability of data. Having two or more copies of data is important from a high availability standpoint in reducing downtime. If one copy goes bad, the system automatically starts using the other copy. AIX offers the capability to provide up to three copies of data in a mirrored environment. In AIX LVM an important concept to understand is that mirroring occurs only at the logical volume layer, and not at the physical disk layer. Users are often duped into thinking that a mirrored disk layout will give performance gains in an OLTP type of system where many reads from the database are a requirement. Normally in a mirrored configuration, during a read function if one copy is busy the system will automatically perform the read on the other unused copy, consequently enabling for parallel I/O. However, this is not always the case, as is explained below in Read performance penalty.
Write scheduling policy
When using mirrored logical volumes for your database containers, ensure that the scheduling policy is set to Parallel (the default behavior of a newly created logical volume). When performing a write operation with the Sequential policy enabled, the writes are performed sequentially, starting with the primary copy, and the write is not considered complete until the last copy is written. With the Parallel policy, all copies are treated equally and written in parallel so the write completion time is only constrained by the time it takes the slowest disk to perform the write.
To check the scheduling policy, you can use SMIT or the lslv command. The attribute to look for is SCHED POLICY, as shown below:
root@isvfin-art1 /: > lslv db2lv LOGICAL VOLUME: db2lv VOLUME GROUP: db2vg LV IDENTIFIER: 0004651700004c00000000e7bccbb849.8 PERMISSION: read/write VG STATE: active/complete LV STATE: opened/syncd TYPE: jfs WRITE VERIFY: off MAX LPs: 512 PP SIZE: 32 megabyte(s) COPIES: 1 SCHED POLICY: parallel LPs: 3 PPs: 3 STALE PPs: 0 BB POLICY: relocatable INTER-POLICY: minimum RELOCATABLE: yes INTRA-POLICY: center UPPER BOUND: 32 MOUNT POINT: /home LABEL: /home MIRROR WRITE CONSISTENCY: on/ACTIVE EACH LP COPY ON A SEPARATE PV ?: yes |
In the example above you'll also notice that MIRROR WRITE CONSISTENCY and WRITE VERIFY are both set to OFF. Setting these attributes to ON has performance implications discussed below.
WRITE VERIFY
WRITE VERIFY is a feature on AIX LVM that adds an extra level of integrity to your data. If turned ON, for every write that is performed, a read will be done to verify that the write was successful. For each write, the disk rereads the data and does an IOCC (I/O Channel Controller; SCSI command protocol) parity check on the data to see if what the platter wrote exactly matched what the write request buffer contained. This obviously impacts the write performance of that logical volume by adding more time to complete the write. If set to OFF, then writes are not verified by the AIX LVM; this is the default setting.
MIRROR WRITE CONSISTENCY
Mirror Write Consistency (MWC) ensures that data is consistent across ALL mirrored copies. In case of a system crash or abnormal shutdown, MWC will identify which copies are inconsistent by marking them as "stale." When the system comes back up, if MWC is on, AIX will read the MWC log and make all copies consistent again. Although this ensures data integrity, it hurts in write performance. Each write will consist of two operations -- one to write the data and one to update the MWC cache. However, on some hardware such as SSA adapters, the cache can be used, minimizing the performance impact.
To check if your SSA have fast cache modules installed you can run the following command: lsfg –vl <adapter name>
root@db2system /: > lscfg -vl ssa0 ssa0 P2-I4/Q1 IBM SSA 160 SerialRAID Adapter (14109100) Part Number................. 09L5695 FRU Number.................. 34L5388 Serial Number...............S0104160 EC Level.................... F23699 Manufacturer................IBM053 ROS Level and ID............7600 0000 Loadable Microcode Level....05 Device Driver Level.........00 Displayable Message.........SSA-ADAPTER Device Specific.(Z0)........SDRAM=064 Device Specific.(Z1)........CACHE=32 Device Specific.(Z2)........UID=000008005AEBAA9C Device Specific.(YL)........P2-I4/Q1 |
The SSA adapter on this system does have a Fast Cache Module installed that's 32 megabytes (Device ID: Z1).
From a DB2 and AIX point of view, data redundancy and availability are the objectives in a mirrored environment and therefore MWC cache and SSA cache are important. The preferred method is to use SSA cache and leave MWC off. If MWC is required, try to place the logical volumes on the outer edges of the disk, as this is where the MWC cache is located. This will help minimize disk head movements when performing writes.
Read performance penalty
Besides the benefit of high availability of data in a mirrored environment, mirroring speeds up read performance. To take advantage of this performance you must ensure that the scheduling policy of the LVM is set to parallel. If set to sequential, the primary disk will always be checked FIRST for every read before going to the next unbusy copy. For parallel the system bypasses this check and automatically selects the unbusy volume to read from.
In a production environment, database performance tuning is a complicated matter and should never be entered into for the sake of only gaining performance. First determine if there is a performance problem. If so, identify it and work from there. When designing a database, you should know what kind of performance you expect from it beforehand. Poor planning can affect performance by a factor of over 60% compared to good planning. Regardless of whether you are designing a database or trying to improve the performance on a production database, from a DB2 perspective to increase I/O you should add at least one buffer pool to each of your containers. Afterwards, tune the BUFFPAGE of each container until you get a good hit ratio. When you've got your hit ratio up, continue tuning PREFETCHSIZE, NUM_IOCLEANERS, CHNGPGS_THRESH, and NUM_IOSERVERS until you get the expected results. (Use both the snapshot monitor and iostat to monitor I/O performance.)
From an AIX perspective, planning is important and good disk layouts are imperative for optimum DB2 performance. This is one of the most overlooked areas in database planning. I addressed mirroring and striping, two of the most popular layouts. If performance is what you have in mind, striping is what you want. If high availability of data is important, mirroring should be your choice. There are, however, tradeoffs to be considered in each layout. For instance, although striping gives you the best performance in parallel reads and writes, failure of a single disk will render your data useless. Although mirroring will give you data redundancy, writing of data to disk is costly because it would mean up to three writes depending on how many copies you have configured. But, because of the flexibility of the LVM in AIX, you could minimize the penalty factor of each one of these configurations by tuning the following parameters:
- Striping
- max_coalesce
- queue_depth
- minpgahead
- maxpgahead
- lvm_bufcnt
- Mirroring
- Write scheduling policy
- Mirror write consistency
- Write verify
AIX Version 4.3.3 introduced the new mirroring and striping function (also known as RAID 0+1 or RAID 10 technology). This is by far the ideal configuration from a DB2 perspective. The mirroring and striping function offers the best of both configurations -- performance with the ability of parallel I/O, and redundancy with the capability of having up to three copies of the data. The one big disadvantage of this configuration is disk cost. The number of required physical disks and their individual capacity can double or even triple when using the mirroring and striping function.
This article highlighted most of the tuning parameters in both AIX and DB2 for optimal DB2 performance. It touched on file system layouts from an AIX perspective, and buffer pool management from a DB2 perspective. It also showed a few monitoring utilities in DB2 and AIX to aid in capturing performance data, and recommended some specific operating system environment settings to allow for maximum resource usage by DB2.
-
DB2 Developer Domain is the primary source for all DB2 technical resources or downloads.
-
IBM DB2 tradebooks and publications highlights some of IBM's most popular DB2 Trademark Publication sources, such as DB2 Version 8: The Official Guide, DB2 UDB v8 Handbook for Windows and Unix and Linux, and more.
-
DB2 Technical Support contains the latest documentation, downloadable in PDF format, on all DB2 versions. You can download IBM UDB V7.1 Administration Guide: Performance, SC09-2945-00 from that sight for useful information.
- From IBM Redbooks you can download:
- Database Performance Tuning on AIX, SG24-5511-01
- DB2 UDB/Websphere Performance Tuning Guide, SG24-6417-0
- DB2 UDB Performance Expert for Multiplatforms: A Usage Guide, SG24-6436-00
- Data Federation with IBM DB2 Information Integrator, SG24-7052-00
- DB2 UDB V7.1 Performance Tuning Guide, SG24-6012-00
- AIX Logical Volume Manager, from A to Z: Introduction and Concept, SG24-5432-00
- Download
DB2 Parallel Edition for AIX: Concepts and Facilities
, SG24-2514-01 in PDF format.
- Download
AIX 5L Version 5.2, Performance Tuning Guide
in PDF format.
Artis Walker is a consultant in the IBM Solutions Development group. He works on porting software vendors' applications and databases to the pSeries servers running AIX. Artis has over ten years experience working on both Windows and UNIX platforms, including AIX and Solaris. He has extensive software development experience in DB2, ODBC, JDBC, C, C++, and Java. He holds a BA from St. Mary's University, San Antonio, Texas, in Computer Science. You can contact Artis at walkerar@us.ibm.com.





