Use concurrent I/O to improve DB2 database performance
Concurrent I/O and cached I/O are features that are typically associated with file systems. Because of this, most DB2 DBAs think that the use of these two technologies lies within the purview of storage and system administrators. However, leveraging this technology in a DB2 database environment is the responsibility of the DBA. In fact, DB2 experts at IBM recommend implementing this technology at the database level since this allows the DB2 Database Manager to control which files should be opened with the O_CIO flag, and which files should not.
This article will explain what concurrent and cached I/O are, as well as highlight the important differences between the two. More importantly, it will show you how you can leverage concurrent I/O technology in a DB2 database environment to improve CPU and memory utilization.
The main topics that will be covered in this article are as follows:
- Cached I/O and its impact with respect to DB2: How cached I/O works when an application issues a request to the operating system and how that impacts a relational DBMS like DB2.
- Concurrent I/O (CIO) and how it relates to DB2: How concurrent I/O works, the difference between direct I/O and concurrent I/O, the performance similarity between CIO and raw devices, and how such performance is beneficial to DB2.
- How to implement CIO: Implementation aspects with respect to DB2, the simplicity of such an implementation, and when you should use it.
- Using CIO – restrictions and limitations: Scenarios in which CIO cannot or should not be used.
- The kinds of applications that benefit from CIO and a real-world example of its use.
What is cached I/O?
When an application issues a request to access data from disk, there are two ways in which the operating system can handle that request; one is to retrieve the data directly from disk, and the other is to utilize a cache area so that frequently accessed data can be retained for quick access. Performing physical I/Os for every request will impact performance, whereas having a buffer in memory where frequently accessed data is cached will improve the data retrieval process.
The purpose of a file buffer cache is to minimize the frequency of physical disk I/O by keeping frequently accessed data in main memory. Consequently, the use of a file buffer cache can be very effective at reducing disk I/O when cache hit ratios are high. This method of data retrieval is often referred to as cached I/O. When an application issues a read request to the operating system, the operating system first looks in the file buffer cache for the data needed. If it is found there, it is retrieved from cache and written to the application buffer—there’s no need to go to disk. However, if the requested data is not in the file system cache, the operating system must retrieve the data from disk and store it in the file system cache, before transferring it to the application buffer.
Figure 1 shows the following steps for how I/O caching works for data read operations.
- The application issues a read request.
- The operating system looks for the requested data in the file buffer.
- Requested data not present in the file buffer cache.
- OS reads data from disk.
- Read data is cached in file buffer cache.
- Read data is copied from the file buffer cache to the application cache.
Figure 1. How I/O caching works for data read operations
Another benefit of cached I/O is that the operating system can take advantage of asynchronous file writes. In other words, an application need not wait to complete a write operation before it continues execution. As with reads, cached I/O attempts to cache data writes within the file system buffer to improve future data reads and to minimize writes to disk. Thus, for read and write requests, cached I/O enables sequential read ahead and write behind behavior which helps improve performance when cache hit ratios are high.
Figure 2 shows the following steps for how I/O caching works for data write operations.
- The application issues a write request.
- The operating system copies data from the application buffer to the file buffer cache.
- The application continues execution, without waiting for disk write.
- Periodic flushing of dirty file buffer cache pages is initiated by the sync daemon.
- Dirty pages are written to disk.
Figure 2. How I/O caching works for data write operations
While the caching of data works well for most applications, it is redundant and often unnecessary for most DB2 database deployments. That is because DB2 has its own cache area (in the form of one or more buffer pools) and for most of its operations, DB2 uses this cache area to store data being accessed. So the use of file system caching, in addition to DB2 buffer pools, actually increases overhead in terms of CPU and memory. A significant portion of system memory is often consumed for the file buffer cache, and the copying of data to and from file system cache to buffer pool results in additional CPU utilization.
For write requests with cached I/O, the operating system uses what is known as an Inode locking mechanism to impose write serialization so that the integrity of data is always maintained. Consequently, when cached I/O is used with DB2, another form of overhead is introduced because DB2 relies on its own locking mechanisms to maintain data consistency and integrity.
What is concurrent I/O?
Two features introduced with JFS2 file systems on the AIX platform are direct I/O (DIO) and concurrent I/O (CIO). Linux distributions SLES 9 and higher and RHEL 4 and higher on x86, x64 and POWER architectures provide support for both DIO and CIO as well. Direct I/O eliminates the double buffering associated with cached I/O by copying data directly from disk to the appropriate application buffers. However, direct I/O imposes write serialization and Inode locking to maintain data integrity.
Concurrent I/O, on the other hand, provides direct access to data without this additional buffering and locking. This, in turn, allows a file system to offer the advantages and performance benefits often associated with raw devices. Incidentally, the performance increases obtained by eliminating buffering and locking are one of the main reasons why raw devices were frequently preferred over file systems for data storage even though it could often be difficult to manage such devices. With the introduction of concurrent I/O, it is possible for file systems to deliver the faster performance of raw devices with a much greater ease of manageability.
As was mentioned previously, an Inode lock enables write serialization at the file level to maintain data consistency in the case of cached I/O and direct I/O. Concurrent I/O bypasses this Inode lock and allows multiple threads to read and write concurrently to the same file.
Figure 3 shows the following steps for how concurrent I/O works.
- The application issues request for data.
- The operating system initiates disk read.
- Requested data is transferred from disk to application cache.
Figure 3. How concurrent I/O works
For applications that do not enforce data consistency and data integrity, the use of concurrent I/O can result in data corruption and therefore should be avoided. Most relational database systems use their own application cache, and incorporate sophisticated locking mechanisms to manage data access and consistency. Therefore using concurrent I/O technology for database applications often improves performance without posing the risk of data corruption. And in the case of DB2 databases, implementing CIO enables DB2 to avoid the additional CPU and memory overhead associated with read/write operations whenever data is moved between memory and disk.
Best practices for how to implement CIO for DB2
CIO can be implemented at the disk level by mounting file systems with the
-o cio option specified, as well as at the
application level by issuing
open() calls using
O-CIO as the Oflag parameter.
With DB2, CIO can also be implemented at the database level via the
NO FILESYSTEM CACHING option of the
CREATE TABLESPACE and
ALTER TABLESPACE SQL statements. When these
options are used, the DB2 Database Manager issues the appropriate O-CIO
calls for the table spaces identified.
Before this option was introduced with DB2 version 9.1, the only way to enable CIO was to mount the file systems using the CIO option. And when CIO is implemented at the underlying file system level in this manner, every file on that file system will be opened with CIO.
But the use of the
NO FILESYSTEM CACHING option
CREATE TABLESPACE and
ALTER TABLESPACE statements permits the DB2
Database Manager to determine which files are to be opened with CIO. This
is important because some DB2 files may benefit from file system caching
whereas, in some cases, a file system-wide implementation of CIO can be
To create a table space with CIO enabled, you need to execute a
CREATE TABLESPACE statement that looks similar
to what is shown in Listing 1.
Listing 1. Creating a table space with CIO enabled
CREATE TABLESPACE mytspace MANAGED BY DATABASE PAGESIZE 4K USING (FILE ‘….’) EXTENTSIZE 16K PREFETCH SIZE AUTOMATIC BUFFERPOOL MYBP NO FILESYSTEM CACHING;
To alter existing table spaces to use CIO, execute an
ALTER TABLESPACE command similar to what is
shown in Listing 2.
Listing 2. Altering a table space to enable CIO
ALTER TABLESPACE mytspace NO FILESYSTEM CACHING;
In the case of the
ALTER TABLESPACE statement,
the changes will not take effect until the database is deactivated and
For DB2 version 9.1, the default setting for new table spaces is
FILE SYSTEM CACHING. For DB2 version 9.5, the
default behavior is to use
for JFS on AIX, Linux System z, all non-VxFS file systems on Solaris,
HP-UX, SMS temporary table space files on all platforms, and all LOB and
large data. For all other platforms and table spaces,
NO FILE SYSTEM CACHING is the default. For DB2
version 9.7, the default for a new table space is to use the
NO FILE SYSTEM CACHING clause. Databases which
have been migrated to V9.5 or V9.7 will still have the old default
setting. So it really falls on the DBAs to make a conscious effort to
verify and implement the CIO technology in their DB2 environments.
Using CIO – restrictions and limitations
There are some situations where DB2 can benefit from the use of file system caching—one case in particular is when a table containing LOB data that is not stored inline is queried frequently. Because such LOB data is never brought into a DB2 buffer pool, performance will be better if the file system is allowed to cache LOB data read and write operations. Conversely, for inline LOBs (LOBs that are stored with other table data), applying concurrent I/O will help reduce the overhead of double buffering; since the data for inline LOBs is brought into a DB2 buffer pool, file system caching is not needed.
Temporary table spaces that use SMS storage (directory containers) also require file system caching. But for all other table spaces, concurrent I/O is recommended.
Benefits of concurrent I/O - A real world example
Because CIO eliminates double buffering and unnecessary file locking, both I/O bound and CPU bound applications will benefit from the use of CIO. And implementing CIO in a DB2 environment frees up memory and CPU resources on the DB2 server—resources that can be used for other applications.
For example, one DB2 customer faced a severe application performance issue after migrating from a large server to a smaller LPAR with a limited number of CPUs. The application became slow to the extent that no transactions were able to be processed. CPU utilization was pegged above 90% most of the time, and this impacted other applications residing on the server. Analysis of Explain data for the queries being run did not show any major difference between the source and target, and no other changes had been made to either the application code or the volume of data being accessed before and after the migration.
At first, the customer made the decision to add more CPUs to the LPAR since
that appeared to be the only difference between the source and target.
But, while investigating the actual source of the problem, it was
discovered that none of the data table spaces for the database had CIO
enabled. In other words, every table space had been created with
FILESYSTEM CACHING enabled. So CIO was
implemented for all the data table spaces using the
ALTER TABLESPACE command presented previously.
The results were immediately noticeable. The performance of the application improved greatly and the transactions processed at a faster rate than the source server. CPU utilization on the server dropped to 30% and other applications running on the server were able to resume their normal function. The following table summarizes the before and after scenario.
Table 1. Customer's DB2 environment before and after CIO was enabled
|System configuration||CPU utilization (before CIO)||CPU utilization (after CIO)|
|CPU was pegged at 90% and above;|
Application was very slow.
|CPU usage dropped to 20-30%;|
Application performance improved by 90%
Verifying whether CIO is being used by DB2 is relatively simple. Output
db2look utility will show whether or
not a particular table space was created with the
NO FILESYSTEM CACHING option specified. A
snapshot for table spaces on the database will also provide this
information. Another method that can be used is to view the
SNAPTBSP table under the
SYSIBMADM administrative view, which contains
most of the information obtained by a table space snapshot.
The concurrent I/O feature, and especially its relation to DB2, is not very widely known within the DB2 community. The goal of this article was to bring more awareness to this feature, and to benefit more DBAs as they attempt to implement this technology in their database environments. As you can see, implementing this feature in a DB2 environment is easy and can yield noticeable performance benefits. For DB2 version 9.7, the use of CIO is the default behavior, but any migrations from prior versions will have to be manually upgraded to take advantage of this feature. Therefore, it is the responsibility of the DBA to implement CIO at the table space level once a database migration has taken place.
My sincere thanks to Mr. Roger E. Sanders for his invaluable guidance and support. I also wish to thank Mr. Bob Meier, Ms. Cindy DeLonjay and Mr. Kirk Blumenstock at NASCO for their support and encouragement.
- Get more information from the DB2 V9.5 Information Center .
- Learn more from the DB2 V9.7 Information Center .
- In the DB2 for Linux, UNIX, and Windows area on developerWorks, get the resources you need to advance your DB2 skills.
- Download a free trial version of DB2 for Linux, UNIX, and Windows .
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.