Use concurrent I/O to improve DB2 database performance

Concurrent I/O is a feature that was introduced in UNIX® and Linux® platforms mainly for the benefit of relational database systems. This article explains the difference between the I/O mechanisms available with UNIX and Linux file systems, and shows how concurrent I/O technology can be leveraged in an IBM® DB2® environment to improve database performance.

Jeyabarathi Chakrapani (jbchakra@gmail.com), Systems Engineer

JB Chakrapani photoJeyabarathi (JB) Chakrapani works as a Systems Engineer for NASCO, which is celebrating 25 years of service in providing information technology solutions to Blue Cross and Blue Shield Plans across the nation. She has more than 14 years of experience with DB2 and has worked as a DBA for several years. She was one of the top four finalists of the Spring 2011 DB2's Got Talent contest sponsored by IDUG. She received the Service Excellence award from IBM in 2011 for her outstanding customer support. She holds a Masters degree in Business Administration (Systems Management) from Regional Engineering College, India. She also holds a Bachelor of Science degree in Electrical and Electronics Engineering from GCT, India. She holds the following certifications: IBM Certified DB2 DBA for LUW V9, and IBM Certified DB2 UDB Advanced DBAdmin for LUW V8.1.



12 April 2012

Also available in Chinese

Overview

Concurrent I/O and cached I/O are features that are typically associated with file systems. Because of this, most DB2 DBAs think that the use of these two technologies lies within the purview of storage and system administrators. However, leveraging this technology in a DB2 database environment is the responsibility of the DBA. In fact, DB2 experts at IBM recommend implementing this technology at the database level since this allows the DB2 Database Manager to control which files should be opened with the O_CIO flag, and which files should not.

This article will explain what concurrent and cached I/O are, as well as highlight the important differences between the two. More importantly, it will show you how you can leverage concurrent I/O technology in a DB2 database environment to improve CPU and memory utilization.

The main topics that will be covered in this article are as follows:

  • Cached I/O and its impact with respect to DB2: How cached I/O works when an application issues a request to the operating system and how that impacts a relational DBMS like DB2.
  • Concurrent I/O (CIO) and how it relates to DB2: How concurrent I/O works, the difference between direct I/O and concurrent I/O, the performance similarity between CIO and raw devices, and how such performance is beneficial to DB2.
  • How to implement CIO: Implementation aspects with respect to DB2, the simplicity of such an implementation, and when you should use it.
  • Using CIO – restrictions and limitations: Scenarios in which CIO cannot or should not be used.
  • The kinds of applications that benefit from CIO and a real-world example of its use.

What is cached I/O?

When an application issues a request to access data from disk, there are two ways in which the operating system can handle that request; one is to retrieve the data directly from disk, and the other is to utilize a cache area so that frequently accessed data can be retained for quick access. Performing physical I/Os for every request will impact performance, whereas having a buffer in memory where frequently accessed data is cached will improve the data retrieval process.

The purpose of a file buffer cache is to minimize the frequency of physical disk I/O by keeping frequently accessed data in main memory. Consequently, the use of a file buffer cache can be very effective at reducing disk I/O when cache hit ratios are high. This method of data retrieval is often referred to as cached I/O. When an application issues a read request to the operating system, the operating system first looks in the file buffer cache for the data needed. If it is found there, it is retrieved from cache and written to the application buffer—there’s no need to go to disk. However, if the requested data is not in the file system cache, the operating system must retrieve the data from disk and store it in the file system cache, before transferring it to the application buffer.

Figure 1 shows the following steps for how I/O caching works for data read operations.

  1. The application issues a read request.
  2. The operating system looks for the requested data in the file buffer.
  3. Requested data not present in the file buffer cache.
  4. OS reads data from disk.
  5. Read data is cached in file buffer cache.
  6. Read data is copied from the file buffer cache to the application cache.
Figure 1. How I/O caching works for data read operations
Figure 1 shows the steps for how I/O caching works for data read operations

Another benefit of cached I/O is that the operating system can take advantage of asynchronous file writes. In other words, an application need not wait to complete a write operation before it continues execution. As with reads, cached I/O attempts to cache data writes within the file system buffer to improve future data reads and to minimize writes to disk. Thus, for read and write requests, cached I/O enables sequential read ahead and write behind behavior which helps improve performance when cache hit ratios are high.

Figure 2 shows the following steps for how I/O caching works for data write operations.

  1. The application issues a write request.
  2. The operating system copies data from the application buffer to the file buffer cache.
  3. The application continues execution, without waiting for disk write.
  4. Periodic flushing of dirty file buffer cache pages is initiated by the sync daemon.
  5. Dirty pages are written to disk.
Figure 2. How I/O caching works for data write operations
Figure 2 shows the steps for how I/O casching works for data write operations

While the caching of data works well for most applications, it is redundant and often unnecessary for most DB2 database deployments. That is because DB2 has its own cache area (in the form of one or more buffer pools) and for most of its operations, DB2 uses this cache area to store data being accessed. So the use of file system caching, in addition to DB2 buffer pools, actually increases overhead in terms of CPU and memory. A significant portion of system memory is often consumed for the file buffer cache, and the copying of data to and from file system cache to buffer pool results in additional CPU utilization.

For write requests with cached I/O, the operating system uses what is known as an Inode locking mechanism to impose write serialization so that the integrity of data is always maintained. Consequently, when cached I/O is used with DB2, another form of overhead is introduced because DB2 relies on its own locking mechanisms to maintain data consistency and integrity.


What is concurrent I/O?

Two features introduced with JFS2 file systems on the AIX platform are direct I/O (DIO) and concurrent I/O (CIO). Linux distributions SLES 9 and higher and RHEL 4 and higher on x86, x64 and POWER architectures provide support for both DIO and CIO as well. Direct I/O eliminates the double buffering associated with cached I/O by copying data directly from disk to the appropriate application buffers. However, direct I/O imposes write serialization and Inode locking to maintain data integrity.

Concurrent I/O, on the other hand, provides direct access to data without this additional buffering and locking. This, in turn, allows a file system to offer the advantages and performance benefits often associated with raw devices. Incidentally, the performance increases obtained by eliminating buffering and locking are one of the main reasons why raw devices were frequently preferred over file systems for data storage even though it could often be difficult to manage such devices. With the introduction of concurrent I/O, it is possible for file systems to deliver the faster performance of raw devices with a much greater ease of manageability.

As was mentioned previously, an Inode lock enables write serialization at the file level to maintain data consistency in the case of cached I/O and direct I/O. Concurrent I/O bypasses this Inode lock and allows multiple threads to read and write concurrently to the same file.

Figure 3 shows the following steps for how concurrent I/O works.

  1. The application issues request for data.
  2. The operating system initiates disk read.
  3. Requested data is transferred from disk to application cache.
Figure 3. How concurrent I/O works
Figure 3 shows the steps for how concurrent I/O works

For applications that do not enforce data consistency and data integrity, the use of concurrent I/O can result in data corruption and therefore should be avoided. Most relational database systems use their own application cache, and incorporate sophisticated locking mechanisms to manage data access and consistency. Therefore using concurrent I/O technology for database applications often improves performance without posing the risk of data corruption. And in the case of DB2 databases, implementing CIO enables DB2 to avoid the additional CPU and memory overhead associated with read/write operations whenever data is moved between memory and disk.


Best practices for how to implement CIO for DB2

CIO can be implemented at the disk level by mounting file systems with the -o cio option specified, as well as at the application level by issuing open() calls using O-CIO as the Oflag parameter.

With DB2, CIO can also be implemented at the database level via the NO FILESYSTEM CACHING option of the CREATE TABLESPACE and ALTER TABLESPACE SQL statements. When these options are used, the DB2 Database Manager issues the appropriate O-CIO calls for the table spaces identified.

Before this option was introduced with DB2 version 9.1, the only way to enable CIO was to mount the file systems using the CIO option. And when CIO is implemented at the underlying file system level in this manner, every file on that file system will be opened with CIO.

But the use of the NO FILESYSTEM CACHING option of the CREATE TABLESPACE and ALTER TABLESPACE statements permits the DB2 Database Manager to determine which files are to be opened with CIO. This is important because some DB2 files may benefit from file system caching whereas, in some cases, a file system-wide implementation of CIO can be disadvantageous.

To create a table space with CIO enabled, you need to execute a CREATE TABLESPACE statement that looks similar to what is shown in Listing 1.

Listing 1. Creating a table space with CIO enabled
CREATE TABLESPACE mytspace 
MANAGED BY DATABASE PAGESIZE 4K
USING (FILE ‘….’)
    EXTENTSIZE 16K
    PREFETCH SIZE AUTOMATIC
    BUFFERPOOL MYBP
    NO FILESYSTEM CACHING;

To alter existing table spaces to use CIO, execute an ALTER TABLESPACE command similar to what is shown in Listing 2.

Listing 2. Altering a table space to enable CIO
ALTER TABLESPACE mytspace 
    NO FILESYSTEM CACHING;

In the case of the ALTER TABLESPACE statement, the changes will not take effect until the database is deactivated and then reactivated.

For DB2 version 9.1, the default setting for new table spaces is FILE SYSTEM CACHING. For DB2 version 9.5, the default behavior is to use FILESYSTEM CACHING for JFS on AIX, Linux System z, all non-VxFS file systems on Solaris, HP-UX, SMS temporary table space files on all platforms, and all LOB and large data. For all other platforms and table spaces, NO FILE SYSTEM CACHING is the default. For DB2 version 9.7, the default for a new table space is to use the NO FILE SYSTEM CACHING clause. Databases which have been migrated to V9.5 or V9.7 will still have the old default setting. So it really falls on the DBAs to make a conscious effort to verify and implement the CIO technology in their DB2 environments.


Using CIO – restrictions and limitations

There are some situations where DB2 can benefit from the use of file system caching—one case in particular is when a table containing LOB data that is not stored inline is queried frequently. Because such LOB data is never brought into a DB2 buffer pool, performance will be better if the file system is allowed to cache LOB data read and write operations. Conversely, for inline LOBs (LOBs that are stored with other table data), applying concurrent I/O will help reduce the overhead of double buffering; since the data for inline LOBs is brought into a DB2 buffer pool, file system caching is not needed.

Temporary table spaces that use SMS storage (directory containers) also require file system caching. But for all other table spaces, concurrent I/O is recommended.


Benefits of concurrent I/O - A real world example

Because CIO eliminates double buffering and unnecessary file locking, both I/O bound and CPU bound applications will benefit from the use of CIO. And implementing CIO in a DB2 environment frees up memory and CPU resources on the DB2 server—resources that can be used for other applications.

For example, one DB2 customer faced a severe application performance issue after migrating from a large server to a smaller LPAR with a limited number of CPUs. The application became slow to the extent that no transactions were able to be processed. CPU utilization was pegged above 90% most of the time, and this impacted other applications residing on the server. Analysis of Explain data for the queries being run did not show any major difference between the source and target, and no other changes had been made to either the application code or the volume of data being accessed before and after the migration.

At first, the customer made the decision to add more CPUs to the LPAR since that appeared to be the only difference between the source and target. But, while investigating the actual source of the problem, it was discovered that none of the data table spaces for the database had CIO enabled. In other words, every table space had been created with FILESYSTEM CACHING enabled. So CIO was implemented for all the data table spaces using the ALTER TABLESPACE command presented previously.

The results were immediately noticeable. The performance of the application improved greatly and the transactions processed at a faster rate than the source server. CPU utilization on the server dropped to 30% and other applications running on the server were able to resume their normal function. The following table summarizes the before and after scenario.

Table 1. Customer's DB2 environment before and after CIO was enabled
System configurationCPU utilization (before CIO)CPU utilization (after CIO)
  • AIX 5.3
  • 27 GB RAM
  • 2 CPUS
  • DB2 V9.1
  • Database size: 90G
CPU was pegged at 90% and above;

Application was very slow.

CPU usage dropped to 20-30%;

Application performance improved by 90%

Verifying whether CIO is being used by DB2 is relatively simple. Output from the db2look utility will show whether or not a particular table space was created with the NO FILESYSTEM CACHING option specified. A snapshot for table spaces on the database will also provide this information. Another method that can be used is to view the SNAPTBSP table under the SYSIBMADM administrative view, which contains most of the information obtained by a table space snapshot.


Conclusion

The concurrent I/O feature, and especially its relation to DB2, is not very widely known within the DB2 community. The goal of this article was to bring more awareness to this feature, and to benefit more DBAs as they attempt to implement this technology in their database environments. As you can see, implementing this feature in a DB2 environment is easy and can yield noticeable performance benefits. For DB2 version 9.7, the use of CIO is the default behavior, but any migrations from prior versions will have to be manually upgraded to take advantage of this feature. Therefore, it is the responsibility of the DBA to implement CIO at the table space level once a database migration has taken place.


Acknowledgements

My sincere thanks to Mr. Roger E. Sanders for his invaluable guidance and support. I also wish to thank Mr. Bob Meier, Ms. Cindy DeLonjay and Mr. Kirk Blumenstock at NASCO for their support and encouragement.

Resources

Learn

Get products and technologies

  • Download a free trial version of DB2 for Linux, UNIX, and Windows .
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=809197
ArticleTitle=Use concurrent I/O to improve DB2 database performance
publish-date=04122012