IBM DB2 Enterprise 9 performance with POWER5+ and AIX 5L multipage support

Exploit 64KB page support

Learn how IBM® DB2® 9 for Linux®, UNIX®, and Windows® (DB2) exploits multiple page sizes. With the introduction of the POWER5+™ processor architecture, the IBM AIX 5L™ operating system added support for a new 64-kilobyte page with properties that are similar to the current default 4-kilobyte pages. In addition, AIX 5L Version 5.3 TL04 also introduced a new 16-gigabyte huge-page feature for this hardware architecture. DB2 9 automatically exploits the 64-kilobyte pages to deliver high performance for database applications on this platform. In addition, DB2 also supports the enablement of 16-gigabyte huge pages.

Sunil Kamath (sunil.kamath@ca.ibm.com), Manager, OLTP Performance, IBM

Sunil Kamath is the technical manager of the DB2 online transaction processing (OLTP) benchmarks and solution development. He has been working on DB2 performance for more than six years and has led many successful world-record TPC-C and SAP benchmarks. His interests and responsibilities also include exploring and exploiting key hardware, operating system, and compiler technologies that help improve DB2 performance. You can reach him at sunil.kamath@ca.ibm.com.



Punit Shah (punit@us.ibm.com), Software Engineer, IBM

Punit Shah's primary responsibilities are enabling IBM middleware software to use the latest System p technologies. He has been working in enterprise application development and performance, and has authored or coauthored several articles in this area. You can reach him at punit@us.ibm.com.



08 June 2006

Introduction

For a variety of reasons, most modern operating systems run programs in virtual address space. The virtual address offers many advantages, including flexibility, isolation, transportability, the ability to access more memory than the amount of physical memory, and to some extent, underlying hardware configuration independence.

However, running programs in a virtual address space has an associated cost. Programs (including DB2) reference memory addresses as virtual addresses. Every time a memory location is addressed for program instructions or data, a virtual address is translated into a physical or real memory address. This translation is maintained in a page table and adds more overhead to the program’s execution time. The size of the page table is inversely proportional to the page size, which implies that the smaller the page size, the larger the page table, and hence more overhead.
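To make the inverse relationship concrete, the following minimal sketch counts how many pages (and therefore how many translations) are needed to map a fixed amount of memory at each AIX 5L page size. The 512-gigabyte figure matches the larger test system described in the Performance section; the function name is illustrative, and the one-translation-per-page count is a simplification rather than a model of the actual hardware page table.

```python
# Illustrative only: one translation per page, ignoring the real
# page table layout used by the hardware and the AIX 5L operating system.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

PAGE_SIZES = {"4 KB": 4 * KB, "64 KB": 64 * KB, "16 MB": 16 * MB, "16 GB": 16 * GB}


def pages_needed(memory_bytes, page_bytes):
    """Return the number of pages required to map memory_bytes (rounded up)."""
    return -(-memory_bytes // page_bytes)  # ceiling division


memory = 512 * GB
for name, size in PAGE_SIZES.items():
    print(f"{name:>5} pages: {pages_needed(memory, size):>12,} translations")
```

Under these assumptions, moving from 4-kilobyte to 64-kilobyte pages cuts the number of translations by a factor of 16.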

For years, 4 kilobytes has been the standard page size for most operating systems, including the AIX 5L operating environment. In recent years, increasing volumes of data and processor-addressable memory have rendered the 4-kilobyte page size somewhat ineffective. To improve the performance of applications that process extensive data volumes, IBM POWER5+ processor-based systems that run AIX 5L V5.3 TL04 (or later) introduce multiple page size support. The POWER™ processor and the AIX 5L operating system have supported two page sizes (4 kilobytes and 16 megabytes) since the introduction of AIX 5L V5.1. In addition to these two page sizes, the newly available page sizes are 64 kilobytes and 16 gigabytes. The 64-kilobyte pages behave exactly like 4-kilobyte pages. (That is, the pages are not pinned in memory and can be paged.)

To take advantage of the newly available page sizes, DB2 9 automatically detects the page sizes that are available on the system. If 64-kilobyte pages are available, DB2 sets 64 kilobytes as the default page size for a few of its processes as well as for all shared memory regions. DB2 has also supported 16-gigabyte pages since IBM DB2 Universal Database™ (UDB) V8.2.5.


Background

Let's take a look at the process runtime mechanism and why large page sizes are so valuable to enterprise applications such as DB2.

Process runtime environment

Before any program can run, the operating system loader must load it into real memory. The memory necessary to run processes in the AIX 5L environment is divided into various memory regions. The private regions for a process are text, stack, and data/heap, each of which is dedicated to a specific purpose:

  • The text region stores the process’s instructions.
  • The data/heap region contains dynamically allocated memory and some globally accessible program data (for example, DB2 agent private memory).
  • The stack region is used for subroutine return addresses, as well as to store automatic data.

A nonprivate memory region, called shared memory, is the interprocess communication mechanism that is widely used by DB2 and other multiprocessing applications. DB2 uses shared memory segments to process and share data (primarily for buffer pools) efficiently among collaborating DB2 processes, such as DB2 agents. DB2 also uses shared memory for various other heaps.

As mentioned before, memory addresses that are referenced by a process are virtual addresses and require translation to physical addresses. For each running process, the mapping between virtual and physical addresses is maintained in a data structure called the page table. The number of page table entries is proportional to the size of the virtual address space. As a result, the size of the page table is significant. To speed up address translation, the processor chip includes a cache and associated logic called the translation lookaside buffer (TLB). A TLB is a relatively small cache that stores the most recent address translations in anticipation of their potential reuse.

Benefits of a large page

In addition to processor clock speed, another important processor performance metric is clock cycles per instruction (CPI). Essentially, CPI is a measure of how much time an instruction takes to run. Typically, CPI is expressed in terms of average or normalized CPI. A lower CPI number results in faster execution and better performance.

TLB cache entry reuse (cache hit) equates to quicker address translation and subsequently faster access to physical memory. A TLB miss requires accessing a page table that is stored in the main memory, which consumes considerably more processor cycles. Increasing the size of the process’s address space (that is, from a 32- to 64-bit address space), which is becoming more common, increases the page table size and degrades the address translation performance.

There are two options to counter this. One option is to increase the TLB size. However, because of on-chip space limitations, the TLB size cannot be increased proportionally. Another option is to decrease the page table size by reducing the number of page table entries. As pointed out before, page table size is inversely proportional to the page size, which means that any increase in page size decreases page table size and, additionally, allows a single TLB entry to fulfill many more address translations. (That is, a bigger page stores relatively more information per page.)
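The effect of page size on a fixed-size TLB can be sketched as "TLB reach": the amount of memory that can be addressed without a TLB miss when every entry maps one page. The entry count below is a hypothetical value chosen for illustration, not a POWER5+ specification, and the function name is likewise an assumption.

```python
KB = 1024
MB = 1024 * KB

TLB_ENTRIES = 1024  # hypothetical entry count, not a POWER5+ specification


def tlb_reach(entries, page_bytes):
    """Memory covered by the TLB when every entry maps one page."""
    return entries * page_bytes


for name, size in (("4 KB", 4 * KB), ("64 KB", 64 * KB), ("16 MB", 16 * MB)):
    reach_mb = tlb_reach(TLB_ENTRIES, size) / MB
    print(f"{name:>5} pages: TLB reach = {reach_mb:,.0f} MB")
```

Whatever the real entry count is, the ratio is what matters: with the same number of entries, 64-kilobyte pages cover 16 times as much memory as 4-kilobyte pages.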

The POWER5+ processor architecture (running the AIX 5L operating system) addresses the page table problem by introducing multiple page sizes. An application can choose the page size that is adequate for the nature and size of the workload. As you will see later in this article, this is an important consideration that can yield significant performance benefits.


AIX 5L multipage support

This section gives a brief overview of the new AIX 5L support for multiple page sizes. As mentioned in the introduction, the POWER5+ processor and AIX 5L V5.3 (with the 5300-04 recommended technology level) introduce support for two new virtual memory page sizes: 64 kilobytes and 16 gigabytes. Although 16-gigabyte pages are intended only for use in very high performance environments, 64-kilobyte pages are designed for general purposes. In fact, most workloads will likely see a benefit by using 64-kilobyte pages rather than 4-kilobyte pages. We discuss the performance benefits of using 64-kilobyte pages with DB2 later in this article. Allocating 16-gigabyte pages requires the IBM Hardware Management Console (HMC) Version 5 Release 2 machine code.

When running the 64-bit AIX 5L kernel at the 5300-04 technology level on POWER5+ processor-based systems, support for 64-kilobyte page sizes is automatic, requiring no system configuration or tuning.

Note that 64-kilobyte pages are fully capable of being paged and that the size of the pool of 64-kilobyte page frames is dynamic. The AIX 5L operating system fully manages the pool size and varies the number of 4-kilobyte and 64-kilobyte page frames to meet demand for the different page sizes. However, 16-megabyte and 16-gigabyte page sizes require AIX 5L configuration using the vmo command. See the Resources section of this article for more information about AIX support for multiple page sizes.

You can use the AIX 5L svmon and vmstat commands to monitor the number of 4-kilobyte and 64-kilobyte page frames on a system. For example, to display DB2 process statistics about each page size, you can use the -P flag with the DB2 process ID (PID) with the svmon command:

# svmon -P 852128
-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
  852128 db2sysc         372534    65669        0   371671      Y     N     N

    PageSize      Inuse        Pin       Pgsp    Virtual
     s   4 KB       4521          0          0       3657
     m  64 KB     302477        133          0     302478

   Vsid      Esid Type Description              PSize  Inuse   Pin Pgsp  Virtual
       0         0 work kernel (lgpg_vsid=0)         L  65536 65536    0 65536 
                   Addr Range: 0..65535
  2e845f  78000048 work default shmat/mmap           m   4096     0    0  4096 
                   Addr Range: 0..4095
  1987b1  78000021 work default shmat/mmap           m   4096     0    0  4096

... output snipped ...
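When you monitor many DB2 processes, it can help to pull the per-page-size summary out of the svmon output programmatically. The following is a minimal sketch that parses only the PageSize summary rows shown above; the regular expression and function name are assumptions based on this one sample, and the layout of svmon output can differ between AIX levels. The final ratio also assumes that both rows are reported in the same units.

```python
import re

# Page-size summary copied from the svmon output above.
SVMON_SUMMARY = """\
    PageSize      Inuse        Pin       Pgsp    Virtual
     s   4 KB       4521          0          0       3657
     m  64 KB     302477        133          0     302478
"""

# One summary row: flag letter, page size, unit, then four counters.
ROW = re.compile(r"^\s*(\w)\s+(\d+)\s+(KB|MB|GB)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_page_sizes(text):
    """Return {page-size label: counters} for each summary row in text."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            _, size, unit, inuse, pin, pgsp, virtual = match.groups()
            result[f"{size} {unit}"] = {
                "inuse": int(inuse), "pin": int(pin),
                "pgsp": int(pgsp), "virtual": int(virtual),
            }
    return result


stats = parse_page_sizes(SVMON_SUMMARY)
total_inuse = sum(row["inuse"] for row in stats.values())
for label, row in stats.items():
    print(f"{label}: {row['inuse'] / total_inuse:.1%} of the in-use count")
```

For this sample, almost all of the in-use memory of the db2sysc process is backed by 64-kilobyte pages.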

DB2 multipage enablement

Starting with DB2 9, DB2 automatically detects and uses 64-kilobyte page sizes for select memory regions. Prior to this release, DB2 used the 4-kilobyte page size by default and also provided the ability to custom-configure 16-megabyte pages on systems that supported it. The POWER5+ architecture and DB2 UDB V8.2.5 also introduced custom configuration of 16-gigabyte huge pages.

For DB2, the largest consumer of system main memory is its buffer pool, which is basically a single, large shared memory region. DB2 supports all the page sizes (4-kilobyte, 64-kilobyte, 16-megabyte, and 16-gigabyte) that are available with the underlying system for its buffer pool shared memory region. DB2 9 automatically detects whether the AIX 5L operating system and POWER5+ hardware support 64-kilobyte pages and, if they do, enables the use of the 64-kilobyte page size for all DB2 shared memory regions, including buffer pools, lock list, log buffer, utility heap, package cache, catalog cache, monitor heap, and shared sort heap. The Performance section of this article presents details about the performance improvements using different supported AIX page sizes on DB2 transactional workloads.

In addition to shared memory, DB2 also allows custom configuration of agent private memory into 16-megabyte pages. The typical consumer of agent private memory is the sort heap memory that is used by the agent to sort rows during query execution. However, DB2 currently does not automatically enable 64-kilobyte pages for agent private memory, nor does it allow custom configuration of agent private memory with 16-gigabyte pages.

Even though the AIX 5L operating system and DB2 support these multiple page sizes, the configuration of 16-megabyte or 16-gigabyte pages requires customized hardware, AIX 5L, and DB2 configurations. To instruct DB2 to select preconfigured 16-megabyte or 16-gigabyte pages, you can use the DB2_LARGE_PAGE_MEM=DB or DB2_LARGE_PAGE_MEM=DB:16GB registry variable. Refer to the current DB2 and AIX 5L documentation for the instructions to configure the hardware and AIX 5L operating system to support 16-megabyte or 16-gigabyte pages.

Both 16-megabyte and 16-gigabyte page sizes require careful sizing of the memory to be allocated to these different page-size AIX memory pools. Do this with extreme caution, because overconfiguring the system with 16-megabyte or 16-gigabyte pages can lead to excessive paging in the 4-kilobyte or 64-kilobyte page size pools. When the 16-megabyte and 16-gigabyte pages are allocated, they are pinned in memory, and the AIX 5L operating system cannot dynamically move pages between the larger (16-megabyte or 16-gigabyte) page size pools and the smaller (4-kilobyte or 64-kilobyte) page size pools. However, if an adequate amount of memory has been allocated to the 16-gigabyte or 16-megabyte pool, the workload can benefit in terms of performance.
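The sizing trade-off can be illustrated with simple arithmetic. The sketch below rounds a memory region up to whole large pages and reports how much RAM remains for the dynamically managed 4-kilobyte and 64-kilobyte pools. The 512-gigabyte and 473-gigabyte figures are taken from the larger test system in the Performance section; the function name is illustrative, and the calculation deliberately ignores other DB2 shared memory and the needs of the operating system, so it is not a sizing recommendation.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def large_pages_for(region_bytes, page_bytes):
    """Return (page count, bytes pinned) needed to back region_bytes."""
    count = -(-region_bytes // page_bytes)  # ceiling division
    return count, count * page_bytes


total_ram = 512 * GB
buffer_pools = 473 * GB  # buffer pool size used on the p5-570 test system

for name, size in (("16 MB", 16 * MB), ("16 GB", 16 * GB)):
    count, pinned = large_pages_for(buffer_pools, size)
    left = (total_ram - pinned) / GB
    print(f"{name}: {count:,} pages, {pinned / GB:.0f} GB pinned, "
          f"{left:.0f} GB left for the 4 KB and 64 KB pools")
```

Because pinned pages cannot be returned to the smaller pools, rounding up to a whole number of 16-gigabyte pages (30 pages, or 480 gigabytes, in this sketch) reduces the memory left for everything else.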

With the introduction and support of the 64-kilobyte pages, DB2 applications running on POWER5+ processor-based hardware with the AIX 5L 5300-04 technology level can automatically take advantage of the performance benefits that the bigger page size offers, without any user or administration overhead. It is not necessary to pin the 64-kilobyte pages in memory, and the AIX 5L operating system dynamically moves pages between the 4-kilobyte and 64-kilobyte page-size pools.

In addition to all DB2 shared memory regions, DB2 9 automatically uses 64-kilobyte pages for its process stack, thereby giving further performance improvements without any resource overhead.


Performance

The previous sections of this article discussed the key performance advantages of larger page sizes and how DB2 exploits them. In this section, you will see the performance results (as measured with DB2 9) for an internal online transaction processing (OLTP) workload.

The performance results we describe were collected on two sets of systems:

  • An IBM System p5™ 570 Model 9117-570 using 16 POWER5+ processors, running at 2.2 gigahertz with 512-gigabyte RAM
    • An 8-terabyte database was created.
    • 473 gigabytes (out of the total 512-gigabyte RAM) were allocated for the DB2 buffer pools.
  • An IBM System p5 520 Express Model 9131-52A using two POWER5+ processors, running at 1.9 gigahertz with 32-gigabyte RAM
    • A 25-gigabyte database was created.
    • 18 gigabytes (out of the total 32-gigabyte RAM) of memory were allocated for the DB2 buffer pools.

Chart 1 shows the relative performance improvements, normalized to the 4-kilobyte page size, for the different page sizes that DB2 and the AIX 5L operating system support. These measurements were done on a 16-way p5-570. There was a performance improvement of 13% with the 64-kilobyte page size in comparison to the 4-kilobyte pages. In addition, when going from 64-kilobyte to 16-megabyte pages, there was an incremental performance improvement of 8%. Finally, when moving from 16-megabyte pages to 16-gigabyte pages, there was an additional 3% performance gain. The DB2 throughput performance was measured using a transactions-per-minute metric.
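The step-wise gains quoted above can be combined into a single gain over the 4-kilobyte baseline. The text does not say whether each increment is relative to the previous page size or to the baseline, so the following sketch computes both readings; the function name is illustrative, and neither result is a measured value.

```python
def cumulative_gain(steps, compound):
    """Combine step-wise percentage gains into one gain over the baseline."""
    if compound:
        factor = 1.0
        for step in steps:
            factor *= 1.0 + step / 100.0  # each step relative to the previous one
        return (factor - 1.0) * 100.0
    return float(sum(steps))  # each step relative to the 4 KB baseline


steps = [13, 8, 3]  # 4 KB -> 64 KB -> 16 MB -> 16 GB, from the text above
print(f"additive:    {cumulative_gain(steps, compound=False):.1f}%")
print(f"compounding: {cumulative_gain(steps, compound=True):.1f}%")
```

Under either reading, 16-gigabyte pages deliver roughly a quarter more throughput than 4-kilobyte pages on this system, with more than half of that gain already available from the automatic 64-kilobyte pages.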

Chart 1. Performance improvement between 4-kilobyte pages and other supported page sizes

Chart 2 correlates the performance improvements seen in Chart 1 with the decreased CPI and the corresponding decrease in TLB miss rates. A higher CPI value indicates that more cycles were necessary to run the instruction, implying suboptimal execution of the program. The TLB miss overhead represents one important component of the overall CPI metric. An increase in TLB hit rate improves the CPI metric and, therefore, performance of that program.

As shown in Chart 2, with 64-kilobyte pages, the DB2 workload’s CPI metric improved by 11%. The TLB miss rate decreased by 13% compared to the same measurement for 4-kilobyte pages, thereby improving the overall performance of the workload by 13%. With 16-megabyte and 16-gigabyte pages, a further improvement in TLB hit rate led to proportional CPI gains and, therefore, higher overall throughput.
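The link between the CPI and throughput figures can be checked with a short calculation. Execution time is the instruction count multiplied by CPI and divided by the clock rate, so if the instruction count and clock rate are assumed to stay the same (an assumption, not something the measurements state), throughput scales with 1/CPI. The function name below is illustrative.

```python
def throughput_gain_from_cpi(cpi_reduction_pct):
    """Throughput gain (%) implied by a CPI reduction (%), all else equal.

    Execution time = instructions * CPI / clock rate, so throughput scales
    with 1 / CPI when the instruction count and clock rate do not change.
    """
    new_cpi_ratio = 1.0 - cpi_reduction_pct / 100.0
    return (1.0 / new_cpi_ratio - 1.0) * 100.0


# An 11% lower CPI, as reported for 64-kilobyte pages.
print(f"{throughput_gain_from_cpi(11):.1f}%")
```

The result, about 12.4%, is close to the 13% throughput improvement that was measured with 64-kilobyte pages.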

Chart 2. CPI and TLB performance comparison between 4-kilobyte pages and other supported page sizes

From Charts 1 and 2, you can easily infer that the use of 64-kilobyte page sizes provides a significant performance improvement for DB2 applications. However, it is also important to understand the range of performance gains that you can expect to see with your workloads, as this varies widely, depending on the size of the database, the available system RAM, and the memory allocated for DB2 buffer pools. To demonstrate the range of percentage gains you can expect with 64-kilobyte pages, another set of tests for the same workload was performed on a p5-520 with 32-gigabyte RAM. For this test, only 18 gigabytes of the available 32-gigabyte RAM were assigned to the DB2 buffer pools.

Chart 3 shows the percentage of throughput gain that was observed on this p5-520 in comparison to the results on the p5-570 (which used 473 gigabytes for the DB2 buffer pools). On the p5-520, performance improved by 5% with the 64-kilobyte pages as compared to the 4-kilobyte pages.

Chart 3. Performance improvements with 64-kilobyte pages between a system with 32-gigabyte RAM and a system with 512-gigabyte RAM

This difference in performance gain is predictable because the pressure on the TLB cache increases as the size of the database and the memory assigned to the DB2 buffer pools increase.


Summary

DB2 9 automatically detects and uses 64-kilobyte page sizes when running on the AIX 5L V5.3 TL04 operating system on a system that supports this page size. This larger page size provides significant out-of-the-box performance improvements. The performance gain can vary widely (in the range of 5 to 13%, depending on the workload). On systems with large amounts of memory, you can now get better value for your hardware investment, because more effective use of the TLB cache limits the overhead of translating virtual addresses to physical (real) memory addresses. In addition, DB2 also allows custom configuration of the 16-megabyte large pages, as well as the 16-gigabyte huge pages, which you can also exploit if additional performance is needed.


Special notices

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Information is provided "AS IS" without warranty of any kind.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Resources

Learn

Get products and technologies

Discuss
