
Leverage transparent huge pages on Linux on POWER

An easy way to improve performance for some applications

Bill Buros (wburos@us.ibm.com), LTC Linux Performance, IBM, Software Group

Summary:  Learn more about the libhugetlbfs libraries and how to use them with the GNU Compiler Collection (GCC) or the IBM XL C/C++ and XL Fortran compilers for Linux®. libhugetlbfs is an open source community project that provides transparent access for customer applications to system huge pages. SUSE Linux Enterprise Server 10 (SLES 10) and Red Hat Enterprise Linux 5 (RHEL 5) now support libhugetlbfs. While libhugetlbfs support is available for a number of hardware platforms that support Linux huge pages, this article focuses on the 16MB huge page support available on IBM POWER processor-based systems.

Date:  20 Apr 2007
Level:  Intermediate


Introduction and background

Transparently leveraging huge pages on Linux® -- which allow memory page table entries to cover larger (up to multiple megabytes) ranges of contiguous physical memory -- has become much easier with the recent introduction of Version 1 of the libhugetlbfs library on SourceForge (see Resources). The libhugetlbfs library has been updated for SUSE Linux Enterprise Server 10 (SLES 10) and is available for Red Hat Enterprise Linux 5 (RHEL 5). Customers use it in benchmarking to improve select applications on POWER, Intel®, and AMD systems running Linux. With a focus on IBM POWER processor-based systems and their 16MB page size, this article provides an introduction to libhugetlbfs, including the following:

  • Considerations of when and when not to use libhugetlbfs
  • Instructions on how to install and set up libhugetlbfs
  • Information on providing system access control to the huge pages
  • A simple code example for backing malloc and application bss sections with huge pages
  • Examples of published industry-standard benchmark results that use libhugetlbfs

Supported systems

For Linux on POWER systems, the libhugetlbfs library is supported today on SLES 10 and RHEL 5 systems where 16MB pages are available. This includes POWER4, POWER5, and POWER5+ systems, as well as BladeCenter® JS20 and BladeCenter JS21.

While the focus of this article is POWER systems, libhugetlbfs support is available on Linux for Intel and AMD-based systems that support huge pages. On those systems the hardware page sizes are different, but the approach is the same, and testing has shown good performance improvements on those Linux systems as well.

No source code changes to your application

In this article, transparently leveraging huge pages means that applications can take advantage of the performance benefits of the larger hardware page sizes with no source code changes. Linux already supports exploiting system huge pages, but applications have to be specifically coded to take advantage of the feature. Examples of this support exist in general system software products like the latest Java™ engines and the various large database vendor products.
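To make the contrast concrete, the following is a minimal sketch of the explicit style of huge page coding that libhugetlbfs saves you from: a System V shared memory segment requested with the SHM_HUGETLB flag. This is illustrative only; it assumes a kernel with SHM_HUGETLB support, and error handling is abbreviated.

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define HUGE_PAGE_SIZE (16L * 1024 * 1024)   /* one 16MB huge page on POWER */

int main(void)
{
    /* Explicitly request huge-page-backed memory; the segment size
       must be a multiple of the huge page size. */
    int shmid = shmget(IPC_PRIVATE, HUGE_PAGE_SIZE,
                       SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
    if (shmid < 0) {
        perror("shmget");
        return 1;
    }
    double *buf = shmat(shmid, NULL, 0);      /* attach the segment */
    buf[0] = 42.0;                            /* use the memory */
    shmdt(buf);                               /* detach ... */
    shmctl(shmid, IPC_RMID, NULL);            /* ... and release it */
    return 0;
}

With libhugetlbfs, none of this plumbing appears in your source; the library arranges the huge page backing behind the standard malloc and linker interfaces.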

If you use libhugetlbfs on 32-bit or 64-bit applications with the platforms listed above, your application requires no source code changes to take advantage of 16MB huge pages. You have several options available to you, which I describe below:

  • Backing .bss, .data, and .text sections with huge pages: You can specify that all three sections (the .bss section, the .data section, and the .text section) be loaded into huge pages. The .bss section holds the uninitialized global data structures of a program (for example, a large Fortran array); it can be very large, since many compute-intensive workloads declare and use Fortran arrays that consume many gigabytes of memory. The .data section holds the initialized global variables and data structures (rarely a big piece) of an application program, and the .text section is the executable text (the binary code) itself. To have these application sections backed by system huge pages, all you need to do is relink your executable.

Looking at your executable

To see sections defined in your POWER executable, try the following command:
readelf --sections -W yourApp
This shows you the sections defined in the executable. Look for the .bss, .data, and .text sections in the output. For more information, see the Sections in an executable section later in this article.

  • Backing just the .bss section: A common use of libhugetlbfs for Fortran programs is to specify that just the .bss section be backed by system huge pages. The .data section and the .text sections continue to be backed by normal hardware base page sizes. As with the first approach, using libhugetlbfs requires a simple relink of the executable.
  • Backing malloc with huge pages: You can specify that run time malloc calls use memory backed by 16MB pages. This includes the malloc derivatives (calloc, valloc, realloc, and so forth), since the underlying malloc library call is what gets overridden. The libhugetlbfs library automatically manages all of the malloc requests in the 16MB pages. You can use this approach with any executable; it does not need to be specially linked.

Key considerations

Before you actually use libhugetlbfs, there are some key considerations you need to keep in mind, as follows:

  • For the purposes of developing an application that is supported in a customer environment, you need a shipping and supported distribution of Linux. For SUSE, this is SLES 10. For Red Hat, this is Red Hat Enterprise Linux 5 (RHEL 5), which includes the libhugetlbfs libraries. SLES 9 and RHEL 4 are not covered in this article, since they do not come with the kernel support that libhugetlbfs requires. Experienced Linux users can always get a recent mainline kernel that provides the required kernel functionality, but that would not be a customer-supported configuration.
  • Huge pages for Linux on POWER are the 16MB huge pages that are defined prior to usage (either on boot or by a root user) and are pinned by the operating system in main memory when they are used. 16MB huge pages on POWER systems are contiguous pieces of 16MB physical memory.
  • On POWER systems, the Linux community refers to these pages as "huge pages," while the traditional AIX® and IBM POWER hardware teams refer to these pages as "large pages." Since terminology in the community and industry is ambiguous, I recommend clarifying the usage with the actual size of the page being used. In the case of this article, I refer to 16MB huge pages and I use the phrases "large pages" and "huge pages" to mean the same thing -- that is, 16MB huge pages.
  • You can use 16MB huge pages in virtualized logical partitions (LPARs) on Linux on POWER or in full system LPARs that have all of the resources assigned to the partition.
  • A special filesystem just for libhugetlbfs should be set up to only allow a specified group of users access to the pool of available huge pages. I provide examples of this later in this article. This is important for security reasons and to control access to a very limited and valuable resource (physical memory) on a system.
  • Linux provides the ability to dynamically manage, specify, request, and return the 16MB huge pages by a root user. Availability of 16MB huge pages is dependent on the system availability of contiguous physical memory when requested by a root user. 16MB huge pages can be reserved at boot time.
  • When requesting large numbers of huge pages, you need to consider memory fragmentation. The longer a system has been running, the more likely main memory is to be fragmented with bits and pieces of pinned and permanent kernel memory, which makes claiming each individual contiguous 16MB huge page increasingly difficult. If a large number of 16MB huge pages is required, it is recommended that you reserve them at system boot time, as sketched after this list.
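For example, on a system that boots with yaboot (typical for SLES 10 and RHEL 5 on POWER), you can add the hugepages= kernel parameter to the boot stanza. The following is a sketch only; the image, label, and root values are placeholders for your own configuration:

# /etc/yaboot.conf (sketch -- stanza values are placeholders)
image = /boot/vmlinux
    label = linux
    root = /dev/sda3
    append = "hugepages=200"    <---- reserve 200 x 16MB huge pages at boot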

Some restrictions to using libhugetlbfs

System huge pages are a limited and valuable system resource. There are several restrictions to consider when using libhugetlbfs, as follows:

  • On Linux, an application that uses libhugetlbfs and huge pages must have all of the huge pages available and free when needed at run time, or the application is terminated at run time when the huge page request fails. This can be an important consideration under these circumstances:
    • The application uses a lot of memory that might require more 16MB huge pages than are available.
    • The system doesn't have enough physical memory to support loading the executable and all of the specified pages.
  • Falling back to normal pages when the system runs out of huge pages has proven difficult to implement in a clean, generic fashion in Linux. Community efforts continue, with various prototypes and approaches under consideration. As system hardware becomes more flexible in the number of page sizes supported, the hope is that more flexible controls will emerge in the operating system.
  • Not all applications or software solutions benefit from using huge pages. The primary performance improvement from leveraging huge pages is the reduction in Translation Lookaside Buffer (TLB) misses. It's generally recommended that you simply try your application with transparent huge pages and see whether you get any tangible performance gains. Typical gains with libhugetlbfs range from none to around 10 to 15 percent, and some applications actually see performance get worse. Because many applications see little to no gain, this approach is typically reserved for specific applications with demonstrated gains.
  • In general, using libhugetlbfs is a specialized niche approach for improving performance on a limited set of applications. Classically, the approach is used for Fortran programs that use very large arrays of data and do a significant number of memory accesses, but the approach applies to any C, C++, or Fortran program.
  • Applications with the .text section (the executable text) loaded into huge pages cannot be effectively profiled (for example, with oprofile) or debugged (for example, with gdb) since the executable itself is copied without symbols to anonymous huge pages and then run. Profiling would be helpful in some cases and work continues in the Linux community to extend this functionality.
  • libhugetlbfs leverages the dynamic linking facilities built into Linux. Therefore, statically linked executables cannot take advantage of the capabilities of libhugetlbfs. (You can check an executable with the ldd command, as shown after this list.)
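To check the last point, use the ldd command, which tells you whether an executable is dynamically linked: for a dynamically linked binary it lists the shared libraries, while for a statically linked one it prints "not a dynamic executable".

ldd yourApp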

Sections in an executable

Before beginning, I should review the sections in a generated executable. I'll demonstrate with a simple do-nothing C program. In the example below, I'll declare one large uninitialized array and one large initialized array.

Listing 1 shows a source file called sections.c.


Listing 1. sections.c source file
#define ELEMENTS 1024*1024
static double  bss_array[ELEMENTS];
static double  data_array[ELEMENTS] = {5.0};

int   main()   {
    int   i;
    for (i = 1; i < ELEMENTS; i++) {
       bss_array[i]    = data_array[i];
    }
    return 0;
}

The key difference between a bss array and a data array is easy to see in the executable that you generate in the next step. The differences between the bss and data arrays are as follows:

  • bss array -- This array is uninitialized data that is saved in the .bss section in the executable. This section does not take any space in the executable.
  • data array -- This array is initialized (non-zero) data that must be saved in the executable itself, in the .data section. In the case above, with each double being 8 bytes, the executable needs 8*1024*1024 bytes (8MB) of space for it.

To see these attributes in the executable, compile the program and then display the section information with the readelf command. You can see this illustrated in Listing 2.


Listing 2. Examples of sections of a C program
# cc  sections.c  -o sections

# ls -l sections
-rwxr-xr-x 1 root root 8400154 2006-12-03 18:52 sections
  
# readelf -S sections   
There are 36 section headers, starting at offset 0x8012cc:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        10000154 000154 00000d 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            10000164 000164 000020 00   A  0   0  4
...
  [12] .text             PROGBITS        100002f0 0002f0 0005b0 00  AX  0   0 16
  [13] .fini             PROGBITS        100008a0 0008a0 000038 00  AX  0   0  4
...
  [23] .plt              PROGBITS        10010a28 000a28 000008 00  WA  0   0  4
  [24] .data             PROGBITS        10010a30 000a30 800008 00  WA  0   0  8
  [25] .bss              NOBITS          10810a38 800a38 800008 00  WA  0   0  8
  [26] .comment          PROGBITS        00000000 800a38 0000d9 00      0   0  1
...

In the output shown in Listing 2, observe the following:

  • Size of the executable -- Notice that the size of the executable is 8400154 bytes, which is pretty big for such a small program.
  • .text -- The .text section holds the executable instructions generated by the compiler. For very large programs, libhugetlbfs can be directed to load this section into 16MB huge pages. In this example, the size of the executable text is only 5b0(hex) bytes, so you would not consider placing it into a 16MB huge page.
  • .data -- The data section in this program has a size of 800008(hex) bytes. That corresponds to the declared 8MB data array. It is important to recognize that the initialized data is part of the executable itself, so this makes the executable much larger. You can tell this by looking at the offsets. The data section starts at offset 000a30, and the next section starts at offset 800a38.
  • .bss -- The .bss section in this program also has a size of 800008(hex) bytes, but it takes up no space in the executable. The offset for the start of the .bss section and the following section do not change; both are 800a38(hex).

Changing a data array to a bss array

To demonstrate what compilers often do with the initialization of arrays, change the initialization of data_array from 5.0 to 0.0 and observe that this now becomes a .bss section in the executable, as shown here:

...
#define ELEMENTS 1024*1024
static double  bss_array[ELEMENTS];
static double  data_array[ELEMENTS] = {0.0};  <----  zero'ing makes this a .bss array
...

If you initialize data_array to zero, the compilers treat it just like .bss data. Recompile; the result is shown in Listing 3 below. Let's look at the .data section now.


Listing 3. Example of .data section now in .bss section
# cc  sections.c  -o sections

# ls -l sections
-rwxr-xr-x 1 root root 11546 2006-12-03 18:54 sections

# readelf -S sections
There are 36 section headers, starting at offset 0x12cc:

Section Headers:
  [Nr] Name          Type       Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]               NULL       00000000 000000 000000 00      0   0  0
  [ 1] .interp       PROGBITS   10000154 000154 00000d 00   A  0   0  1
  [ 2] .note.ABI-tag NOTE       10000164 000164 000020 00   A  0   0  4
...
  [12] .text         PROGBITS   100002f0 0002f0 0005b0 00  AX  0   0 16
  [13] .fini         PROGBITS   100008a0 0008a0 000038 00  AX  0   0  4
...  
  [23] .plt          PROGBITS   10010a28 000a28 000008 00  WA  0   0  4
  [24] .data         PROGBITS   10010a30 000a30 000008 00  WA  0   0  4  \
                                        <----- No longer a data array
  [25] .bss          NOBITS     10010a38 000a38 1000008 00  WA  0   0  8 \
                                        <----- Now twice the size
  [26] .comment      PROGBITS   00000000 000a38 0000d9 00      0   0  1
...  

Notice the size column and the offsets associated with the size. Here are details describing what has happened:

  • Size of the executable -- The size of the executable is now only 11546 bytes.
  • .data -- In the .data section, the size dropped from 800008(hex) in the previous example down to just 000008(hex). The data_array is now treated as a .bss section.
  • .bss -- In the .bss section, the size has doubled from the previous example to 1000008(hex), which is the combined size of the two arrays (800000(hex) + 800000(hex) for bss_array and data_array), but it takes no "space" in the executable. The offset of the following section didn't change.

Summarizing the sections

Most applications either use uninitialized data declarations or use malloc commands for large amounts of memory. libhugetlbfs provides two easy ways to load both the .bss section and malloc'd memory into system huge pages. libhugetlbfs also provides a way to load the executable .text section and the .data section into huge pages, but this isn't as common as backing the .bss section and malloc'd memory.

Setting up libhugetlbfs

This section covers the procedures for downloading, installing, and setting up libhugetlbfs.

To set up a system with libhugetlbfs and define the system huge pages, you need root access to the system. I'll show ways that a non-root user can be defined to use the huge pages once the system has been set up.

There are several ways to install libhugetlbfs:

  1. If you are registered with SUSE support, you have access to the latest libhugetlbfs rpms from the Novell SUSE maintweb support. (Note: This Web site is only available to registered users.) At the time of this writing, the latest recommended (and supported) rpms available for SLES 10 are as follows:
    • libhugetlbfs-1.0.1-1.4.ppc.rpm
    • libhugetlbfs-64bit-1.0.1-1.4.ppc.rpm

    The default libhugetlbfs libraries that were provided when SLES 10 was initially available are old, and I don't recommend bothering with them. These are:

    • libhugetlbfs-1.0-16.2.ppc.rpm
    • libhugetlbfs-64bit-1.0-16.2.ppc.rpm

    With the recommended libraries installed, proceed to the Allocating huge pages section of this article.

  2. On RHEL 5, there are five rpms to install:
    • libhugetlbfs-1.0.1-1.el5.ppc.rpm
    • libhugetlbfs-devel-1.0.1-1.el5.ppc.rpm
    • libhugetlbfs-devel-1.0.1-1.el5.ppc64.rpm
    • libhugetlbfs-lib-1.0.1-1.el5.ppc.rpm
    • libhugetlbfs-lib-1.0.1-1.el5.ppc64.rpm
  3. Alternatively, you can get a copy from SourceForge (see Resources) and build and install that version. I'll explain this approach next.

Installing from SourceForge

From SourceForge, click the Download libhugetlbfs button, shown in Figure 1, and follow the instructions to download the tar ball. For this article, Version 1.0.1 is the level tested and supported by SUSE and Red Hat.


Figure 1. SourceForge libhugetlbfs page

Download the tar ball to your system. The libhugetlbfs make install process copies the required files to the appropriate locations on your system. I recommend installing libhugetlbfs in the system /usr subdirectory by specifying make install PREFIX=/usr, as shown in Listing 4. You'll need root access to install.


Listing 4. Installing libhugetlbfs
# tar -zxf libhugetlbfs-1.0.1.tar.gz
# cd libhugetlbfs-1.0.1
# make
     ...               <---- not showing "make" messages here ... 
# make install PREFIX=/usr
     VERSION
     INSTALL32 /usr/lib
     INSTALL64 /usr/lib64
     OBJSCRIPT ld.hugetlbfs
     INSTALL

If you look in the /usr/share/libhugetlbfs/ directory, you should find ld, which is the new linker command. See Listing 5. The ld command is soft linked to ld.hugetlbfs as a convenience for invoking this linker by the GNU Compiler Collection (GCC) and IBM compilers.


Listing 5. /usr/share/libhugetlbfs/ showing the ld command
# ls -l /usr/share/libhugetlbfs
total 8
lrwxrwxrwx 1 root root   12 2006-11-26 12:42 ld -> ld.hugetlbfs
-rwxr-xr-x 1 root root 1321 2006-11-26 12:42 ld.hugetlbfs
drwxr-xr-x 2 root root 4096 2006-11-26 12:42 ldscripts

The ldscripts subdirectory shown in Listing 6 contains all of the modified linker scripts needed to handle the "magic" of relinking the .bss, .data, and .text sections.


Listing 6. ldscripts subdirectory of /usr/share/libhugetlbfs/
/usr/share/libhugetlbfs/ldscripts # ls
elf32ppclinux.xB    elf64ppc.xB    elf_i386.xB    elf_x86_64.xB
elf32ppclinux.xBDT  elf64ppc.xBDT  elf_i386.xBDT  elf_x86_64.xBDT

The libhugetlbfs package comes with a number of automated test packages that can be invoked using make. These tests are primarily used by the developers of the package and kernel maintainers as changes are made to the system. If you're running on a supported operating system level and kernel, there should be no need to run the tests, especially since the output can be cryptic for general users of the package.

The other important file provided by libhugetlbfs is the libhugetlbfs.so library, as shown in Listing 7. This is the library invoked at run time that controls the system usage of the huge pages. Libraries are provided for both 32-bit applications and 64-bit applications.


Listing 7. libhugetlbfs.so library
# ls -l /usr/lib/libhugetlbfs.so
-rwxr-xr-x 1 root root 54785 2006-11-26 12:42 /usr/lib/libhugetlbfs.so
  
# ls -l /usr/lib64/libhugetlbfs.so
-rwxr-xr-x 1 root root 63910 2006-11-26 12:42 /usr/lib64/libhugetlbfs.so

Allocating huge pages

To set up your system to use huge pages, you simply allocate the huge pages first. With Linux, you echo the number of huge pages you'd like to allocate into a /proc/sys/vm control, as shown in Listing 8. The operating system then attempts to allocate the requested huge pages, tucking them away and reserving them out of the physical memory pool. The pages are not yet in use, and the system gives no indication if it failed to allocate all of the requested huge pages. So after you request them, always check how many were actually reserved.


Listing 8. Allocating huge pages with /proc/sys/vm
# cat /proc/meminfo | grep Huge
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:    16384 kB
    
# echo 200 > /proc/sys/vm/nr_hugepages
      
# cat /proc/meminfo | grep Huge
HugePages_Total:   200
HugePages_Free:    200
HugePages_Rsvd:      0
Hugepagesize:    16384 kB
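Because the allocation can silently fall short, a root user can script the request and the check together. The following is a sketch only (the request of 200 pages is just an example):

#!/bin/sh
# Request huge pages, then verify how many were actually reserved.
REQUEST=200
echo $REQUEST > /proc/sys/vm/nr_hugepages
GOT=$(grep HugePages_Total /proc/meminfo | awk '{print $2}')
if [ "$GOT" -lt "$REQUEST" ]; then
    echo "warning: only $GOT of $REQUEST huge pages were allocated"
fi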

Define a hugetlbfs filesystem

libhugetlbfs uses a virtual filesystem interface, the hugetlbfs. To set this up, create a mount point and mount the virtual filesystem. The filesystem can be named anything unique; in this case, use the mount point /libhugetlbfs.

The hugetlbfs you defined controls who has access to the system huge pages, so it is important to set up a group with the users that you wish to have access to the system huge pages. Note that you create a special group (libhuge) on the system to control access to the hugetlbfs. See Listing 9. You should add any authorized users to this group, and only these users will be allowed access to the huge pages used by libhugetlbfs.


Listing 9. Defining hugetlbfs filesystem
# mkdir /libhugetlbfs
  
# groupadd libhuge
# chgrp libhuge /libhugetlbfs
# chmod 770 /libhugetlbfs

# usermod -G libhuge wmb     <---- assumes "wmb" is a user on the system

# mount  -t hugetlbfs  hugetlbfs  /libhugetlbfs

The Linux operating system uses this special filesystem for access and control of the physical memory associated with the 16MB huge pages. The name of the filesystem you create isn't important and, for normal usage, you should define and use a single filesystem. Keep in mind that the mount of the hugetlbfs is temporary; it will not be automatically remounted at boot time unless you change /etc/fstab. To make the change permanent, edit the file and add the applicable line, like this:

# vi /etc/fstab

Then add the line shown last in Listing 10. Note that gid=1000 assumes that the libhuge group ID is 1000.


Listing 10. Adding libhugetlbfs mount to /etc/fstab
/dev/sda3            /                    ext3       acl,user_xattr        1 1
/dev/sda2            swap                 swap       defaults              0 0
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
/dev/fd0             /media/floppy        auto       noauto,user,sync      0 0
hugetlbfs            /libhugetlbfs        hugetlbfs  mode=0770,gid=1000    0 0
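If you're not sure which group ID your libhuge group received, getent reports it. The output below assumes the group created earlier was assigned ID 1000 with wmb as a member:

# getent group libhuge
libhuge:x:1000:wmb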

libhugetlbfs is ready to go

So at this point, you've completed the basic setup of the libhugetlbfs library. You have done the following:

  • Reviewed the considerations for using libhugetlbfs
  • Downloaded and installed libhugetlbfs (either from SourceForge or the most current rpm files from your SLES 10 or RHEL 5 distribution)
  • Set up libhugetlbfs by creating, mounting, and limiting access to a virtual hugetlbfs
  • Defined a huge page pool to be used by users

Now you're ready for a user to use the transparent huge pages.

Backing malloc commands with huge pages

Let's start by looking at malloc, which is the easy case and requires no special linking. I assume you're running as a user (in this example, wmb) who is included in the previously defined libhuge group. The mounted libhugetlbfs filesystem has libhuge as the group owner, and wmb belongs to that group, as shown in Listing 11.


Listing 11. The libhuge group
wmb@p5sys:~> ls -l -d /libhugetlbfs/
drwxrwx--- 2 root libhuge 0 2006-12-18 21:26 /libhugetlbfs/

wmb@p5sys:~> id
uid=1000(wmb) gid=100(users) groups=16(dialout),33(video),100(users),1000(libhuge)

Then use a simple program that copies values from one very big array to another array. You'll create a program called copy_arrays.c, which is shown in Listing 12.


Listing 12. copy_arrays.c
#include <stdlib.h>    /* for malloc() */

#define ELEMENTS 1024*1024*128
static double  bss_array_from[ELEMENTS];
static double  bss_array_to[ELEMENTS];
static double *malloc_array_from;
static double *malloc_array_to;

int   main()   {
    int   i;
    malloc_array_from = (double *)malloc(ELEMENTS*sizeof(double));
    malloc_array_to   = (double *)malloc(ELEMENTS*sizeof(double));

    /* initialize and touch all of the pages */
    for (i = 1; i < ELEMENTS; i++) {
       bss_array_to[i]      = 1.0;
       bss_array_from[i]    = 2.0;
       malloc_array_to[i]   = 3.0;
       malloc_array_from[i] = 4.0;
    }

    /* copy "from" "to" */
    for (i = 1; i < ELEMENTS; i++) {
       bss_array_to[i]    = bss_array_from[i];
       malloc_array_to[i] = malloc_array_from[i];
    }
    return 0;
}

As the root user, allocate 200 huge pages for the system to use. Then open a second window and issue the watch command to keep track of the system's huge page usage, as follows:

# echo 200 > /proc/sys/vm/nr_hugepages
# watch cat /proc/meminfo

As a non-root user (in this case, wmb), compile and run the program, as shown in Listing 13. The number of huge pages being "watched" should not change, since nothing is using them yet. Your times might differ, depending on the system you are running on.


Listing 13. Running copy_arrays.c
wmb@p5sys:~> cc -O3 -m64 copy_arrays.c -o copy_arrays

wmb@p5sys:~> time ./copy_arrays

real    0m6.844s
user    0m2.149s
sys     0m2.649s

To back the malloc commands with huge pages, set the HUGETLB_MORECORE environment variable and point LD_PRELOAD at the libhugetlbfs library, as shown in Listing 14. The program should run slightly faster, and the window where you're watching memory usage should show huge pages being allocated and used.


Listing 14. Setting HUGETLB_MORECORE with LD_PRELOAD set to libhugetlbfs
wmb@p5sys:~> time HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so  ./copy_arrays

real    0m6.480s
user    0m2.483s
sys     0m1.645s

Depending on your system and memory configuration, your result might vary but, in this case, the performance improves by about five percent ((6.844 - 6.480)/6.844, or roughly 5.3 percent). While this might seem a trivial amount of time, it adds up when very large compute-intensive workloads run over the course of days.

Default stack size on SLES 10

On SLES 10, the default stack limit is reduced to only 8192KB. So if your application unexpectedly encounters segmentation faults, check whether the stack ulimit is set to 8192. Setting the stack limit to unlimited, with ulimit -s unlimited, might circumvent the problem.
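For example, checking and raising the limit from a shell looks like this (the 8192 shown is the SLES 10 default just described):

wmb@p5sys:~> ulimit -s        <---- current stack limit, in KB
8192
wmb@p5sys:~> ulimit -s unlimited
wmb@p5sys:~> ulimit -s
unlimited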

The LD_PRELOAD setting causes any malloc commands or their derivatives to be automatically backed by the huge pages allocated earlier. Malloc commands are handled at run time, so you'll need to predefine enough 16MB pages to be used by your program.

Backing sections with huge pages

When re-linking your executable to take advantage of backing the .bss, .data, or .text sections, it's easy to use either the GCC or the IBM compilers. Both compilers provide easy-to-use parameters to override the invocation of the ld command. Here's how to use them:

  • GCC compilers: Add the following parameters to the C compile link command line:
    -B /usr/share/libhugetlbfs/ -Wl,--hugetlbfs-link=B


    The second parameter is 'dash W ell comma' -- hard to read on most browsers.
  • IBM compilers: Do the same thing, but provide additional clarification of what's being overridden, like this:
    -B /usr/share/libhugetlbfs/  -tl  -Wl,--hugetlbfs-link=BDT


    This adds 'dash tee ell', which passes the directive through to the linker. Use this for the C, C++, and Fortran compilers; an example invocation follows this list.
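For example, an IBM XL C invocation of the copy_arrays.c program from Listing 12 might look like the following. This is a sketch that assumes the compiler is installed as xlc; the -q64 flag requests a 64-bit executable, mirroring the -m64 used with GCC:

xlc -O3 -q64 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT \
    copy_arrays.c -o copy_arrays-lp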

So using the copy_arrays.c program (Listing 12) and the GCC compilers, tell the linker to put the .bss section into huge pages. Note that in this case, you need to create a new target executable called copy_arrays-lp, as shown in Listing 15, since this executable is linked for huge pages. You should see a performance gain similar to that of backing the malloc pages.


Listing 15. copy_arrays-lp executable
wmb@p5sys:~> cc -O3 -m64 -B /usr/share/libhugetlbfs/ -Wl,--hugetlbfs-link=B \ 
                   copy_arrays.c -o copy_arrays-lp
                   
wmb@p5sys:~> time ./copy_arrays-lp

real    0m6.495s
user    0m2.558s
sys     0m1.648s

What if not enough 16MB pages are available?

You can combine the two, backing both the .bss section and malloc with huge pages. You'll need more huge pages than you previously defined: each array in copy_arrays.c is 1GB (128M doubles of 8 bytes each), so the two malloc'd arrays alone consume 128 of the 16MB huge pages, and backing the two .bss arrays as well pushes the total to 256 -- more than the 200 pages defined earlier.

Note that when you don't have enough huge pages defined, the application is terminated ("Killed"). As you can see in Listing 16, the invocation of copy_arrays-lp with malloc backing also enabled fails because not enough 16MB huge pages are available. Allocating more 16MB huge pages and rerunning fixes this.

When using an executable linked to back the .bss section, there is no need to specify the LD_PRELOAD=libhugetlbfs.so override. You simply request malloc backing as well by defining HUGETLB_MORECORE=yes.


Listing 16. Specifying use of malloc with HUGETLB_MORECORE
wmb@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
Killed                                     <---- Note the "Killed" message

real    0m4.084s
user    0m1.669s
sys     0m0.365s

wmb@p5sys:~> su
#> echo 300 > /proc/sys/vm/nr_hugepages
#> exit

wmb@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp

real    0m6.943s
user    0m3.600s
sys     0m0.473s

In this case, using both malloc and .bss sections did not result in tangible gains. For your applications, trying various combinations helps you understand the most practical approach to use. Not all applications benefit from using transparent huge pages.

Unauthorized user

If an unauthorized user tries to use the system huge pages, the program runs normally, but the libhugetlbfs library reports an error and huge pages are not used. For example, if the user joe tries to run the same two programs from above, the result shown in Listing 17 occurs.


Listing 17. Result of unauthorized user trying to use huge pages
joe@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
libhugetlbfs: ERROR: mkstemp() failed: Permission denied

real    0m6.795s
user    0m2.267s
sys     0m2.656s

joe@p5sys:~> time HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so ./copy_arrays
libhugetlbfs: ERROR: mkstemp() failed: Permission denied
libhugetlbfs: ERROR: Couldn't open hugetlbfs file for morecore

real    0m6.807s
user    0m2.133s
sys     0m2.660s

Examples of libhugetlbfs in action

The libhugetlbfs library has been put to use in customer production environments, where it has helped to tune workloads across the POWER line, and it has been used in published SPECcpu2000 results with SLES 10. More recently, it was used for a published SPECompM2001 result with the new RHEL 5. In addition, a working practical example of how to tune a memory application is available on the IBM Linux on POWER wiki (see Resources).

Publishing SPECcpu2000 with SLES 10

To provide working examples of the technology in action, IBM recently published a number of SPECcpu2000 benchmarks on the SPEC.org Web site (see Resources) on POWER5+ systems using SLES 10. The published examples demonstrate how easy it is to invoke and use libhugetlbfs. For example, on the SPECint2000 runs, the executables are built normally. Then, when the runs are measured for performance, the malloc backing of the executables is simply turned on by setting the two applicable environment variables, as shown in Listing 18.


Listing 18. Running SPECint2000 with malloc commands in 16MB huge pages
# export HUGETLB_MORECORE=yes
# export LD_PRELOAD=libhugetlbfs.so
# runspec --config LoP_config_file.cfg  int

For the published SPECfp2000 results, you leverage the common approach of re-linking the applicable programs so that the executables are set up to back the .bss, .data, and .text sections with 16MB pages. In this case, you modify the link step of the components. Listing 19 shows the changes for the 179.art component, as published on the SPEC.org Web site (see Resources).


Listing 19. Example of a SPECfp2000 component compile/link flags
  179.art=peak=default=default:
  notes179_1       = 179.art
  notes179_2       =     +FDO -O5 
  notes179_3       =     -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
  
  PASS1_CFLAGS     = -qpdf1 -O5  -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
  PASS1_LDCFLAGS   = -qpdf1 -O5  -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
  PASS2_CFLAGS     = -qpdf2 -O5  -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
  PASS2_LDCFLAGS   = -qpdf2 -O5  -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT

Then invoke the run, as follows:

# export HUGETLB_MORECORE=yes
# runspec --config LoP_config_file.cfg  fp 

Publishing SPECompM2001 with RHEL 5

Demonstrating that the same approach works on RHEL 5, a recent SPECompM2001 result was published for an IBM System p5 560Q system. The result is available on the SPEC.org Web site (see Resources).

Selected components that showed performance gains from backing malloc commands with 16MB huge pages were run with the environment variables set appropriately. See Listing 20 for an example of how the invocation of one of the components was modified. It looks easy, right?


Listing 20. Example of SPECompM2001 component using malloc in 16MB pages
316.applu_m: -qpdf1/pdf2
             -O4 -q64
             ENV_HUGETLB_MORECORE=yes
             ENV_LD_PRELOAD=libhugetlbfs.so

An application tuning example

Another exercise in tuning a memory application (stream) with libhugetlbfs is available on the Web in the IBM Linux on POWER wiki (see Resources).

The example in the wiki shows the practical and common steps a user can take to maximize performance of memory-intensive workloads on a system running Linux.

In summary

In this article, you were introduced to an open source community project called libhugetlbfs, which is now supported on both SLES 10 and RHEL 5. The library provides easy, transparent access for customer applications to system huge pages. You learned how to use the libhugetlbfs libraries with GCC or the IBM XL compilers for Linux. The libhugetlbfs library lets you leverage huge pages for the malloc commands in a program, for the .bss section, or for the combination of the .bss, .data, and .text sections of an executable, improving the performance of the application in some cases. You can easily configure libhugetlbfs on any system, and access controls are easily included to limit access to the memory resources.

Acknowledgments

Particular thanks to Adam Litke (in Rochester, MN), Nishanth Aravamudan (Beaverton, OR), Steve Fox (Rochester, MN), and David Gibson (Australia) for their tireless work and focus on delivering libhugetlbfs to the community as a supported library in 2006. Their work continues into 2007, as they make the library more flexible and robust across system platforms.

Thanks to Chakarat Skawratananond (Austin, TX), a regular contributor to developerWorks, for his help and assistance in working with libhugetlbfs on the performance improvements for customers.



Downloads

Description                                           Name                        Size   Download method
SourceForge libhugetlbfs library                      libhugetlbfs-1.0.1.tar.gz   64KB   HTTP
Trial version of XL C/C++ Advanced Edition V8.0 [1]   vac.rpm                     95MB   HTTP

Note

  1. A functional version of the IBM C/C++ compilers for the POWER systems.

Resources

Learn

  • Tuning stream with libhugetlbfs: See a working example on the IBM Linux on POWER Wiki site and the common steps an end user can take to maximize performance on Linux on POWER systems.

  • SPEC.org SPECcpu2000 results: This section includes a number of examples of published results that used libhugetlbfs. Click one of the results sections and search for "SLES."

  • SUSE for POWER: Visit the Novell site and get the latest information for SUSE for POWER.

  • Red Hat Enterprise Linux documentation: Visit the Red Hat site for more information on Red Hat Enterprise Linux.

  • IBM Systems: Want more? The developerWorks IBM Systems zone hosts hundreds of informative articles and introductory, intermediate, and advanced tutorials.

  • New to IBM Systems?: Get a better understanding of what skills are necessary to undertake systems development with a wide range of IBM hardware products and related technologies.

  • Safari bookstore: Visit this e-reference library to find specific technical resources.


About the author


Bill Buros is a performance analyst in the IBM Linux Technology Center. Based in Austin, Texas, Bill has been focused on performance for compute-intensive applications running on the Linux operating system. He works with customers and software product vendors to improve the performance of their products, the Linux operating system, the kernel, and the overall software stack. One of his recent goals has been to get transparent huge pages implemented in the operating system for Linux customers. You can contact him at wburos@us.ibm.com.
