Transparently leveraging huge pages on Linux® -- which allow memory page table entries to cover larger (up to multiple megabytes) ranges of contiguous physical memory -- has become much easier with the recent introduction of Version 1 of the libhugetlbfs library on SourceForge (see Resources). The libhugetlbfs library has been updated for SUSE Linux Enterprise Server 10 (SLES 10) and is available for Red Hat Enterprise Linux 5 (RHEL 5). Customers use it for benchmarking activities to improve select applications on POWER, Intel®, and AMD systems with Linux. With a focus on IBM POWER processor-based systems with 16MB page sizes, this article provides an introduction to libhugetlbfs, including the following:
- Considerations of when and when not to use libhugetlbfs
- Instructions on how to install and set up libhugetlbfs
- Information on providing system access control to the huge pages
- A simple code example for backing malloc and application bss sections with huge pages
- Examples of industry-standard external publishes using libhugetlbfs
For Linux on POWER systems, the libhugetlbfs library is supported today on SLES 10 and RHEL 5 systems where 16MB pages are available. This includes POWER4, POWER5, POWER5+ systems, and BladeCenter® JS20 and BladeCenter JS21.
While the focus of this article is on POWER systems, libhugetlbfs support is available on Linux for Intel and AMD-based systems that support huge pages. For those systems, the hardware page sizes are different but the approach is the same, and testing has shown that good performance improvements can be seen on those Linux systems as well.
No source code changes to your application
In this article, transparently leveraging huge pages means that applications can take advantage of the performance advantages of the larger hardware page sizes with no source code changes. Linux already supports exploiting system huge pages, but applications have to be specifically coded to take advantage of the feature. Examples of this support exist in general system software products like the latest Java™ engines and the various large database vendor products.
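For contrast, here is a minimal sketch of what "specifically coded" can look like: the program itself creates a file on a mounted hugetlbfs filesystem and maps it. The function name `map_huge_explicit`, the file name template, and the choice of a shared mapping are assumptions made for illustration; this is not taken from the libhugetlbfs sources. It also assumes a hugetlbfs mount that the caller is allowed to write to, which is set up later in this article.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of huge-page memory by creating a temporary file on a
 * mounted hugetlbfs filesystem. `len` should be a multiple of the huge page
 * size. Returns NULL on any failure so the caller can fall back to malloc(). */
void *map_huge_explicit(const char *mount_dir, size_t len)
{
    char path[4096];
    int n, fd;
    void *p;

    assert(mount_dir != NULL && len > 0);

    n = snprintf(path, sizeof(path), "%s/explicit-XXXXXX", mount_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return NULL;

    fd = mkstemp(path);          /* fails if the mount is missing or not writable */
    if (fd < 0)
        return NULL;
    unlink(path);                /* the name is not needed once the file is open */

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would pass the mount point and a length such as 16MB, and use ordinary memory when NULL comes back. Every program that wants huge pages needs code along these lines, which is the work that the transparent approach removes.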
If you use libhugetlbfs on 32-bit or 64-bit applications with the platforms listed above, your application requires no source code changes to take advantage of 16MB huge pages. You have several options available to you, which I describe below:
- Backing .bss, .data, and .text sections with huge pages: You can specify that you want all three sections (the .bss section, the .data section, and the .text section) loaded into huge pages. The .bss section holds the uninitialized global data structures in a program (for example, a large Fortran array). The .bss section can be very large, with many compute-intensive workloads declaring and using Fortran arrays that can use many gigabytes of memory. The .data section holds the initialized global variables and data structures (rarely that big of a piece) of an application program, and the .text section is the executable text (the binary code) itself. To specify that these application sections be backed by system huge pages, all you need to do is relink your executable.
- Backing just the .bss section: A common use of libhugetlbfs for Fortran programs is to specify that just the .bss section be backed by system huge pages. The .data section and the .text section continue to be backed by normal hardware base page sizes. As with the first approach, using libhugetlbfs requires a simple relink of the executable.
- Backing malloc with huge pages: You can specify that you want run time malloc calls to use memory backed by 16MB pages. This includes the malloc derivatives (calloc, valloc, realloc, and so forth), as the underlying malloc invocation is the call being overridden. The libhugetlbfs library automatically manages all of the malloc requests in the 16MB pages. You can use this approach with any executable and it does not need a specially linked executable.
Before you actually use libhugetlbfs, there are some key considerations you need to keep in mind, as follows:
- For the purposes of developing an application that is supported in a customer environment, you need a shipping and supported distribution of Linux. For SUSE, this is SLES 10. For Red Hat, this is Red Hat Enterprise Linux 5 (RHEL 5) that includes the libhugetlbfs libraries. SLES 9 and RHEL 4 are not covered in this article, since they do not come with the correct kernel support for libhugetlbfs. Experienced Linux users can always get a recent mainline kernel that provides the libhugetlbfs kernel functionality required, but this would not be a customer supported configuration.
- Huge pages for Linux on POWER are the 16MB huge pages that are defined prior to usage (either on boot or by a root user) and are pinned by the operating system in main memory when they are used. 16MB huge pages on POWER systems are contiguous pieces of 16MB physical memory.
- On POWER systems, the Linux community refers to these pages as "huge pages," while the traditional AIX® and IBM POWER hardware teams refer to these pages as "large pages." Since terminology in the community and industry is ambiguous, I recommend clarifying the usage with the actual size of the page being used. In the case of this article, I refer to 16MB huge pages and I use the phrases "large pages" and "huge pages" to mean the same thing -- that is, 16MB huge pages.
- You can use 16MB huge pages in virtualized logical partitions (LPARs) on Linux on POWER or in full system LPARs that have all of the resources assigned to the partition.
- A special filesystem just for libhugetlbfs should be set up to only allow a specified group of users access to the pool of available huge pages. I provide examples of this later in this article. This is important for security reasons and to control access to a very limited and valuable resource (physical memory) on a system.
- Linux provides the ability to dynamically manage, specify, request, and return the 16MB huge pages by a root user. Availability of 16MB huge pages is dependent on the system availability of contiguous physical memory when requested by a root user. 16MB huge pages can be reserved at boot time.
- When requesting large amounts of huge pages, you need to consider memory fragmentation. The longer a system is running, the more likely main memory will become fragmented with bits and pieces of pinned and permanent kernel memory, which makes claiming each individual contiguous 16MB huge page increasingly difficult. If a large number of 16MB huge pages is required, it is recommended that you reserve them at system boot time.
Some restrictions to using libhugetlbfs
System huge pages are a limited and valuable system resource. There are several restrictions to consider when using libhugetlbfs, as follows:
- On Linux, an application that uses libhugetlbfs and huge pages must have all
of the huge pages available and free when needed at run time, or the application
is terminated at run time when the huge page request fails. This can be an
important consideration under these circumstances:
- The application uses a lot of memory that might require more 16MB huge pages than are available.
- The system doesn't have enough physical memory to support loading the executable and all of the specified pages.
- The ability of the operating system to "fall back" to normal pages when it runs out of huge pages is an aspect of the Linux operating system which has proven difficult to implement in a nice generic fashion. Community efforts are continuing, with various prototypes and approaches being considered. As system hardware becomes more flexible with respect to the number of page sizes supported, the hope is that more flexible controls emerge in the operating system.
- Not all applications or software solutions benefit from using huge pages. The primary performance improvement with leveraging huge pages is the reduction in Translation Lookaside Buffer (TLB) misses. It's generally recommended that you simply try your application with transparent huge pages and see whether that results in any tangible performance gains. Normal performance gains with libhugetlbfs can range from no gain to around 10 percent to 15 percent gain. Some applications actually see performance get worse. Many applications see little to no gain, so this approach is typically only used on specific applications for specific gains.
- In general, using libhugetlbfs is a specialized niche approach for improving performance on a limited set of applications. Classically, the approach is used for Fortran programs that use very large arrays of data and do a significant number of memory accesses, but the approach applies to any C, C++, or Fortran program.
- Applications with the .text section (the executable text) loaded into huge pages cannot be effectively profiled (for example, with oprofile) or debugged (for example, with gdb) since the executable itself is copied without symbols to anonymous huge pages and then run. Profiling would be helpful in some cases and work continues in the Linux community to extend this functionality.
- libhugetlbfs leverages dynamic linking facilities built into Linux. Therefore, statically linked executables cannot take advantage of the capabilities of libhugetlbfs.
Before beginning, I should review the sections in a generated executable. I'll demonstrate with a simple do-nothing C program. In the example below, I'll declare one large uninitialized array and one large initialized array.
Listing 1 shows a source file called
sections.c.
Listing 1. sections.c source file
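#define ELEMENTS (1024*1024)
static double bss_array[ELEMENTS];           /* uninitialized: placed in .bss */
static double data_array[ELEMENTS] = {5.0};  /* non-zero initializer: placed in .data */
int main(void) {
    int i;
    /* copy every element so that both arrays are referenced */
    for (i = 0; i < ELEMENTS; i++) {
        bss_array[i] = data_array[i];
    }
    return 0;
}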
The key differences between a bss array and a data array are easy to see in the executable that you generate in the next step. The differences between the bss and data arrays are as follows:
- bss array -- This array is uninitialized data that is saved in the .bss section in the executable. This section does not take any space in the executable.
- data array -- This array is initialized data (non-zero) that must be saved in the executable itself in the .data section. In the case above, with each double being 8 bytes, space in the executable is needed for 8*1024*1024 bytes (8MB).
To see these attributes in the executable, compile the program and then display
the section information with the readelf command. You
can see this illustrated in Listing 2.
Listing 2. Examples of sections of a C program
# cc sections.c -o sections
# ls -l sections
-rwxr-xr-x 1 root root 8400154 2006-12-03 18:52 sections
# readelf -S sections
There are 36 section headers, starting at offset 0x8012cc:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 10000154 000154 00000d 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 10000164 000164 000020 00 A 0 0 4
...
[12] .text PROGBITS 100002f0 0002f0 0005b0 00 AX 0 0 16
[13] .fini PROGBITS 100008a0 0008a0 000038 00 AX 0 0 4
...
[23] .plt PROGBITS 10010a28 000a28 000008 00 WA 0 0 4
[24] .data PROGBITS 10010a30 000a30 800008 00 WA 0 0 8
[25] .bss NOBITS 10810a38 800a38 800008 00 WA 0 0 8
[26] .comment PROGBITS 00000000 800a38 0000d9 00 0 0 1
...
In the output shown in Listing 2, observe the following:
- Size of the executable -- Notice that the size of the executable is 8400154, which is pretty big for a small program.
- .text -- The text is the executable instructions as generated by the compiler. For very large programs, libhugetlbfs can be directed to load this section into 16MB huge pages. In this example, the size of the executable text is only 5b0(hex) bytes, so you would not consider placing this into a 16MB huge page.
- .data -- The data section in this program has a size of 800008(hex) bytes. That corresponds to the declared 8MB data array. It is important to recognize that the initialized data is part of the executable itself, so this makes the executable much larger. You can tell this by looking at the offsets. The data section starts at offset 000a30, and the next section starts at offset 800a38.
- .bss -- The .bss section in this program also has a size of 800008(hex) bytes, but it takes up no space in the executable. The offset for the start of the .bss section and the following section do not change; both are 800a38(hex).
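The sizes and offsets in Listing 2 can be cross-checked with a little arithmetic. The following sketch is only an illustration; the helper names `double_array_bytes` and `file_bytes_between` are made up for this example, and it assumes an 8-byte double, as stated earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by an array of `elements` doubles. */
size_t double_array_bytes(size_t elements)
{
    return elements * sizeof(double);
}

/* Bytes a section occupies in the file, computed from two consecutive
 * "Off" values in the readelf output. */
size_t file_bytes_between(size_t offset, size_t next_offset)
{
    assert(next_offset >= offset);
    return next_offset - offset;
}
```

With ELEMENTS set to 1024*1024, `double_array_bytes` gives 800000(hex), which is the array's share of each 800008(hex) section. Applying `file_bytes_between` to the .data offsets (000a30 and 800a38) gives 800008(hex) bytes stored in the file, while the .bss offsets (800a38 and 800a38) give zero.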
Changing a data array to a bss array
To demonstrate what compilers often do with the initialization of arrays, change
the initialization of data_array from 5.0 to 0.0 and
observe that this now becomes a .bss section in the
executable, as shown here:
...
#define ELEMENTS 1024*1024
static double bss_array[ELEMENTS];
static double data_array[ELEMENTS] = {0.0}; <---- zero'ing makes this a .bss array
...
If you initialize the data_array to zero, the
compilers treat it just like the .bss section. You need
to recompile -- the result is shown in Listing 3 below. Let's
look at the .data section now.
Listing 3. Example of .data section now in .bss section
# cc sections.c -o sections
# ls -l sections
-rwxr-xr-x 1 root root 11546 2006-12-03 18:54 sections
# readelf -S sections
There are 36 section headers, starting at offset 0x12cc:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 10000154 000154 00000d 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 10000164 000164 000020 00 A 0 0 4
...
[12] .text PROGBITS 100002f0 0002f0 0005b0 00 AX 0 0 16
[13] .fini PROGBITS 100008a0 0008a0 000038 00 AX 0 0 4
...
[23] .plt PROGBITS 10010a28 000a28 000008 00 WA 0 0 4
[24] .data PROGBITS 10010a30 000a30 000008 00 WA 0 0 4 \
<----- No longer a data array
[25] .bss NOBITS 10010a38 000a38 1000008 00 WA 0 0 8 \
<----- Now twice the size
[26] .comment PROGBITS 00000000 000a38 0000d9 00 0 0 1
...
Notice the size column and the offsets associated with the size. Here are details describing what has happened:
- Size of the executable -- The size of the executable is now only 11546 bytes.
- .data -- In the .data section, the size dropped from 800008(hex) in the previous example down to just 000008(hex). The data_array is now treated as a .bss section.
- .bss -- In the .bss section, the size is now double that of the previous example, that is 1000008(hex), which is the now combined size of the two arrays (800000(hex) + 800000(hex) for the bss_array and data_array), but it doesn't take any "space" in the executable. The offsets in the following section didn't change.
Most applications use either uninitialized data declarations or malloc calls for large amounts of memory. libhugetlbfs provides two easy ways to load both the .bss section and the memory allocated by malloc into system huge pages. libhugetlbfs also provides a way to load the executable .text section and the .data section into huge pages, but this isn't as common as backing the .bss section and the memory allocated by malloc.
This section covers the procedures for downloading, installing, and setting up libhugetlbfs.
To set up a system with libhugetlbfs and define the system huge pages, you need root access to the system. I'll show ways that a non-root user can be defined to use the huge pages once the system has been set up.
To install libhugetlbfs, there are several ways:
- If you are registered with SUSE support, you have access to the latest
libhugetlbfs rpms from the Novell SUSE maintweb support. (Note: This Web
site is only available to registered users.) At the time of this writing, the
latest recommended (and supported) rpms available for SLES 10 are as follows:
- libhugetlbfs-1.0.1-1.4.ppc.rpm
- libhugetlbfs-64bit-1.0.1-1.4.ppc.rpm
The default libhugetlbfs libraries that were provided when SLES 10 was initially available are old, and I don't recommend bothering with them. These are:
- libhugetlbfs-1.0-16.2.ppc.rpm
- libhugetlbfs-64bit-1.0-16.2.ppc.rpm
With the recommended libraries installed, proceed to the Allocating huge pages section of this article.
- On RHEL 5, there are five rpms to install:
- libhugetlbfs-1.0.1-1.el5.ppc.rpm
- libhugetlbfs-devel-1.0.1-1.el5.ppc.rpm
- libhugetlbfs-devel-1.0.1-1.el5.ppc64.rpm
- libhugetlbfs-lib-1.0.1-1.el5.ppc.rpm
- libhugetlbfs-lib-1.0.1-1.el5.ppc64.rpm
- Alternatively, you can get a copy from SourceForge (see Resources) and build and install that version. I'll explain this approach next.
From SourceForge, click the Download libhugetlbfs button, shown in Figure 1, and follow the instructions to download the tar ball. For this article, Version 1.0.1 is the level tested and supported by SUSE and Red Hat.
Figure 1. SourceForge libhugetlbfs page

Download the tar ball to your system. The libhugetlbfs
make install process copies the required files to the
appropriate locations on your system. I recommend installing libhugetlbfs in the
system /usr subdirectory by specifying
make install PREFIX=/usr, as shown in
Listing 4. You'll need root access to install.
Listing 4. Installing libhugetlbfs
# tar -zxf libhugetlbfs-1.0.1.tar.gz
# cd libhugetlbfs-1.0.1
# make
... <---- not showing "make" messages here ...
# make install PREFIX=/usr
VERSION
INSTALL32 /usr/lib
INSTALL64 /usr/lib64
OBJSCRIPT ld.hugetlbfs
INSTALL
If you look in the /usr/share/libhugetlbfs/ directory,
you should find ld, which is the new linker command.
See Listing 5. The ld command is
soft linked to ld.hugetlbfs as a convenience for
invoking this linker by the GNU Compiler Collection (GCC) and IBM compilers.
Listing 5. /usr/share/libhugetlbfs/ showing the ld command
# ls -l /usr/share/libhugetlbfs
total 8
lrwxrwxrwx 1 root root 12 2006-11-26 12:42 ld -> ld.hugetlbfs
-rwxr-xr-x 1 root root 1321 2006-11-26 12:42 ld.hugetlbfs
drwxr-xr-x 2 root root 4096 2006-11-26 12:42 ldscripts
The ldscripts subdirectory shown in
Listing 6 contains all of the modified linker scripts needed
to handle the "magic" of relinking the .bss,
.data, and .text sections.
Listing 6. ldscripts subdirectory of /usr/share/libhugetlbfs/
/usr/share/libhugetlbfs/ldscripts # ls
elf32ppclinux.xB elf64ppc.xB elf_i386.xB elf_x86_64.xB
elf32ppclinux.xBDT elf64ppc.xBDT elf_i386.xBDT elf_x86_64.xBDT
The libhugetlbfs package comes with a number of automated test packages that can
be invoked using make. These tests are primarily used
by the developers of the package and kernel maintainers as changes are made to the
system. If you're running on a supported operating system level and kernel, there
should be no need to run the tests, especially since the output can be cryptic for
general users of the package.
The other important file provided by libhugetlbfs is the
libhugetlbfs.so library, as shown in
Listing 7. This is the library invoked at run time that
controls the system usage of the huge pages. Libraries are provided for both
32-bit applications and 64-bit applications.
Listing 7. libhugetlbfs.so library
# ls -l /usr/lib/libhugetlbfs.so
-rwxr-xr-x 1 root root 54785 2006-11-26 12:42 /usr/lib/libhugetlbfs.so
# ls -l /usr/lib64/libhugetlbfs.so
-rwxr-xr-x 1 root root 63910 2006-11-26 12:42 /usr/lib64/libhugetlbfs.so
To set up your system to use huge pages, you simply allocate huge pages first.
With Linux, you copy (echo) the value of the number of huge pages you'd like to
allocate into a /proc/sys/vm control, as shown in
Listing 8. The operating system attempts to allocate the
requested huge pages. Allocating means to tuck the pages away and reserve
them out of the physical memory pool. The pages are not yet in use. The system
does not give any indication that it has failed to allocate all of the requested
huge pages. So after you request them, you always need to check how many were
actually reserved.
Listing 8. Allocating huge pages with /proc/sys/vm
# cat /proc/meminfo | grep Huge
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 16384 kB
# echo 200 > /proc/sys/vm/nr_hugepages
# cat /proc/meminfo | grep Huge
HugePages_Total: 200
HugePages_Free: 200
HugePages_Rsvd: 0
Hugepagesize: 16384 kB
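Because the system gives no indication when it reserves fewer pages than requested, the check of /proc/meminfo is worth automating. The sketch below parses text in the format shown in Listing 8; the function names `meminfo_value` and `huge_pages_granted` are hypothetical, and reading /proc/meminfo into a buffer is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the number that follows "key:" in /proc/meminfo-style text,
 * or -1 if no line starts with that key. */
long meminfo_value(const char *text, const char *key)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL);
    klen = strlen(key);

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;              /* step past the newline to the next line */
    }
    return -1;
}

/* 1 if at least `requested` huge pages were actually reserved, else 0. */
int huge_pages_granted(const char *meminfo_text, long requested)
{
    return meminfo_value(meminfo_text, "HugePages_Total") >= requested;
}
```

A script that echoes a value into nr_hugepages could run a check like this immediately afterward and stop with an error when HugePages_Total comes back lower than the request.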
libhugetlbfs uses a virtual filesystem interface, the
hugetlbfs. To set this up, create a mount point and
mount the virtual filesystem. The mount point can have any unique name;
this example uses /libhugetlbfs.
The hugetlbfs you defined controls who has access to
the system huge pages, so it is important to set up a group with the users that
you wish to have access to the system huge pages. Note that you create a special
group (libhuge) on the system to control access to the
hugetlbfs. See Listing 9. You
should add any authorized users to this group, and only these users will be
allowed access to the huge pages used by libhugetlbfs.
Listing 9. Defining hugetlbfs filesystem
# mkdir /libhugetlbfs
# groupadd libhuge
# chgrp libhuge /libhugetlbfs
# chmod 770 /libhugetlbfs
# usermod wmb -G libhuge <---- assumes "wmb" is a user on the system
# mount -t hugetlbfs hugetlbfs /libhugetlbfs
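A program can test ahead of time whether the current user is allowed to use the mount point that Listing 9 sets up. This is a small sketch using the POSIX access() call; the function name `can_use_huge_mount` is an assumption for illustration, and the path is whatever mount point you chose.

```c
#include <assert.h>
#include <unistd.h>

/* 1 if the calling user may create files in the hugetlbfs mount point
 * (write and search permission on the directory), else 0. */
int can_use_huge_mount(const char *mount_dir)
{
    assert(mount_dir != NULL);
    return access(mount_dir, W_OK | X_OK) == 0;
}
```

Note that access() checks the real user and group IDs of the process. With the 770 permissions shown above, the call succeeds for root and for members of the libhuge group, and fails for everyone else.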
The Linux operating system uses this special filesystem for access and control of
the physical memory associated with the 16MB huge pages. The name of the file
system you create isn't important and, for normal usage, you should define and use
a single filesystem. Keep in mind the "mount" of the
hugetlbfs is temporary, and it will not be
automatically remounted at boot time unless you change
/etc/fstab. To make the change permanent, edit the file
and add the applicable line, like this:
# vi /etc/fstab
Then add that line to the file, as shown in Listing 10. Note that gid=1000 assumes that the libhuge group ID is 1000.
Listing 10. Adding libhugetlbfs mount to /etc/fstab
/dev/sda3 / ext3 acl,user_xattr 1 1
/dev/sda2 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0
/dev/fd0 /media/floppy auto noauto,user,sync 0 0
hugetlbfs /libhugetlbfs hugetlbfs mode=0770,gid=1000 0 0
So at this point, you've completed the basic setup of the libhugetlbfs library. You have done the following:
- Reviewed the considerations for using libhugetlbfs
- Downloaded and installed libhugetlbfs (either from SourceForge or the most current rpm files from your SLES 10 or RHEL 5 distribution)
- Set up libhugetlbfs by creating, mounting, and limiting access to a virtual hugetlbfs
- Defined a huge page pool to be used by users
Now you're ready for a user to use the transparent huge pages.
Backing malloc commands with huge pages
Let's start by looking at malloc, which is the easy
case and requires no special linking. I assume you're running as a user (in this
example, wmb), who is included in the previously
defined libhuge group. The mounted libhugetlbfs
filesystem has libhuge as the group owner, and
wmb belongs to that group. This is shown in
Listing 11.
Listing 11. The libhuge group
wmb@p5sys:~> ls -l -d /libhugetlbfs/
drwxrwx--- 2 root libhuge 0 2006-12-18 21:26 /libhugetlbfs/
wmb@p5sys:~> id
uid=1000(wmb) gid=100(users) groups=16(dialout),33(video),100(users),1000(libhuge)
Then use a simple program that copies values from one very big array to another
array. You'll create a program called copy_arrays.c,
which is shown in Listing 12.
Listing 12. copy_arrays.c
#include <stdlib.h>
#define ELEMENTS (1024*1024*128)
static double bss_array_from[ELEMENTS];
static double bss_array_to[ELEMENTS];
static double *malloc_array_from;
static double *malloc_array_to;
int main(void) {
    int i;
    malloc_array_from = (double *)malloc(ELEMENTS*sizeof(double));
    malloc_array_to = (double *)malloc(ELEMENTS*sizeof(double));
    if (malloc_array_from == NULL || malloc_array_to == NULL) {
        return 1; /* not enough memory for the malloc arrays */
    }
    /* initialize and touch all of the pages */
    for (i = 0; i < ELEMENTS; i++) {
        bss_array_to[i] = 1.0;
        bss_array_from[i] = 2.0;
        malloc_array_to[i] = 3.0;
        malloc_array_from[i] = 4.0;
    }
    /* copy "from" "to" */
    for (i = 0; i < ELEMENTS; i++) {
        bss_array_to[i] = bss_array_from[i];
        malloc_array_to[i] = malloc_array_from[i];
    }
    return 0;
}
As root user, allocate 200 huge pages for the system to use. Then open a second
window and issue the watch command to keep track of the
system usage of huge pages, as follows:
# echo 200 > /proc/sys/vm/nr_hugepages
# watch cat /proc/meminfo
As a non-root user (in this case, wmb), you need to compile and run the program, as shown in Listing 13. The number of huge pages being "watched" should not change. Your "times" might be different, depending on the system you are running on.
Listing 13. Running copy_arrays.c
wmb@p5sys:~> cc -O3 -m64 copy_arrays.c -o copy_arrays
wmb@p5sys:~> time ./copy_arrays
real 0m6.844s
user 0m2.149s
sys 0m2.649s
To back the malloc calls with huge pages, set the
HUGETLB_MORECORE environment variable and set
LD_PRELOAD to the libhugetlbfs library, as shown in
Listing 14. The program should run slightly
faster, and the window where you're watching memory usage should show huge pages
being allocated and used.
Listing 14. Setting HUGETLB_MORECORE with LD_PRELOAD set to libhugetlbfs
wmb@p5sys:~> time HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so ./copy_arrays
real 0m6.480s
user 0m2.483s
sys 0m1.645s
Depending on your system and memory configuration, your result might vary but, in this case, the run time improves by about five percent, that is, (6.844 - 6.480) / 6.844. While this might seem to be a trivial gain, it can add up when very large compute-intensive workloads are run over the course of days.
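The percentage comes from comparing the two "real" times in Listing 13 and Listing 14. As a small sketch of that arithmetic (the function name `percent_time_saved` is made up for this example):

```c
#include <assert.h>

/* Percentage of run time saved going from `before` to `after` seconds.
 * A positive result means the second run was faster. */
double percent_time_saved(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

For the runs above, `percent_time_saved(6.844, 6.480)` is roughly 5.3. The same helper makes it easy to spot regressions, because a slower second run produces a negative value.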
The LD_PRELOAD setting causes any
malloc commands or their derivatives to be
automatically backed by the huge pages allocated earlier.
Malloc commands are handled at run time, so you'll need
to predefine enough 16MB pages to be used by your program.
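If you prefer a small launcher program over typing the variables on each command line, the same two settings can be applied with setenv() before handing control to the real program. This is only a sketch of one possible wrapper; the function names are hypothetical, and it assumes libhugetlbfs.so can be found by the dynamic loader, as in Listing 14.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Set the two variables used in Listing 14 in the current environment.
 * Returns 0 on success, -1 if either setenv() call fails. */
int enable_huge_malloc_env(void)
{
    if (setenv("HUGETLB_MORECORE", "yes", 1) != 0)
        return -1;
    if (setenv("LD_PRELOAD", "libhugetlbfs.so", 1) != 0)
        return -1;
    return 0;
}

/* Replace the current process with argv[0], started with the huge page
 * malloc settings. Returns -1 only if the setup or the exec step fails. */
int exec_with_huge_malloc(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    if (enable_huge_malloc_env() != 0)
        return -1;
    execvp(argv[0], argv);       /* does not return on success */
    return -1;
}
```

The variables only affect programs started after they are set, which is why the launcher uses exec rather than trying to change the behavior of its own malloc calls.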
Backing sections with huge pages
When re-linking your executable to take advantage of backing the
.bss, .data, or
.text sections, it's easy to use either the GCC or the
IBM compilers. Both compilers provide easy-to-use parameters to override the
invocation of the ld command. Here's how to use them:
- GCC compilers: Add the following parameters to the C compile link command line:
-B /usr/share/libhugetlbfs/ -Wl,--hugetlbfs-link=B
The second parameter is 'dash W ell comma' -- hard to read on most browsers.
- IBM compilers: Do the same thing, but provide additional clarification of what's being overridden, like this:
-B /usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
This adds 'dash tee ell', which directs this directive to the linker. Use this for the C, C++, and Fortran compilers.
So using the copy_arrays.c program
(Listing 12) and the GCC compilers, tell the linker to put
the .bss section into huge pages. Note that in this
case, you need to create a new target executable called
copy_arrays-lp, as shown in
Listing 15, since this executable is linked for huge pages.
You should see a performance gain similar to that of backing the
malloc pages.
Listing 15. copy_arrays-lp executable
wmb@p5sys:~> cc -O3 -m64 -B /usr/share/libhugetlbfs/ -Wl,--hugetlbfs-link=B \
copy_arrays.c -o copy_arrays-lp
wmb@p5sys:~> time ./copy_arrays-lp
real 0m6.495s
user 0m2.558s
sys 0m1.648s
What if not enough 16MB pages are available?
You can combine the two, backing both the .bss section
and malloc with huge pages. You'll need more huge pages
than you previously defined.
Note that when you don't have enough huge pages defined, the application is
terminated ("Killed"). As you see in Listing 16, the
invocation of copy_arrays-lp with
malloc also backed by huge pages fails because not enough
16MB huge pages are available. This is fixed by allocating more 16MB huge pages and
then rerunning.
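A quick estimate shows why the earlier pool of 200 pages is too small for this case. The sketch below rounds a byte count up to whole 16MB pages; the helper name `huge_pages_for` is an assumption, and the result is a lower bound, since the library and the linked sections can need some additional space.

```c
#include <assert.h>

#define HUGE_PAGE_BYTES (16UL * 1024 * 1024) /* 16MB huge pages, as on POWER */

/* Huge pages needed to hold `bytes`, rounding any partial page up. */
unsigned long huge_pages_for(unsigned long long bytes)
{
    assert(bytes > 0);
    return (unsigned long)((bytes + HUGE_PAGE_BYTES - 1) / HUGE_PAGE_BYTES);
}
```

Each array in copy_arrays.c holds 1024*1024*128 doubles, or 1GB, which is 64 huge pages. The two .bss arrays therefore need at least 128 pages, and the two malloc arrays need another 128. Either pair fits in a pool of 200, but backing both needs at least 256 pages.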
When using an executable linked for the .bss section,
there is no need to specify the
LD_PRELOAD=libhugetlbfs.so override. You simply specify
that you want to use malloc as well by defining
HUGETLB_MORECORE=yes.
Listing 16. Specifying use of malloc with HUGETLB_MORECORE
wmb@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
Killed <---- Note the "Killed" message
real 0m4.084s
user 0m1.669s
sys 0m0.365s
wmb@p5sys:~> su
#> echo 300 > /proc/sys/vm/nr_hugepages
#> exit
wmb@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
real 0m6.943s
user 0m3.600s
sys 0m0.473s
In this case, using both malloc and
.bss sections did not result in tangible gains. For
your applications, trying various combinations helps you understand the most
practical approach to use. Not all applications benefit from using transparent
huge pages.
If an unauthorized user tries to use the system huge pages, the program runs
normally, but the libhugetlbfs library returns an error -- huge pages will not be
used. For example, if the user joe tries to run the
same two programs from above, the result shown in Listing 17
occurs.
Listing 17. Result of unauthorized user trying to use huge pages
joe@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
libhugetlbfs: ERROR: mkstemp() failed: Permission denied
real 0m6.795s
user 0m2.267s
sys 0m2.656s
joe@p5sys:~> time HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so ./copy_arrays
libhugetlbfs: ERROR: mkstemp() failed: Permission denied
libhugetlbfs: ERROR: Couldn't open hugetlbfs file for morecore
real 0m6.807s
user 0m2.133s
sys 0m2.660s
Examples of libhugetlbfs in action
The libhugetlbfs library has been put to use in customer production environments, where it has helped to tune workloads across the POWER line and has been used in SPECcpu2000 publishes with SLES 10. More recently, it's been used for a SPECompM2001 publish with the new RHEL 5 version. In addition, a working practical example of how to tune a memory application has been provided on the IBM Linux on POWER wiki (see Resources).
Publishing SPECcpu2000 with SLES 10
To provide working examples of the technology in action, IBM recently published a
number of SPECcpu2000 benchmarks on the SPEC.org Web site (see
Resources) on POWER5+ systems using SLES 10. The
published examples demonstrate how easy it is to invoke and use libhugetlbfs. For
example, on the SPECint2000 runs, the executables are built normally. Then, when
the runs are measured for performance, the malloc
backing of the executables is simply turned on by setting the two applicable
environment variables, as shown in Listing 18.
Listing 18. Running SPECint2000 with malloc commands in 16MB huge pages
# export HUGETLB_MORECORE=yes
# export LD_PRELOAD=libhugetlbfs.so
# runspec --config LoP_config_file.cfg int
For the SPECcpu2000 Fortran publishes, you need to leverage the common approach
of re-linking the applicable programs to have the executable set up to back the
.bss, .data, and .text sections with 16MB pages. In this case, modify
the link step of the components. In the example (Listing 19), I show the
changes for the 179.art component, as published on the SPEC.org Web
site (see Resources).
Listing 19. Example of a SPECfp2000 component compile/link flags
179.art=peak=default=default:
notes179_1 = 179.art
notes179_2 = +FDO -O5
notes179_3 = -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS1_CFLAGS = -qpdf1 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS1_LDCFLAGS = -qpdf1 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS2_CFLAGS = -qpdf2 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS2_LDCFLAGS = -qpdf2 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
Then invoke the run, as follows:
# export HUGETLB_MORECORE=yes
# runspec --config LoP_config_file.cfg fp
Publishing SPECompM2001 with RHEL 5
Demonstrating that the approach is available and the same using RHEL 5, a recent SPECompM2001 publish was completed on an IBM System p5 560Q system. The result is available on the SPEC.org Web site (see Resources).
Selected components that showed performance gains with
malloc commands being backed by 16MB huge pages were
run with the environment variables set appropriately. See
Listing 20 for an example of how the invocation of one of
the components was modified. It looks easy, right?
Listing 20. Example of SPECompM2001 component using malloc in 16MB pages
316.applu_m: -qpdf1/pdf2
-O4 -q64
ENV_HUGETLB_MORECORE=yes
ENV_LD_PRELOAD=libhugetlbfs.so
Another exercise in tuning a memory application (stream) with libhugetlbfs is available on the Web in the IBM Linux on POWER wiki (see Resources).
The example in the wiki shows the practical and common steps a user can take to maximize performance of memory-intensive workloads on a system running Linux.
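One sizing step that usually comes before such a run is estimating how many huge pages a workload's heap would need. The following is a minimal sketch of that arithmetic, rounding up to whole pages. Both helper names are hypothetical, and the 16384 kB default assumes the 16MB pages discussed in this article; Hugepagesize is the field the kernel reports in /proc/meminfo.

```shell
#!/bin/sh
# Sketch only: both helper names below are hypothetical.

# Print the huge page size in kB from a meminfo-style file
# (defaults to /proc/meminfo).
huge_page_size_kb() {
    awk '/^Hugepagesize:/ { print $2 }' "${1:-/proc/meminfo}"
}

# huge_pages_needed MEGABYTES [PAGE_KB]
# Print how many huge pages cover MEGABYTES of memory, rounded up.
# PAGE_KB defaults to 16384 (16MB pages, an assumption for POWER systems).
huge_pages_needed() {
    mb=$1
    page_kb=${2:-16384}
    echo $(( (mb * 1024 + page_kb - 1) / page_kb ))
}
```

For example, `huge_pages_needed 1024` prints 64, because a 1GB heap fits in exactly 64 pages of 16MB each. Passing `"$(huge_page_size_kb)"` as the second argument uses the page size of the running system instead of the default.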
In this article, you were introduced to an open source community project called
libhugetlbfs, which is now supported on both SLES 10 and RHEL 5. The library
gives customer applications easier, transparent access to system huge pages.
You learned how to use the libhugetlbfs library with GCC or the IBM XL C/C++
compilers for Linux. The libhugetlbfs library allows you to leverage huge pages
for malloc calls in a program, for the .bss section, or for the combination of
the .bss, .data, and .text sections of an executable, improving the performance
of the application in some cases. You can easily configure libhugetlbfs on any
supported system, and access controls are easily included to limit access to
the memory resources.
Particular thanks to Adam Litke (in Rochester, MN), Nishanth Aravamudan (Beaverton, OR), Steve Fox (Rochester, MN), and David Gibson (Australia) for their tireless work and focus on delivering libhugetlbfs to the community as a supported library in 2006. Their work continues into 2007, as they make the library more flexible and robust across system platforms.
Thanks to Chakarat Skawratananond (Austin, TX), a regular contributor to developerWorks, for his help in working with libhugetlbfs on performance improvements for customers.
| Description | Name | Size | Download method |
|---|---|---|---|
| SourceForge libhugetlbfs library | libhugetlbfs-1.0.1.tar.gz | 64KB | HTTP |
| Trial version of XL C/C++ Advanced Edition V8.01 | vac.rpm | 95MB | HTTP |
Note
- A functional version of the IBM C/C++ compilers for the POWER systems.
Learn
- Tuning stream with libhugetlbfs:
See a working example on the IBM Linux on POWER Wiki site and the common steps an
end user can take to maximize performance on Linux on POWER systems.
- SPEC.org SPECcpu2000 results:
In this section, there are a number of examples of publishes done with
libhugetlbfs. Click on one of the results sections and search for "SLES."
- SUSE for POWER:
Visit the Novell site and get the latest information for SUSE for POWER.
- Red Hat Enterprise Linux documentation:
Visit the Red Hat site for more information on Red Hat Enterprise Linux.
- IBM Systems: Want more?
The developerWorks IBM Systems zone hosts hundreds of informative articles and
introductory, intermediate, and advanced tutorials.
- New to IBM Systems?:
Get a better understanding of what skills are necessary to undertake systems
development with a wide range of IBM hardware products and related technologies.
- Safari bookstore:
Visit this e-reference library to find specific technical resources.
Get products and technologies
- IBM trial software:
Build your next development project with software for download directly from
developerWorks.
- IBM VisualAge C/C++ compiler:
Visit the VisualAge C++ page for updated information on the IBM compiler.
Discuss
- Participate in the discussion forum.
- IBM Systems forums:
Exchange information with other developers on the IBM Systems forums.
- Participate in the developerWorks blogs and get involved in the developerWorks community.
- Check out SourceForge and get involved in the libhugetlbfs community.

Bill Buros is a performance analyst in the IBM Linux Technology Center. Based in Austin, Texas, Bill has been focused on performance for compute-intensive applications running on the Linux operating system. He works with customers and software product vendors to improve performance of their products, the Linux operating system, the kernel, and the overall software stack. One of his recent goals has been to get transparent huge pages implemented in the operating system for customers of Linux. You can contact him at wburos@us.ibm.com.




