Level: Intermediate
Bill Buros (wburos@us.ibm.com), LTC Linux Performance, IBM
20 Apr 2007
Learn more about the libhugetlbfs libraries and how to use them with the GNU Compiler Collection (GCC) or the IBM XL C/C++ and XL Fortran compilers for Linux®. libhugetlbfs is an open source community project that gives customer applications transparent access to system huge pages. SUSE Linux Enterprise Server 10 (SLES 10) and Red Hat Enterprise Linux 5 (RHEL 5) now support libhugetlbfs. While libhugetlbfs is available for a number of hardware platforms that support Linux huge pages, this article focuses on the 16MB huge pages available on IBM POWER processor-based systems.
Introduction and background
Transparently leveraging huge pages on Linux® (pages whose page table entries cover larger ranges of contiguous physical memory, up to multiple megabytes) has become much easier with the recent introduction of Version 1 of the libhugetlbfs library on SourceForge (see Resources). The libhugetlbfs library has been updated for SUSE Linux Enterprise Server 10 (SLES 10) and is available for Red Hat Enterprise Linux 5 (RHEL 5). Customers use it in benchmarking and to improve the performance of selected applications on POWER, Intel®, and AMD systems running Linux. With a focus on IBM POWER processor-based systems with 16MB page sizes, this article provides an introduction to libhugetlbfs, including the following:
- Considerations of when and when not to use libhugetlbfs
- Instructions on how to install and set up libhugetlbfs
- Information on providing system access control to the huge pages
- A simple code example for backing malloc and application .bss sections with huge pages
- Examples of industry-standard benchmark results published using libhugetlbfs
Supported systems
For Linux on POWER systems, the libhugetlbfs library is supported today on SLES
10 and RHEL 5 systems where 16MB pages are available. This includes POWER4,
POWER5, POWER5+ systems, and BladeCenter® JS20 and BladeCenter JS21.
While this article focuses on POWER systems, libhugetlbfs is also available on Linux for Intel and AMD-based systems that support huge pages. On those systems the hardware page sizes are different, but the approach is the same, and testing has shown good performance improvements there as well.
No source code changes to your application
In this article, transparently leveraging huge pages means that applications gain the performance benefits of the larger hardware page sizes with no source code changes. Linux already supports system huge pages, but applications have to be specifically coded to use them. Examples of such code exist in system software products such as the latest Java™ engines and the major database products.
If you use libhugetlbfs with 32-bit or 64-bit applications on the platforms listed above, your application requires no source code changes to take advantage of 16MB huge pages. You have several options, which I describe below:
- Backing .bss, .data, and .text sections with huge pages: You can specify that all three sections (the .bss section, the .data section, and the .text section) be loaded into huge pages. The .bss section holds a program's uninitialized global data structures (for example, a large Fortran array). It can be very large: many compute-intensive workloads declare Fortran arrays that use many gigabytes of memory. The .data section holds the program's initialized global variables and data structures (rarely very large), and the .text section holds the executable text (the binary code) itself. To have these sections backed by system huge pages, all you need to do is relink your executable.
Looking at your executable
To see the sections defined in your POWER executable, run readelf --sections -W yourApp. Look for the .bss, .data, and .text sections in the output. For more information, see the Sections in an executable section later in this article.
- Backing just the .bss section: A common use of libhugetlbfs for Fortran programs is to back only the .bss section with system huge pages. The .data and .text sections continue to use the normal base page size. As with the first approach, this requires only a relink of the executable.
- Backing malloc with huge pages: You can specify that run-time malloc calls use memory backed by 16MB pages. This also covers the malloc derivatives (calloc, valloc, realloc, and so forth), because they all obtain their memory through glibc's malloc, whose underlying source of memory ("morecore") is what libhugetlbfs replaces. The libhugetlbfs library automatically manages all of the malloc requests in the 16MB pages. This approach works with any dynamically linked executable; no special linking is needed.
Key considerations
Before you use libhugetlbfs, keep the following key considerations in mind:
- To develop an application that is supported in a customer environment, you need a shipping, supported Linux distribution. For SUSE, this is SLES 10. For Red Hat, this is Red Hat Enterprise Linux 5 (RHEL 5), which includes the libhugetlbfs libraries. SLES 9 and RHEL 4 are not covered in this article, since they lack the kernel support that libhugetlbfs requires. Experienced Linux users can always build a recent mainline kernel that provides the required kernel functionality, but that is not a customer-supported configuration.
- Huge pages for Linux on POWER are 16MB pages that are defined before use (either at boot or by a root user) and are pinned in main memory by the operating system when they are used. Each 16MB huge page is a physically contiguous 16MB block of memory.
- On POWER systems, the Linux community refers to these pages as "huge pages,"
while the traditional AIX® and IBM POWER hardware teams refer to these
pages as "large pages." Since terminology in the community and industry is
ambiguous, I recommend clarifying the usage with the actual size of the page
being used. In the case of this article, I refer to 16MB huge pages and I use
the phrases "large pages" and "huge pages" to mean the same thing -- that is,
16MB huge pages.
- You can use 16MB huge pages in virtualized logical partitions (LPARs) on Linux
on POWER or in full system LPARs that have all of the resources assigned to the
partition.
- Set up a dedicated hugetlbfs filesystem that allows only a specified group of users to access the pool of available huge pages. I provide examples of this later in this article. This matters both for security and for controlling access to a very limited and valuable resource (physical memory) on the system.
- A root user can dynamically request and release 16MB huge pages. Whether a request succeeds depends on how much contiguous physical memory is available at the time. 16MB huge pages can also be reserved at boot time.
- When requesting a large number of huge pages, you need to consider memory fragmentation. The longer a system runs, the more likely it is that main memory becomes fragmented with bits and pieces of pinned, permanent kernel memory, which makes each additional contiguous 16MB huge page harder to claim. If you need a large number of 16MB huge pages, reserve them at system boot time.
Some restrictions to using libhugetlbfs
System huge pages are a limited and valuable system resource. There are several
restrictions to consider when using libhugetlbfs, as follows:
- On Linux, an application that uses libhugetlbfs must find all of the huge pages it needs available and free at run time; if a huge page request fails, the application is terminated. This is an important consideration in these circumstances:
- The application uses a lot of memory that might require more 16MB huge
pages than are available.
- The system doesn't have enough physical memory to support loading the
executable and all of the specified pages.
- The ability of the operating system to "fall back" to normal pages when it runs out of huge pages has proven difficult to implement in a clean, generic way in Linux. Community efforts are continuing, with various prototypes and approaches under consideration. As system hardware supports more page sizes, the hope is that more flexible controls will emerge in the operating system.
- Not all applications or software solutions benefit from huge pages. The primary performance improvement comes from fewer Translation Lookaside Buffer (TLB) misses. The usual recommendation is simply to try your application with transparent huge pages and see whether it yields any tangible performance gain. Typical gains with libhugetlbfs range from none to around 10 to 15 percent, and some applications actually get slower. Many applications see little to no gain, so this approach is typically used only on specific applications for specific gains.
- In general, using libhugetlbfs is a specialized niche approach for improving
performance on a limited set of applications. Classically, the approach is used
for Fortran programs that use very large arrays of data and do a significant
number of memory accesses, but the approach applies to any C, C++, or Fortran
program.
- Applications with the
.text section (the executable
text) loaded into huge pages cannot be effectively profiled (for example, with
oprofile) or debugged (for example, with
gdb) since the executable itself is copied without
symbols to anonymous huge pages and then run. Profiling would be helpful in some
cases and work continues in the Linux community to extend this functionality.
- libhugetlbfs leverages dynamic linking facilities built into Linux. Therefore,
statically linked executables cannot take advantage of the capabilities of
libhugetlbfs.
Sections in an executable
Before going further, let's review the sections in a generated executable. I'll demonstrate with a simple do-nothing C program that declares one large uninitialized array and one large initialized array.
Listing 1 shows a source file called
sections.c.
Listing 1. sections.c source file
#define ELEMENTS (1024*1024)

static double bss_array[ELEMENTS];            /* no initializer: placed in .bss */
static double data_array[ELEMENTS] = {5.0};   /* non-zero initializer: stored in .data */

int main(void) {
    int i;
    for (i = 1; i < ELEMENTS; i++) {
        bss_array[i] = data_array[i];
    }
    return 0;
}
The key differences between a bss array and a data array are easy to see in the executable that you generate in the next step. The differences are as follows:
- bss array -- This array is uninitialized data that is placed in the .bss section. The section header records only its size, so it takes no space in the executable file.
- data array -- This array is initialized (non-zero) data that must be stored in the executable itself, in the .data section.
In this example, with each double taking 8 bytes, the data array needs 8*1024*1024 bytes (8MB) of space in the executable.
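The arithmetic above, and the hexadecimal sizes readelf reports next, can be checked at compile time. This sketch repeats Listing 1's declarations (with ELEMENTS parenthesized) and uses C11 _Static_assert; it assumes 8-byte doubles, as on the Linux platforms discussed here, and total_array_bytes is a hypothetical helper name. The extra 8 bytes that readelf shows for each section are presumably small objects added by the compiler and C runtime rather than part of the arrays, so the sketch checks only the array sizes.

```c
#include <stddef.h>

#define ELEMENTS (1024 * 1024)

/* Same declarations as Listing 1. */
static double bss_array[ELEMENTS];            /* all zero: placed in .bss  */
static double data_array[ELEMENTS] = {5.0};   /* non-zero: stored in .data */

/* Assumes 8-byte doubles: 8 * 1024 * 1024 bytes = 8MB = 0x800000. */
_Static_assert(sizeof(double) == 8, "expected 8-byte doubles");
_Static_assert(sizeof bss_array == 0x800000, ".bss array should be 8MB");
_Static_assert(sizeof data_array == 8 * 1024 * 1024, ".data array should be 8MB");

/* Combined size of both arrays: 0x1000000 (16MB), the array total once
 * both arrays end up in .bss. */
size_t total_array_bytes(void)
{
    return sizeof bss_array + sizeof data_array;
}
```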
To see these attributes in the executable, compile the program and then display
the section information with the readelf command. You
can see this illustrated in Listing 2.
Listing 2. Examples of sections of a C program
# cc sections.c -o sections
# ls -l sections
-rwxr-xr-x 1 root root 8400154 2006-12-03 18:52 sections
# readelf -S sections
There are 36 section headers, starting at offset 0x8012cc:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 10000154 000154 00000d 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 10000164 000164 000020 00 A 0 0 4
...
[12] .text PROGBITS 100002f0 0002f0 0005b0 00 AX 0 0 16
[13] .fini PROGBITS 100008a0 0008a0 000038 00 AX 0 0 4
...
[23] .plt PROGBITS 10010a28 000a28 000008 00 WA 0 0 4
[24] .data PROGBITS 10010a30 000a30 800008 00 WA 0 0 8
[25] .bss NOBITS 10810a38 800a38 800008 00 WA 0 0 8
[26] .comment PROGBITS 00000000 800a38 0000d9 00 0 0 1
...
In the output shown in Listing 2, observe the following:
- Size of the executable -- The executable is 8400154 bytes (about 8MB), which is large for such a small program.
- .text -- The text is the executable instructions generated by the compiler. For very large programs, libhugetlbfs can be directed to load this section into 16MB huge pages. In this example, the executable text is only 5b0(hex) bytes, so there would be no point in placing it in a 16MB huge page.
- .data -- The data section in this program has a size of 800008(hex) bytes, which corresponds to the declared 8MB data array. Note that initialized data is part of the executable file itself, which makes the executable much larger. You can see this in the offsets: the data section starts at offset 000a30, and the next section starts at offset 800a38.
- .bss -- The .bss section in this program also has a size of 800008(hex) bytes, but it takes up no space in the executable. The .bss section and the section that follows it both start at the same file offset, 800a38(hex).
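If you want the same numbers from inside a program rather than from readelf, you can read the section header table directly with the structures in <elf.h>. The following is a minimal sketch, not a replacement for readelf: elf64_section_size is a hypothetical name, and it handles only 64-bit ELF files in the host's byte order (for example, programs built with -m64 on the same machine), so it would reject the 32-bit sections binary shown in Listing 2.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the sh_size of the named section in a 64-bit
 * ELF file, or -1 if the file can't be read, isn't ELF64, or lacks the
 * section. Assumes host byte order; extended section numbering is not
 * handled.
 */
long long elf64_section_size(const char *path, const char *name)
{
    long long result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    Elf64_Shdr strtab;
    char *names = NULL;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;

    /* ELF header: check the magic, the class, and where the section
     * header table lives. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read the whole section header table (what readelf -S lists). */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section names are offsets into the string table at e_shstrndx. */
    strtab = sh[eh.e_shstrndx];
    names = malloc(strtab.sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab.sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab.sh_size, f) != strtab.sh_size)
        goto out;
    names[strtab.sh_size] = '\0';

    for (int i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < strtab.sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            /* For SHT_NOBITS sections such as .bss, sh_size is the size in
             * memory; the section occupies no bytes in the file. */
            result = (long long)sh[i].sh_size;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

For example, calling elf64_section_size("sections", ".bss") on a -m64 build of Listing 1 returns the .bss size. For a NOBITS section such as .bss, that value is the size in memory rather than bytes in the file, which is exactly the distinction described above.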
Changing a data array to a bss array
To see what compilers typically do with zero-initialized arrays, change the initializer of data_array from 5.0 to 0.0, and observe that the array now ends up in the .bss section of the executable, as shown here:
...
#define ELEMENTS (1024*1024)
static double bss_array[ELEMENTS];
static double data_array[ELEMENTS] = {0.0};   /* <---- zeroing makes this a .bss array */
...
When data_array is initialized to zero, compilers treat it just like bss_array and place it in the .bss section. Recompile and look at the sections again; the result is shown in Listing 3.
Listing 3. Example of the data array now in the .bss section
# cc sections.c -o sections
# ls -l sections
-rwxr-xr-x 1 root root 11546 2006-12-03 18:54 sections
# readelf -S sections
There are 36 section headers, starting at offset 0x12cc:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 10000154 000154 00000d 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 10000164 000164 000020 00 A 0 0 4
...
[12] .text PROGBITS 100002f0 0002f0 0005b0 00 AX 0 0 16
[13] .fini PROGBITS 100008a0 0008a0 000038 00 AX 0 0 4
...
[23] .plt PROGBITS 10010a28 000a28 000008 00 WA 0 0 4
[24] .data PROGBITS 10010a30 000a30 000008 00 WA 0 0 4   <----- No longer a data array
[25] .bss NOBITS 10010a38 000a38 1000008 00 WA 0 0 8   <----- Now twice the size
[26] .comment PROGBITS 00000000 000a38 0000d9 00 0 0 1
...
Notice the Size column and the offsets. Here is what has happened:
- Size of the executable -- The executable is now only 11546 bytes.
- .data -- The size of the .data section dropped from 800008(hex) in the previous example to just 000008(hex). data_array is now placed in the .bss section.
- .bss -- The size of the .bss section has doubled from the previous example to 1000008(hex), the combined size of the two arrays (800000(hex) + 800000(hex) for bss_array and data_array) plus 8 bytes, as before. It still takes no space in the executable: the following .comment section starts at the same offset (000a38) as .bss.
Summarizing the sections
Most applications use either uninitialized data declarations or malloc calls for large amounts of memory. libhugetlbfs provides two easy ways to place both the .bss section and memory allocated by malloc into system huge pages. libhugetlbfs also provides a way to load the executable's .text and .data sections into huge pages, but this is less common than backing the .bss section and malloc-allocated memory.
Setting up libhugetlbfs
This section covers the procedures for downloading, installing, and setting up
libhugetlbfs.
To set up a system with libhugetlbfs and define the system huge pages, you need
root access to the system. I'll show ways that a non-root user can be defined to
use the huge pages once the system has been set up.
There are several ways to install libhugetlbfs:
- If you are registered with SUSE support, you have access to the latest libhugetlbfs rpms from the Novell SUSE maintweb support site. (Note: This Web site is only available to registered users.) At the time of this writing, the latest recommended (and supported) rpms for SLES 10 are as follows:
- libhugetlbfs-1.0.1-1.4.ppc.rpm
- libhugetlbfs-64bit-1.0.1-1.4.ppc.rpm
The libhugetlbfs libraries provided with the initial release of SLES 10 are outdated, and I don't recommend using them. These are:
- libhugetlbfs-1.0-16.2.ppc.rpm
- libhugetlbfs-64bit-1.0-16.2.ppc.rpm
With the recommended libraries installed, proceed to the
Allocating huge pages section of this article.
- On RHEL 5, there are five rpms to install:
- libhugetlbfs-1.0.1-1.el5.ppc.rpm
- libhugetlbfs-devel-1.0.1-1.el5.ppc.rpm
- libhugetlbfs-devel-1.0.1-1.el5.ppc64.rpm
- libhugetlbfs-lib-1.0.1-1.el5.ppc.rpm
- libhugetlbfs-lib-1.0.1-1.el5.ppc64.rpm
- Alternatively, you can get a copy from SourceForge (see
Resources) and build and install that version. I'll
explain this approach next.
Installing from SourceForge
From SourceForge, click the Download libhugetlbfs button, shown in Figure 1, and follow the instructions to download the tar ball. Version 1.0.1 is the level tested and supported by SUSE and Red Hat, and the one used in this article.
Figure 1. SourceForge libhugetlbfs page
Download the tar ball to your system. The libhugetlbfs
make install process copies the required files to the
appropriate locations on your system. I recommend installing libhugetlbfs in the
system /usr subdirectory by specifying
make install PREFIX=/usr, as shown in
Listing 4. You'll need root access to install.
Listing 4. Installing libhugetlbfs
# tar -zxf libhugetlbfs-1.0.1.tar.gz
# cd libhugetlbfs-1.0.1
# make
... <---- not showing "make" messages here ...
# make install PREFIX=/usr
VERSION
INSTALL32 /usr/lib
INSTALL64 /usr/lib64
OBJSCRIPT ld.hugetlbfs
INSTALL
If you look in the /usr/share/libhugetlbfs/ directory, you should find ld, which is the new linker command (see Listing 5). ld is a symbolic link to ld.hugetlbfs, so that GCC and the IBM compilers, when told with the -B option to search this directory, run the libhugetlbfs linker wrapper instead of the system ld.
Listing 5. /usr/share/libhugetlbfs/ showing the ld command
# ls -l /usr/share/libhugetlbfs
total 8
lrwxrwxrwx 1 root root 12 2006-11-26 12:42 ld -> ld.hugetlbfs
-rwxr-xr-x 1 root root 1321 2006-11-26 12:42 ld.hugetlbfs
drwxr-xr-x 2 root root 4096 2006-11-26 12:42 ldscripts
The ldscripts subdirectory shown in
Listing 6 contains all of the modified linker scripts needed
to handle the "magic" of relinking the .bss,
.data, and .text sections.
Listing 6. ldscripts subdirectory of /usr/share/libhugetlbfs/
/usr/share/libhugetlbfs/ldscripts # ls
elf32ppclinux.xB elf64ppc.xB elf_i386.xB elf_x86_64.xB
elf32ppclinux.xBDT elf64ppc.xBDT elf_i386.xBDT elf_x86_64.xBDT
The libhugetlbfs package comes with a number of automated test packages that can
be invoked using make. These tests are primarily used
by the developers of the package and kernel maintainers as changes are made to the
system. If you're running on a supported operating system level and kernel, there
should be no need to run the tests, especially since the output can be cryptic for
general users of the package.
The other important file provided by libhugetlbfs is the
libhugetlbfs.so library, as shown in
Listing 7. This is the library invoked at run time that
controls the system usage of the huge pages. Libraries are provided for both
32-bit applications and 64-bit applications.
Listing 7. libhugetlbfs.so library
# ls -l /usr/lib/libhugetlbfs.so
-rwxr-xr-x 1 root root 54785 2006-11-26 12:42 /usr/lib/libhugetlbfs.so
# ls -l /usr/lib64/libhugetlbfs.so
-rwxr-xr-x 1 root root 63910 2006-11-26 12:42 /usr/lib64/libhugetlbfs.so
Allocating huge pages
To set up your system to use huge pages, you first allocate them. On Linux, you write (echo) the number of huge pages you want into the /proc/sys/vm/nr_hugepages control, as shown in Listing 8. The operating system then attempts to allocate the requested huge pages. Allocating here means reserving the pages out of the physical memory pool; the pages are not yet in use. The system gives no indication when it fails to allocate all of the requested huge pages, so after you request them, always check how many were actually reserved.
Listing 8. Allocating huge pages with /proc/sys/vm
# cat /proc/meminfo | grep Huge
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 16384 kB
# echo 200 > /proc/sys/vm/nr_hugepages
# cat /proc/meminfo | grep Huge
HugePages_Total: 200
HugePages_Free: 200
HugePages_Rsvd: 0
Hugepagesize: 16384 kB
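Because the write to nr_hugepages does not report a shortfall, a program that sets up huge pages should read the count back, just as Listing 8 does with cat. This C sketch does that. meminfo_value and request_hugepages are hypothetical helper names, request_hugepages needs root privileges, and the parser assumes the "Key:   value" line layout shown in Listing 8.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Return the number that follows "key:" in a /proc/meminfo-style stream
 * (for example "HugePages_Free"), or -1 if the key is not present. */
long meminfo_value(FILE *f, const char *key)
{
    char line[256];
    size_t klen = strlen(key);

    rewind(f);
    while (fgets(line, sizeof line, f)) {
        long value;
        /* Lines look like "HugePages_Total:     200". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &value) == 1)
            return value;
    }
    return -1;
}

/* Ask the kernel for `count` huge pages (root only) and return how many it
 * actually reserved, or -1 if a file could not be opened or written. */
long request_hugepages(long count)
{
    FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%ld\n", count);
    if (fclose(f) != 0)
        return -1;

    FILE *m = fopen("/proc/meminfo", "r");
    if (m == NULL)
        return -1;
    long total = meminfo_value(m, "HugePages_Total");
    fclose(m);
    return total;
}
```

A caller would compare the return value of request_hugepages(200) with 200 and treat a smaller number as a partial allocation.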
Define a hugetlbfs filesystem
libhugetlbfs uses a virtual filesystem interface, hugetlbfs. To set it up, create a mount point and mount the virtual filesystem. The mount point can have any unique name; this example uses /libhugetlbfs.
The hugetlbfs filesystem you define controls who has access to the system huge pages, so it is important to set up a group containing the users you want to have that access. In Listing 9, you create a special group (libhuge) on the system to control access to the hugetlbfs mount. Add any authorized users to this group; only these users will be allowed to use the huge pages through libhugetlbfs.
Listing 9. Defining hugetlbfs filesystem
# mkdir /libhugetlbfs
# groupadd libhuge
# mount -t hugetlbfs hugetlbfs /libhugetlbfs
# chgrp libhuge /libhugetlbfs   <---- after mounting, so this applies to the hugetlbfs root
# chmod 770 /libhugetlbfs
# usermod wmb -G libhuge        <---- assumes "wmb" is a user on the system
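To verify this setup from a program, you can check that the mount point really is a hugetlbfs filesystem and that the current user may create files in it, which is what the libhuge group and mode 770 control. This sketch uses statfs() and access(); the magic number 0x958458f6 is the value Linux reports for hugetlbfs (HUGETLBFS_MAGIC in <linux/magic.h>), and the helper names are hypothetical.

```c
#include <sys/vfs.h>
#include <unistd.h>

/* Filesystem magic number for hugetlbfs (HUGETLBFS_MAGIC in <linux/magic.h>). */
#define HUGETLBFS_FS_MAGIC 0x958458f6UL

/* 1 if path is on a mounted hugetlbfs, 0 if it is on some other filesystem,
 * -1 if statfs() fails (for example, the path does not exist). */
int is_hugetlbfs(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return -1;
    /* Mask to 32 bits so a sign-extended f_type still compares correctly. */
    return ((unsigned long)sfs.f_type & 0xffffffffUL) == HUGETLBFS_FS_MAGIC;
}

/* 1 if path is a hugetlbfs mount in which the calling user can create
 * files (write and search permission on the directory), else 0. */
int can_use_hugetlbfs(const char *path)
{
    return is_hugetlbfs(path) == 1 && access(path, W_OK | X_OK) == 0;
}
```

With the setup above, can_use_hugetlbfs("/libhugetlbfs") should return 1 for wmb and 0 for a user outside the libhuge group, such as joe in the unauthorized-user example later in this article.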
The Linux operating system uses this special filesystem to access and control the physical memory associated with the 16MB huge pages. The name of the filesystem you create isn't important, and for normal usage you should define and use a single filesystem. Keep in mind that the mount of the hugetlbfs is temporary: it will not be remounted automatically at boot time unless you add it to /etc/fstab. To make it permanent, add a line like the last one in Listing 10 to /etc/fstab. Note that gid=1000 assumes that the libhuge group ID is 1000.
Listing 10. Adding libhugetlbfs mount to /etc/fstab
/dev/sda3 / ext3 acl,user_xattr 1 1
/dev/sda2 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0
/dev/fd0 /media/floppy auto noauto,user,sync 0 0
hugetlbfs /libhugetlbfs hugetlbfs mode=0770,gid=1000 0 0
libhugetlbfs is ready to go
At this point, you've completed the basic setup of the libhugetlbfs library. You have done the following:
- Reviewed the considerations for using libhugetlbfs
- Downloaded and installed libhugetlbfs (either from SourceForge or the most
current rpm files from your SLES 10 or RHEL 5 distribution)
- Set up libhugetlbfs by creating, mounting, and limiting access to a virtual
hugetlbfs
- Defined a huge page pool to be used by users
Now you're ready for a user to use the transparent huge pages.
Backing malloc calls with huge pages
Let's start by looking at malloc, which is the easy
case and requires no special linking. I assume you're running as a user (in this
example, wmb), who is included in the previously
defined libhuge group. The mounted libhugetlbfs
filesystem has libhuge as the group owner, and
wmb belongs to that group. This is shown in
Listing 11.
Listing 11. The libhuge group
wmb@p5sys:~> ls -l -d /libhugetlbfs/
drwxrwx--- 2 root libhuge 0 2006-12-18 21:26 /libhugetlbfs/
wmb@p5sys:~> id
uid=1000(wmb) gid=100(users) groups=16(dialout),33(video),100(users),1000(libhuge)
Then use a simple program that copies values from one very big array to another
array. You'll create a program called copy_arrays.c,
which is shown in Listing 12.
Listing 12. copy_arrays.c
#include <stdlib.h>   /* malloc */

#define ELEMENTS (1024*1024*128)   /* 128M doubles: 1GB per array */

static double bss_array_from[ELEMENTS];   /* uninitialized: placed in .bss */
static double bss_array_to[ELEMENTS];
static double *malloc_array_from;         /* allocated at run time with malloc */
static double *malloc_array_to;

int main(void) {
    int i;

    malloc_array_from = malloc(ELEMENTS * sizeof(double));
    malloc_array_to = malloc(ELEMENTS * sizeof(double));
    if (malloc_array_from == NULL || malloc_array_to == NULL)
        return 1;

    /* initialize and touch all of the pages */
    for (i = 0; i < ELEMENTS; i++) {
        bss_array_to[i] = 1.0;
        bss_array_from[i] = 2.0;
        malloc_array_to[i] = 3.0;
        malloc_array_from[i] = 4.0;
    }

    /* copy "from" "to" */
    for (i = 0; i < ELEMENTS; i++) {
        bss_array_to[i] = bss_array_from[i];
        malloc_array_to[i] = malloc_array_from[i];
    }

    return 0;
}
As root user, allocate 200 huge pages for the system to use. Then open a second
window and issue the watch command to keep track of the
system usage of huge pages, as follows:
# echo 200 > /proc/sys/vm/nr_hugepages
# watch cat /proc/meminfo
As the non-root user (in this case, wmb), compile and run the program, as shown in Listing 13. The number of huge pages being watched should not change. Your times might differ, depending on the system you are running on.
Listing 13. Running copy_arrays.c
wmb@p5sys:~> cc -O3 -m64 copy_arrays.c -o copy_arrays
wmb@p5sys:~> time ./copy_arrays
real 0m6.844s
user 0m2.149s
sys 0m2.649s
To back the malloc calls with huge pages, set the HUGETLB_MORECORE environment variable to yes and set LD_PRELOAD to the libhugetlbfs library, as shown in Listing 14. The program should run slightly faster, and the window where you're watching memory usage should show huge pages being allocated and used.
Listing 14. Setting HUGETLB_MORECORE with LD_PRELOAD set to libhugetlbfs
wmb@p5sys:~> time HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so ./copy_arrays
real 0m6.480s
user 0m2.483s
sys 0m1.645s
Depending on your system and memory configuration, your results might vary, but in this case the elapsed time improves by about five percent ((6.844 - 6.480)/6.844, or about 5.3 percent). While this might seem trivial, it adds up when very large compute-intensive workloads run for days.
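The percentage above is the reduction in elapsed (real) time relative to the baseline run. A one-line C helper makes the sign convention explicit; the function name is hypothetical.

```c
/* Relative reduction in elapsed time, in percent. Positive means the new
 * run was faster than the baseline; negative means it was slower. */
double pct_time_reduction(double baseline_s, double new_s)
{
    return (baseline_s - new_s) / baseline_s * 100.0;
}
```

pct_time_reduction(6.844, 6.480) gives about 5.3. Comparing the combined .bss-plus-malloc run in Listing 16 (6.943 seconds) with the same baseline gives about -1.4, a slight slowdown.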
Default stack size on SLES 10
On SLES 10, the default stack limit is only 8192 KB. If your application unexpectedly encounters segmentation faults, check whether ulimit -s reports 8192. Raising the stack limit, for example with ulimit -s unlimited, might circumvent the problem.
Together, the HUGETLB_MORECORE and LD_PRELOAD settings cause malloc calls and their derivatives to be backed automatically by the huge pages allocated earlier. Because malloc calls are handled at run time, you need to allocate enough 16MB pages in advance for your program.
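If you start a workload from your own launcher program rather than from the shell, the same two settings can be applied with setenv() before exec. This is a sketch with hypothetical function names. Setting LD_PRELOAD affects only programs executed afterward, not the launcher that is already running. For an executable relinked for huge pages, as described in the next section, only HUGETLB_MORECORE is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Set the environment used in Listing 14 so that malloc() in programs
 * started afterward is served from huge pages. Pass preload = 0 for an
 * executable already relinked against libhugetlbfs. */
int set_hugetlb_malloc_env(int preload)
{
    if (setenv("HUGETLB_MORECORE", "yes", 1) != 0)
        return -1;
    if (preload && setenv("LD_PRELOAD", "libhugetlbfs.so", 1) != 0)
        return -1;
    return 0;
}

/* Replace the current process with argv[0], which inherits the settings. */
int run_with_hugetlb_malloc(char *const argv[], int preload)
{
    if (set_hugetlb_malloc_env(preload) != 0)
        return -1;
    execvp(argv[0], argv);   /* does not return on success */
    return -1;               /* reached only if execvp() failed */
}
```

For example, a launcher that calls run_with_hugetlb_malloc with the argument vector {"./copy_arrays", NULL} and preload = 1 runs the program from Listing 12 the same way as the command in Listing 14, minus the timing.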
Backing sections with huge pages
When re-linking your executable to take advantage of backing the
.bss, .data, or
.text sections, it's easy to use either the GCC or the
IBM compilers. Both compilers provide easy-to-use parameters to override the
invocation of the ld command. Here's how to use them:
- GCC compilers: Add the following options to the compile-and-link command line:
-B /usr/share/libhugetlbfs/ -Wl,--hugetlbfs-link=B
The second option begins with -Wl, (dash, capital W, lowercase L, comma), which passes the option that follows it to the linker. The letters after --hugetlbfs-link= select the sections to back with huge pages: B for the .bss section only, or BDT for the .bss, .data, and .text sections, matching the .xB and .xBDT linker scripts shown in Listing 6.
- IBM compilers: Use the same options plus -tl, which tells the compiler to apply the -B path only to the linker (l), like this:
-B /usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
Use these options with the XL C, C++, and Fortran compilers.
Using the copy_arrays.c program (Listing 12) and the GCC compilers, tell the linker to put the .bss section into huge pages. In this case, build a separate executable called copy_arrays-lp, as shown in Listing 15, since this executable is linked for huge pages. You should see a performance gain similar to that of backing the malloc calls.
Listing 15. copy_arrays-lp executable
wmb@p5sys:~> cc -O3 -m64 -B /usr/share/libhugetlbfs/ -Wl,--hugetlbfs-link=B \
copy_arrays.c -o copy_arrays-lp
wmb@p5sys:~> time ./copy_arrays-lp
real 0m6.495s
user 0m2.558s
sys 0m1.648s
What if not enough 16MB pages are available?
You can combine the two, backing both the .bss section
and malloc with huge pages. You'll need more huge pages
than you previously defined.
When you don't have enough huge pages defined, the application is terminated ("Killed"). As Listing 16 shows, running copy_arrays-lp with malloc backing also enabled fails because there are not enough 16MB huge pages available. Allocating more 16MB huge pages and rerunning fixes this.
When using an executable linked for .bss backing, there is no need to specify the LD_PRELOAD=libhugetlbfs.so override. To back malloc as well, you simply define HUGETLB_MORECORE=yes.
Listing 16. Specifying use of malloc with HUGETLB_MORECORE
wmb@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
Killed <---- Note the "Killed" message
real 0m4.084s
user 0m1.669s
sys 0m0.365s
wmb@p5sys:~> su
#> echo 300 > /proc/sys/vm/nr_hugepages
#> exit
wmb@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
real 0m6.943s
user 0m3.600s
sys 0m0.473s
In this case, using both malloc and
.bss sections did not result in tangible gains. For
your applications, trying various combinations helps you understand the most
practical approach to use. Not all applications benefit from using transparent
huge pages.
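The failure and the fix in Listing 16 follow from simple arithmetic. Each array in copy_arrays.c holds 128M doubles (1GB), so the two .bss arrays and the two malloc arrays each need 2GB of huge pages: 256 16MB pages in total, more than the 200 pages allocated earlier but fewer than 300. The .bss-only run in Listing 15 needs only 128 pages, which is why it fit. This C sketch does the calculation; it gives a lower bound, since it ignores the rest of .bss and malloc's own bookkeeping, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hugepagesize reported in /proc/meminfo on these POWER systems: 16384 kB. */
#define HUGE_PAGE_SIZE (16ULL * 1024 * 1024)

/* Smallest number of pages of page_size bytes that can hold `bytes`. */
uint64_t hugepages_for(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Lower bound on huge pages for copy_arrays-lp run with
 * HUGETLB_MORECORE=yes: two 1GB .bss arrays plus two 1GB malloc arrays.
 * Ignores the rest of .bss and malloc bookkeeping overhead. */
uint64_t copy_arrays_hugepages(void)
{
    const uint64_t array_bytes = 1024ULL * 1024 * 128 * sizeof(double); /* 1GB */
    uint64_t bss_pages = hugepages_for(2 * array_bytes, HUGE_PAGE_SIZE);
    uint64_t heap_pages = hugepages_for(2 * array_bytes, HUGE_PAGE_SIZE);
    return bss_pages + heap_pages;
}
```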
Unauthorized user
If an unauthorized user tries to use the system huge pages, the program still runs normally, but the libhugetlbfs library reports an error and huge pages are not used. For example, if the user joe runs the same two programs, the result is as shown in Listing 17.
Listing 17. Result of unauthorized user trying to use huge pages
joe@p5sys:~> time HUGETLB_MORECORE=yes ./copy_arrays-lp
libhugetlbfs: ERROR: mkstemp() failed: Permission denied
real 0m6.795s
user 0m2.267s
sys 0m2.656s
joe@p5sys:~> time HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so ./copy_arrays
libhugetlbfs: ERROR: mkstemp() failed: Permission denied
libhugetlbfs: ERROR: Couldn't open hugetlbfs file for morecore
real 0m6.807s
user 0m2.133s
sys 0m2.660s
Examples of libhugetlbfs in action
The libhugetlbfs library has been put to use in customer production environments, where it has helped to tune workloads across the POWER line, and it has been used in published SPECcpu2000 results with SLES 10. More recently, it was used for a SPECompM2001 result published with the new RHEL 5. In addition, a practical example of how to tune a memory-intensive application is available on the IBM Linux on POWER wiki (see Resources).
Publishing SPECcpu2000 with SLES 10
To provide working examples of the technology in action, IBM recently published a
number of SPECcpu2000 benchmarks on the SPEC.org Web site (see
Resources) on POWER5+ systems using SLES 10. The
published examples demonstrate how easy it is to invoke and use libhugetlbfs. For
example, on the SPECint2000 runs, the executables are built normally. Then, when
the runs are measured for performance, the malloc
backing of the executables is simply turned on by setting the two applicable
environment variables, as shown in Listing 18.
Listing 18. Running SPECint2000 with malloc commands in 16MB huge pages
# export HUGETLB_MORECORE=yes
# export LD_PRELOAD=libhugetlbfs.so
# runspec --config LoP_config_file.cfg int
For the SPECfp2000 (floating-point) results, the common approach is to relink the applicable programs so that each executable's .bss, .data, and .text sections are backed by 16MB pages. In this case, modify the link step of the components. Listing 19 shows the changes for the 179.art component, as published on the SPEC.org Web site (see Resources).
Listing 19. Example of a SPECfp2000 component compile/link flags
179.art=peak=default=default:
notes179_1 = 179.art
notes179_2 = +FDO -O5
notes179_3 = -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS1_CFLAGS = -qpdf1 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS1_LDCFLAGS = -qpdf1 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS2_CFLAGS = -qpdf2 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
PASS2_LDCFLAGS = -qpdf2 -O5 -B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
Then invoke the run, as follows:
# export HUGETLB_MORECORE=yes
# runspec --config LoP_config_file.cfg fp
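The same flag string appears on all four PASS lines of Listing 19, so it can help to generate it once. The sketch below assembles it for the XL compilers. The helper name and its arguments are hypothetical; the flags are the ones shown in Listing 19, and the choice of B or BDT mirrors the two options discussed in this article (.bss only, or .bss, .data, and .text). The comment on -tl reflects our reading of the XL -B/-t options; check your compiler documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the XL compiler/link flags from Listing 19.
#   $1  segments to remap: B (.bss only) or BDT (.bss, .data and .text)
#   $2  directory with the libhugetlbfs linker wrapper
#       (default /usr/share/libhugetlbfs/, as in Listing 19)
xl_hugetlbfs_link_flags() {
    local segs dir
    segs=${1:-BDT}
    dir=${2:-/usr/share/libhugetlbfs/}
    case $segs in
        B|BDT) ;;
        *) echo "unsupported segment list '$segs' (use B or BDT)" >&2
           return 1 ;;
    esac
    # -B<dir> sets a search prefix for compiler components; -tl limits that
    # prefix to the linker (ld), so the libhugetlbfs ld wrapper is picked up.
    # -Wl passes --hugetlbfs-link=<segs> through to that wrapper.
    printf '%s\n' "-B$dir -tl -Wl,--hugetlbfs-link=$segs"
}

# Example: print the string used on the PASS1/PASS2 lines in Listing 19.
xl_hugetlbfs_link_flags BDT
```

The printed string can then be pasted after the -qpdf1 -O5 or -qpdf2 -O5 options in the config file.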
Publishing SPECompM2001 with RHEL 5
To demonstrate that the same approach is available with RHEL 5, a SPECompM2001
result was recently published for an IBM System p5 560Q system. The result is
available on the SPEC.org Web site (see Resources).
Selected components that showed performance gains with
malloc backed by 16MB huge pages were
run with the environment variables set appropriately.
Listing 20 shows how the invocation of one of
the components was modified. It looks easy, right?
Listing 20. Example of a SPECompM2001 component using malloc in 16MB pages
316.applu_m: -qpdf1/pdf2
-O4 -q64
ENV_HUGETLB_MORECORE=yes
ENV_LD_PRELOAD=libhugetlbfs.so
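The ENV_ entries in Listing 20 set HUGETLB_MORECORE and LD_PRELOAD for that one component only. If you want to reproduce a single component run by hand, outside runspec, a sketch like the following can extract those entries from a config fragment. It assumes each entry sits on its own line in the form ENV_NAME=value, as in Listing 20, and that values contain no spaces. The function, file, and binary names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print NAME=value for every ENV_NAME=value line in a
# SPEC config fragment such as Listing 20 (all other lines are ignored).
env_from_spec_config() {
    sed -n 's/^[[:space:]]*ENV_\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*=[[:space:]]*\(.*\)$/\1=\2/p' "$1"
}

# Example (file and binary names are hypothetical). The substitution is left
# unquoted on purpose so that each NAME=value becomes a separate argument:
#   env $(env_from_spec_config applu_m.cfg) ./applu_m
```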
An application tuning example
Another exercise in tuning a memory-intensive application (stream) with
libhugetlbfs is available on the IBM Linux on POWER wiki (see Resources).
The example in the wiki shows the practical and common steps a user can take to
maximize performance of memory-intensive workloads on a system running Linux.
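Because a program keeps running on normal pages when libhugetlbfs cannot use huge pages (as Listing 17 shows), it is worth confirming during a tuning run that huge pages are really in use. The sketch below computes the allocated huge pages from /proc/meminfo as HugePages_Total minus HugePages_Free. The function name is hypothetical, and on the POWER systems described here each page would be 16MB.

```shell
#!/bin/sh
# Print the number of huge pages currently allocated, computed as
# HugePages_Total - HugePages_Free from /proc/meminfo (or the file in $1).
# Pages that are only reserved still count as free, so this reports pages
# that have actually been handed out. Returns 1 if no huge page data exists.
hugepages_in_use() {
    local meminfo
    meminfo=${1:-/proc/meminfo}
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         END { if (total == "") exit 1; print total - free }' "$meminfo"
}

# Example: sample once per second while a tuned run is active.
#   while sleep 1; do hugepages_in_use; done
```

A value that stays at zero while the application is running suggests that the environment variables or the relinking did not take effect.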
In summary
In this article, you were introduced to an open source community project called
libhugetlbfs, which is supported on SLES 10 and RHEL 5. The library provides
easier, transparent access for customer applications to system huge pages.
You learned how to use the libhugetlbfs libraries with GCC or the IBM XL C/C++
compilers for Linux. The libhugetlbfs library allows you to use huge pages for a
program's malloc calls, for the
.bss section, or for the combination of the
.bss, .data, and
.text sections of an executable, which in some cases
improves the performance of the application. You can easily configure
libhugetlbfs on systems that support Linux huge pages, and access controls can be
added to limit which users can use the huge page memory resources.
Acknowledgments
Particular thanks to Adam Litke (in Rochester, MN), Nishanth Aravamudan
(Beaverton, OR), Steve Fox (Rochester, MN), and David Gibson (Australia) for their
tireless work and focus on delivering libhugetlbfs to the community as a supported
library in 2006. Their work continues into 2007, as they make the library more
flexible and robust across system platforms.
Thanks to Chakarat Skawratananond (Austin, TX), a regular contributor to
developerWorks, for his help in working with libhugetlbfs on performance
improvements for customers.
Downloads
| Description | Name | Size | Download method |
|---|---|---|---|
| SourceForge libhugetlbfs library | libhugetlbfs-1.0.1.tar.gz | 64KB | HTTP |
| Trial version of XL C/C++ Advanced Edition V8.01 | vac.rpm | 95MB | HTTP |

Note - A functional version of the IBM C/C++ compilers for the POWER systems.
Resources

Learn
- Tuning stream with libhugetlbfs: See a working example on the IBM Linux on POWER
Wiki site and the common steps an end user can take to maximize performance on
Linux on POWER systems.
- SPEC.org SPECcpu2000 results: In this section, there are a number of examples of
publishes done with libhugetlbfs. Click one of the results sections and search
for "SLES."
- SUSE for POWER: Visit the Novell site and get the latest information for SUSE for
POWER.
- Red Hat Enterprise Linux documentation: Visit the Red Hat site for more
information on Red Hat Enterprise Linux.
- IBM Systems: Want more? The developerWorks IBM Systems zone hosts hundreds of
informative articles and introductory, intermediate, and advanced tutorials.
- New to IBM Systems? Get a better understanding of what skills are necessary to
undertake systems development with a wide range of IBM hardware products and
related technologies.
- Safari bookstore: Visit this e-reference library to find specific technical
resources.
About the author
Bill Buros is a performance analyst in the IBM Linux Technology Center. Based in Austin, Texas, Bill has been focused on performance for compute-intensive applications running on the Linux operating system. He works with customers and software product vendors to improve the performance of their products, the Linux operating system, the kernel, and the overall software stack. One of his recent goals has been to get transparent huge pages implemented in the operating system for customers of Linux. You can contact him at wburos@us.ibm.com.