Tuning options to consider with gcc
For more information on performance for Linux on Power, see our performance wiki page.
In this paper, we focus on the classic tuning hints and tips used with customers.
A prerequisite step is generally porting the application from another platform to Linux on Power; separate "porting" papers and offerings are available for that step.
Once your application is ported and running on Linux on Power systems, there are some simple things to consider when using the gcc compilers to build optimized executables. The options below apply to gfortran as well.
On this page we cover these topics:
- Start with the latest Distro versions
- Be aware of default 32-bit or 64-bit modes
- -mcpu and -mtune
- Consider using the Advance Toolchain
- Power6 mode
- Power6x mode
- Consider libhugetlbfs for run-time improvements
- Consider alternative malloc routines
- Consider FDPR-pro
- Insights: gcc and Linux Kernel
- Other options: IBM XL C/C++ and IBM XL Fortran compilers
First, the newer distro versions ship updated gcc versions that provide optimizations for POWER6. On the Red Hat side, this includes RHEL 5.1, RHEL 5.2, and now RHEL 5.3. On the SuSE side, there is SLES 10 SP2 and now the latest SLES 11.
Note that the older gcc-3.3.4 compiler seen on releases like RHEL 4.x does not have the POWER6 optimizations built in.
For more general reading on GCC Version 4, see "Get to know GCC 4".
SLES 11 introduces the gcc 4.3 level of the compiler - the latest generation of compiler optimizations.
When building your application, always build on the "earlier" distro, and bring your application "forward" to the newer distro versions and releases.
For example, if you want to leverage gcc 4.3 on SLES, you can install the Advance Toolchain 2.0-5 on SLES 10 SP2 to build, and the resulting binaries should run unchanged on SLES 11. We will provide examples here.
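Because the POWER6 optimizations depend on the gcc level, it can help to check the compiler version on the build machine before relying on them. The following is a minimal sketch, not part of any distro or the Advance Toolchain; the helper name gcc_at_least is hypothetical, and it compares only the major and minor version numbers.

```shell
# Hypothetical helper: succeed if a dotted gcc version string (e.g. "4.3.2")
# is at least MAJOR.MINOR. Only the first two components are compared.
gcc_at_least() {
    ver=$1; want_major=$2; want_minor=$3
    major=${ver%%.*}
    case $ver in
        *.*) rest=${ver#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;   # a bare major number such as "13"
    esac
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Typical use on the build machine (needs gcc on PATH):
#   gcc_at_least "$(gcc -dumpversion)" 4 3 && echo "gcc 4.3 or later"
# Demo with fixed version strings, so no compiler is needed:
gcc_at_least 4.3.2 4 3 && echo "4.3.2 is at the gcc 4.3 level"
gcc_at_least 4.1.2 4 3 || echo "4.1.2 predates gcc 4.3"
```

A build script could call this once at the top and stop early when the compiler is older than expected.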
On both SLES 10 and RHEL 5, the default compilation mode with gcc is to create 32-bit executables.
If you want 64-bit executables, compile with -m64. Linux on Power is a 64-bit operating system and can seamlessly execute both 32-bit and 64-bit applications. When compiling in 64-bit mode, be aware that you may need to install the corresponding 64-bit libraries and packages.
On SLES 11, the new default mode with gcc is to create 64-bit executables.
For performance, the trade-off between 32-bit and 64-bit executables is simple. If your application needs the larger address space that 64-bit pointers provide, compiling as a 64-bit executable is a given. When 64-bit is not needed, the 32-bit binary is usually "just fine", and its smaller pointers keep the memory footprint down.
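Since the default mode differs between distro levels, it is worth confirming what a build actually produced. The sketch below reads the EI_CLASS byte of the ELF header (offset 4: 1 means 32-bit, 2 means 64-bit). The helper name elf_bits is hypothetical, and for brevity it does not verify the ELF magic number, so it should only be pointed at files known to be ELF objects.

```shell
# Report whether an ELF file is 32-bit or 64-bit by reading the EI_CLASS
# byte at offset 4 of the ELF header (1 = 32-bit, 2 = 64-bit).
# Note: the ELF magic is not checked; use only on known ELF files.
elf_bits() {
    class=$(od -An -tu1 -j4 -N1 "$1" 2>/dev/null | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Typical use after building the same source both ways:
#   gcc -m32 -O2 -o app32 app.c && elf_bits app32
#   gcc -m64 -O2 -o app64 app.c && elf_bits app64
# Demo on the shell binary itself, so no compiler is needed:
echo "/bin/sh is a $(elf_bits /bin/sh)-bit executable"
```

The file command reports the same information in more detail; the helper is only convenient when a script needs a plain 32 or 64.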
The -mcpu and -mtune options on the gcc command line allow the compiler to target specific levels of Power hardware. This can be handy if you know your application will run on a specific generation of hardware.
- -mcpu= selects the instruction set (ISA level) that GCC is allowed to generate code for
- -mtune= selects the processor implementation (micro-architecture) that GCC will tune for
For example, set -mcpu to the lowest ISA level the application or library needs to support, and set -mtune to the newest processor you need to support. To build an application or library that runs on both POWER5 and POWER6, use -mcpu=power5 -mtune=power6.
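The baseline-plus-tuning rule can be captured in a small build-script helper. This is only a sketch: the function name power_cflags, the -O2 level, and the file name app.c are illustrative assumptions, and the commands are printed rather than executed so the snippet runs on any machine.

```shell
# Hypothetical helper: compose flags for code that must run on BASELINE
# and later processors while being scheduled for NEWEST.
power_cflags() {
    baseline=$1
    newest=${2:-$1}   # tune for the baseline when no newer target is given
    echo "-mcpu=$baseline -mtune=$newest"
}

# Runs on POWER5 and POWER6, tuned for POWER6:
CFLAGS="-O2 $(power_cflags power5 power6)"
echo "gcc $CFLAGS -c app.c"

# POWER6 only:
echo "gcc -O2 $(power_cflags power6) -c app.c"
```

Keeping the two values in one place makes it harder for a makefile to end up with a -mtune target older than its -mcpu baseline.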
If you are building a generic version of your application, the gcc defaults have been defined to make this easy.
<here we should add a gcc defaults table>
The Advance Toolchain provides an easy way to get the latest gcc and cpu-tuned libraries for SLES and RHEL releases.
See How to use Advance Toolchain for some examples on this.
When building for Power6 systems, you can compile and tune for Power6 simply by specifying -mcpu=power6 -mtune=power6.
This level is recommended if your application is going to run exclusively on Power6 systems.
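A binary built this way should not be started on older hardware, so a launcher script may want to check the processor first. The sketch below assumes that /proc/cpuinfo on Linux on Power contains a line of the form "cpu : POWER6 (architected), altivec supported"; that format, the function names, and the choice to accept POWER6 or any later generation are all assumptions to verify on your own systems.

```shell
# Read /proc/cpuinfo-style text on stdin and print the POWER generation of
# the first "cpu" line in -mcpu spelling (e.g. "power6"), or nothing.
power_generation() {
    sed -n 's/^cpu[[:space:]]*:[[:space:]]*POWER\([0-9][0-9]*\).*/power\1/p' |
        head -n 1
}

# Hypothetical launcher guard for a binary built with -mcpu=power6.
# Assumption: POWER6 or any later generation is acceptable.
require_power6() {
    gen=$(power_generation)
    case $gen in
        power[6-9]|power[1-9][0-9]) return 0 ;;
        *) echo "built for POWER6; found '${gen:-unknown}'" >&2
           return 1 ;;
    esac
}

# Typical use:
#   require_power6 < /proc/cpuinfo && exec ./app_power6
# Demo with a sample line, so it runs on any machine:
printf 'cpu\t\t: POWER6 (architected), altivec supported\n' |
    require_power6 && echo "ok to run the POWER6 build"
```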
For very special cases...
If you're creating an application specifically and solely tuned for POWER6, you can consider taking advantage of an extended mode of the POWER6 system by compiling with -mcpu=power6x -O3 and booting the system in extended mode.
In particular, this mode helps floating-point-intensive applications that frequently convert between floating-point and integer values or make heavy use of libm functions.
The more normal usage is gcc -mcpu=power6 -O3, which compiles the executable according to the Power architecture. Applications compiled for the extended mode will not be supported on future versions of the POWER hardware, but in most cases these applications would be expected to be recompiled on future systems anyway.
If you have a large shared library (a lib*.so library rather than a program) with "many" GLOBAL functions, calls between functions inside the library go through the Procedure Linkage Table (PLT). In that case, consider linking with "-Wl,-Bsymbolic" from gcc, which binds those references to the library's own definitions at link time. Note that this also prevents other modules from overriding (interposing) those symbols.
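To judge whether a library has "many" global functions, one option is to count the defined global FUNC entries in its dynamic symbol table. The sketch below parses readelf -s -W output; the function name count_global_funcs and the library and object names in the comments are illustrative, and the column layout is assumed to match what GNU readelf prints.

```shell
# Count defined GLOBAL FUNC entries in the .dynsym table of
# "readelf -s -W" output read from stdin.
count_global_funcs() {
    awk '
        /^Symbol table/ { in_dynsym = ($3 ~ /\.dynsym/); next }
        in_dynsym && $4 == "FUNC" && $5 == "GLOBAL" && !/ UND / { n++ }
        END { print n + 0 }
    '
}

# Typical use (library and object names are placeholders):
#   readelf -s -W libfoo.so | count_global_funcs
#   gcc -shared -fPIC -O2 -Wl,-Bsymbolic -o libfoo.so foo.o bar.o
# Demo on sample text: printf is undefined (imported), foo and bar are
# defined in the library, so this prints 2.
count_global_funcs <<'EOF'
Symbol table '.dynsym' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf
     2: 0000000000000650    23 FUNC    GLOBAL DEFAULT   11 foo
     3: 0000000000000670    40 FUNC    GLOBAL DEFAULT   11 bar
EOF
```

The count is only a rough indicator; whether -Bsymbolic helps still depends on how often those functions call each other at run time.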
When optimizing the run-time characteristics of your executables, leveraging larger page sizes can improve performance in some cases.
The RHEL Version 5 and now SLES 11 distro levels are "built" with a 64KB base page size, which in most cases provides the majority of the run-time performance improvement. On RHEL 5 or SLES 11, if you still need a possible further 3% to 5% gain, the 16MB large pages on Power systems "may" help.
If you're running on RHEL Version 4 or SLES 9 or SLES 10, the 16MB pages in some cases can provide performance gains closer to 10% to 15%. Your mileage will of course vary. Performance gains depend significantly on how memory is used by your application.
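Before experimenting with 16MB pages, it helps to know the base page size and how many large pages a given heap would need. The sketch below is an illustration only: the helper names and the 1000 MB heap are assumptions, and the commented commands show one common way of running a program under libhugetlbfs, which may differ between library versions.

```shell
# Read /proc/meminfo-style text on stdin and print the huge page size in kB.
hugepage_kb() {
    awk '$1 == "Hugepagesize:" { print $2 }'
}

# Number of huge pages needed to back SIZE_MB megabytes, rounded up.
hugepages_for_mb() {
    size_mb=$1; page_kb=$2
    echo $(( (size_mb * 1024 + page_kb - 1) / page_kb ))
}

echo "base page size: $(getconf PAGESIZE) bytes"
# A 1000 MB heap on 16MB (16384 kB) pages needs 63 pages:
echo "huge pages needed: $(hugepages_for_mb 1000 16384)"

# Typical follow-up on the target system (as root for the first command):
#   hugepage_kb < /proc/meminfo
#   echo 63 > /proc/sys/vm/nr_hugepages
#   HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so ./app
```

Measuring the application with and without the preload is the only reliable way to see whether the large pages pay off for its memory access pattern.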
As an example of an alternative malloc routine, MicroQuill provides a replacement run-time malloc library that can improve performance for some applications.
The classic next step in optimizing your binaries is to leverage the FDPR-pro tool from IBM. See the alphaWorks home page for FDPR-pro for more details.
This product is technically known as Post-Link Optimization for Linux on POWER. It is a post-link optimization utility for the POWER architecture that optimizes an executable program or a shared library, based on its run-time profile. In many cases, performance gains of 5% to 10% have been seen.
For insights into gcc and the Linux kernel, see the interesting article on LWN.net that points to a developerWorks article.
While gcc, g++, and gfortran continue to improve with every new release of the distro releases, in some cases the IBM compilers built specifically for the Power systems may provide better performance for your application.
XL C/C++ Version 10.1 and XL Fortran Version 12.1