Guide to porting Linux on x86 applications to Linux on POWER


Usually, porting Linux applications from the x86 platform to Linux on POWER is simple because both platforms are based on Linux. In fact, porting often requires only a recompile with minor changes to some compiler and linker switches.

However, when an application depends on a specific hardware architecture, it typically requires major modifications. This guide highlights the differences between Linux on x86 and Linux on POWER and provides recommendations for making your x86 code ready for the port to Linux on POWER.


When you port an application to a new platform, proper planning is essential. To adequately prepare for the port you should:

  1. Sign up for the IBM® Chiphopper™ program.
  2. Understand IBM eServer® POWER platform differences: POWER4™ compared to POWER5™.
  3. Decide which Linux for POWER distribution to use: Red Hat Enterprise Linux or SUSE LINUX.
  4. Migrate to the GNU Make build system.
  5. Understand the differences between the x86 and POWER architectures.
  6. Determine which compiler to use: GNU Compiler Collection (GCC) or IBM XL C/C++ Advanced Edition for Linux.

These tasks are written with the assumption that your company has already made the decision to port its Linux application running on x86 to Linux on POWER. If this isn't the case and you'd like more information about Linux on POWER features and functions, you should review "Linux on POWER: An overview for developers" (developerWorks, March 2005) before continuing.

Sign up for the Chiphopper program

If your Linux on x86 application is commercially available and coded in C/C++, Java™, or both, then it may be an ideal candidate for the IBM eServer Application Advantage™ for Linux (Chiphopper) program. Chiphopper offers no-cost access to tools and support that let you easily port, test, and support your existing Linux on x86 application across all IBM eServer and middleware platforms. This offering can help you maximize your Linux market opportunity while minimizing your expense. For more information about Chiphopper and to determine whether your application qualifies, see the Related topics section.

If your application does not meet the Chiphopper program qualifications or you want to do the port on your own, continue to the next section: Understand POWER platform differences.

Understand POWER platform differences

The POWER platform to which you are porting determines which optimization options you will use to compile your application. This is especially important when you use the XL C/C++ compiler, which will be discussed in detail later in this paper.

In 2004, IBM introduced the POWER5 processor, which is the latest generation of the POWER family of processors with the same base architecture. In addition to the benefits of previous versions of POWER chips, the POWER5 processor has built-in virtualization capabilities, including IBM Micro-Partitioning™, Dynamic Logical Partitioning (DLPAR), virtual storage, virtual Ethernet, and Capacity on Demand (CoD). For more information about virtualization on POWER5, review "Linux on POWER: An overview for developers" (developerWorks, March 2005).

If you'll be porting to both POWER4 and POWER5 processor-based servers, use different flags on -qarch and -qtune with the XL C/C++ compiler to optimize your application. Specific flags and when to use them are described later in this document.

Decide which Linux for POWER distribution to use

SUSE LINUX Enterprise Server (SLES) for POWER is the product of more than four years of collaboration between SUSE, IBM, and the open source community. In addition to providing a 64-bit kernel for the POWER architecture, SLES9 now includes a 64-bit run time environment and GCC toolchain for building 64-bit versions of popular open-source software (such as the Apache Web server or MySQL). Combined with IBM XL compilers for performance-critical applications, the 64-bit Linux on POWER environment offered by SLES9 on IBM eServer OpenPower™, IBM eServer p5, IBM eServer i5, and IBM eServer BladeCenter™ JS20 hardware delivers an unparalleled platform for developing and deploying Linux solutions. For more information on SUSE LINUX, see the Related topics section.

Red Hat Linux is widely recognized as an industry leader in both desktop and enterprise-class Linux markets. With its latest release of Red Hat Enterprise Linux Advanced Server (RHEL4), Red Hat provides significant technology enhancements over the V3 release. Areas of specific development include improvements in security capabilities, increased server performance and scalability, and enhanced desktop capabilities -- all while ensuring a high level of compatibility with prior releases. RHEL4 is the world's leading enterprise-focused Linux environment. For more information on RHEL4, see the Related topics section.

The decision to use SLES or RHEL does not directly impact the porting process. However, be aware that SUSE and Red Hat have different release and update cycles and different policies on binary compatibility, which may affect your application update decision in the long run.

In addition to these two supported Linux distributions, several other distributions will run on the Power Architecture, including Yellow Dog, Debian Linux, and Gentoo (see Related topics).

Migrate to GNU Make

If you are not currently using GNU Make to build your applications, consider migrating to it. It’s good programming practice to use a tool that controls the generation of executables instead of depending on scripts or direct invocation of the compiler to generate the executable. Most C and C++ programmers use Make as that tool. Switching to GNU Make lets you be consistent in your build operations across multiple platforms with the same build control, the makefile. GNU Make is distributed with both SLES9 and RHEL4. For more information about GNU Make, see the Related topics section.

Understand the differences between the x86 and POWER architectures

There are several architecture-specific differences you should be aware of before porting your x86 Linux applications to POWER. The following architectural differences -- described in detail in the next couple of sections -- are particularly noteworthy:

  • Endianness or byte ordering
  • Data type length in 32- and 64-bit environments
  • Data alignment differences in the architectures

Endianness (byte ordering)

POWER processors use a big-endian architecture while x86 processors use a little-endian architecture. This section covers endianness (also known as byte ordering) issues and describes techniques for handling them. Byte-ordering issues are often encountered by developers during the process of migrating applications, device drivers, or data files from the x86 architecture to the POWER architecture.

Big endian and little endian
Endianness refers to how a data element and its individual bytes are stored and addressed in memory. In a multi-digit number, a digit with a higher order of magnitude is more significant. For example, in the four-digit number 8472, the 4 is more significant than the 7. Similarly, in multi-byte numerical data, the byte holding the larger arithmetic value is more significant. For example, the hexadecimal value 0x89ABCDEF can be divided into four bytes: 0x89, 0xAB, 0xCD, and 0xEF, with arithmetic values of 0x89000000, 0xAB0000, 0xCD00, and 0xEF. Byte 0x89 carries the largest value and is therefore the most significant byte, while byte 0xEF carries the smallest and is thus the least significant byte.

When a number is placed in memory, starting from the lowest address, there are only two sensible options:

  1. Place the least significant byte (little endian) first.
  2. Place the most significant byte (big endian) first.

The following diagram shows how big-endian and little-endian processors place a 32-bit hexadecimal value, such as 0x89ABCDEF, in memory:

Figure 1. Big-endian and little-endian processors storing hexadecimal values

0x89 is the most significant byte, and 0xEF is the least significant byte. On big-endian systems, the most significant byte is placed at the lowest memory address. On little-endian systems, the least significant byte is placed at the lowest memory address.

Byte ordering is not an issue if a program writes a word to memory and then reads the same location back as a word, because it sees the same value whenever the variable is referenced consistently. However, if a program writes a word and then reads the same value back one byte at a time, it may get different results depending on whether the processor is big endian or little endian.

POWER processor families are examples of platforms that use the big-endian data layout, while the x86 processor families are examples of systems that use the little-endian data layout. Identifying endian-dependent code segments and transforming them into the big-endian equivalent is important during the migration of x86 applications to the POWER platform.

Dealing with endianness
This section describes how to identify the endian-dependent areas in your code and methods to convert them to the correct endian format.

Endian-dependent code
Flexibility in data referencing is a strength of the C language and helps make it popular for programming system-level software, including operating systems and device drivers. This flexibility includes type casting, pointer manipulation, unions, bit fields, structures, and loose type checking. However, these same features are also sources of endianness portability issues. Take the following two pieces of code as examples:

Listing 1. Non-uniform data reference using pointer
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int val;
    unsigned char *ptr;

    ptr = (unsigned char *) &val;  /* pointer 'ptr' points to 'val' */
    val = 0x89ABCDEF;              /* four-byte constant */
    printf("%X.%X.%X.%X\n", ptr[0], ptr[1], ptr[2], ptr[3]);
    exit(0);
}
Listing 2. Non-uniform data reference using union
#include <stdio.h>
#include <stdlib.h>

union {
    int val;
    unsigned char c[sizeof(int)];
} u;

int main(void)
{
    u.val = 0x89ABCDEF;  /* four-byte constant */
    printf("%X.%X.%X.%X\n", u.c[0], u.c[1], u.c[2], u.c[3]);
    exit(0);
}

On an x86 system, the result is:

EF.CD.AB.89

On a POWER system, the result is:

89.AB.CD.EF
The endianness problem surfaces because val is read byte by byte, starting from the lowest address: on big-endian POWER that first byte is the most significant byte, while on little-endian x86 it is the least significant byte.

Though POWER processor-based systems use the big-endian data storage model, there is an exception: their I/O buses. Both IBM Micro Channel® and PCI are little-endian based. In POWER systems, the I/O controller, which is the bridge between the system bus and the I/O buses, provides a data-steering function to convert data from little endian to big endian (and vice versa) when reading from or writing to a device. This data-steering function is applied to both Direct Memory Access (DMA) and Programmed I/O (PIO) or Memory-Mapped I/O (MMIO) data. In essence, the I/O controller treats data as byte streams: byte 0 in the system goes to byte 0 in I/O, byte 1 to byte 1, and so forth. This brings up an interesting scenario: bytes in multi-byte data need to be swapped before they are passed to the I/O. As a result, when code that interacts with I/O devices is ported, its little-endian data handling should remain unchanged, because the I/O devices themselves are little-endian based. However, it's recommended that all endian-dependent code be identified (by inspection or with programming aids such as lint) and changed manually.

In addition to I/O-related programs, applications that handle the TCP/IP protocol may also contain endian-dependent code. Because the TCP/IP protocol specifies its data format in big endian, an x86-based program may convert TCP/IP data to its native endianness, little endian, before performing any math operations on it. In fact, the Portable Operating System Interface (POSIX) defines a set of conversion routines to perform such operations: htonl(), ntohl(), htons(), and ntohs(). The s in the routine names stands for short (a 16-bit quantity), and the l for long (a 32-bit quantity). Although a program that calls these functions shouldn't have endianness issues, there may still be endian-dependent code that explicitly manipulates TCP/IP data by itself.

Write endian-neutral code

A program module is considered endian neutral if it retains its functionality while being ported across platforms of different endianness. In other words, there is no relation between its functionality and the endianness of the platform it is running on. Here are a few recommendations for writing endian-neutral code:

  • Use macros and directives
    To make the code portable, you can use macros and conditional compile directives as shown in Listing 3 and Listing 4.
Listing 3. Use directives to neutralize endianness effect
#include <stdio.h>
#include <stdlib.h>

#define BIG_ENDIAN    0
#define LITTLE_ENDIAN 1
#define BYTE_ORDER    BIG_ENDIAN

union {
    int val;
    unsigned char c[sizeof(int)];
} u;

int main(void)
{
    u.val = 0x89ABCDEF;
#if (BYTE_ORDER == BIG_ENDIAN)
    printf("%X.%X.%X.%X\n", u.c[0], u.c[1], u.c[2], u.c[3]);
#else  /* BYTE_ORDER == LITTLE_ENDIAN */
    printf("%X.%X.%X.%X\n", u.c[3], u.c[2], u.c[1], u.c[0]);
#endif /* BYTE_ORDER == BIG_ENDIAN */
    exit(0);
}
Listing 4. Use macro to access bits 16 to 23 of four-byte integer
#define INTB16TO23(a) (((a) >> 16) & 0xff)

int main(void)
{
    int a = 0x11121314;
    int b;

    b = INTB16TO23(a);  /* b is 0x12 on both little- and big-endian systems */
    return 0;
}
  • Use a compile-time option
    A better way to implement this is to define the value of BYTE_ORDER on the compiler command line, for example -DBYTE_ORDER=BIG_ENDIAN. This removes the need to edit every file in a device driver or application when compiling on a new platform with a different byte order. Instead, you may have to edit only the makefiles used to build the driver or application.
  • Test memory layout
    The example in Listing 5 tests the first byte of the multi-byte integer endian to determine whether it is 0 or 1. If it is 1, the running platform is assumed to be little endian. The drawback to this approach is that the variable must be tested on every data access of this type, adding instructions to the code path and thus a performance penalty. The intended platform for an application or device driver, along with that platform's endianness, is known at compile time. Given that both device drivers and applications have performance considerations, using a compile-time definition is the best method of selecting the appropriate endian-dependent code segment.
Listing 5. Determine the endianness at run time
#include <stdio.h>
#include <stdlib.h>

const int endian = 1;
#define is_bigendian() ((*(char *) &endian) == 0)

union {
    int val;
    unsigned char c[sizeof(int)];
} u;

int main(void)
{
    u.val = 0x89ABCDEF;
    if (is_bigendian()) {
        printf("%X.%X.%X.%X\n", u.c[0], u.c[1], u.c[2], u.c[3]);
    } else {
        printf("%X.%X.%X.%X\n", u.c[3], u.c[2], u.c[1], u.c[0]);
    }
    exit(0);
}

Data type length in 32- and 64-bit environments

Both GCC and XL C/C++ compilers on the Linux operating system offer two different programming models: ILP32 and LP64. ILP32, which stands for integer/long/pointer 32, is the native 32-bit programming environment on Linux. The ILP32 data model provides a 32-bit address space, with a theoretical memory limit of 4 GB. LP64, which stands for long/pointer 64, is the 64-bit programming environment on Linux.

Table 1 shows the width in bits of base data types in ILP32 and LP64 models on POWER and x86.

Table 1. Base data types of ILP32 and LP64 on POWER and x86 (in bits)

Data type    POWER ILP32  POWER LP64  x86 ILP32  x86 LP64
int          32           32          32         32
long         32           64          32         64
pointer      32           64          32         64
long long    64           64          64         64
long double  64/128*      64/128*     96         128

*The default size for long double in Linux on POWER is 64 bits. It can be increased to 128 bits with the compiler option -qlongdouble of the XL C/C++ compiler.

All definitions for numeric values can be found in /usr/include/limits.h on both POWER and x86 platforms.

Most Linux x86 applications today run in 32-bit mode. However, with the appearance of the x86 64-bit extension (x86-64), more and more x86 applications will be written natively for 64 bits. When porting x86 applications to POWER, use the Linux on POWER environment that matches your source environment. In other words, avoid moving from a 32-bit to a 64-bit programming model as part of the migration to POWER; otherwise the exercise is no longer a true port, but a development activity. If you want to port a 32-bit x86 application to the 64-bit POWER programming model, treat the migration as two steps:

  1. Port to Linux on POWER 32-bit environment (including testing and verifying it).
  2. Migrate to 64 bit.

That being said, an application should be ported to 64 bit if it can:

  • Benefit from more than 4 GB of virtual address space.
  • Benefit from more physical memory (greater than 4 GB), and if its users are likely to deploy it on a system with more than 4 GB of physical memory.
  • Benefit from 64-bit size long integers.
  • Benefit from full 64-bit registers to do efficient 64-bit arithmetic.
  • Use files larger than 2 GB.

Some examples of applications that could benefit from being migrated to 64 bit include:

  • Database applications, especially those that perform data mining
  • Web caches and Web search engines
  • Components of CAD/CAE simulation and modeling tools
  • Scientific and technical computing applications, such as computational fluid dynamics, genetic simulation

An application can remain 32-bit and still run on the 64-bit Linux on POWER kernel without requiring any code changes. IBM Linux on POWER processor-based servers support both 32-bit and 64-bit applications running simultaneously on the 64-bit architecture without any performance degradation between modes, because 64-bit POWER architecture includes complete native 32-bit support.

When porting applications between different platforms (from x86 to POWER) or programming models (from ILP32 to LP64), you need to take into account the differences between data width and alignment settings available in the different environments to avoid possible performance degradation and data corruption.

When porting from x86 ILP32 to POWER ILP32, or from x86 LP64 to POWER LP64, notice in Table 1 that the width of all the basic data types remains the same except for long double: 96 bits on x86 versus 64/128 bits on POWER in ILP32, and 128 bits versus 64/128 bits in LP64. This means you should examine the portions of the code that use the long double data type. If you plan to use the XL C/C++ compiler, use the -qlongdouble compilation flag to get maximum compatibility for the long double data type.

Data alignment

When porting applications between platforms or between 32-bit and 64-bit models, consider the differences between alignment settings available in the different environments to avoid possible performance degradation and data corruption. The best practice is to enforce natural alignment of the data items. Natural alignment means storing data items at an address that is a multiple of their sizes (for instance, 8-byte items go in an address multiple of eight). For the XL C/C++ compiler, each data type within aggregates (C/C++ structures/unions and C++ classes) will be aligned along byte boundaries according to either the linuxppc or bit-packed rules, where linuxppc is the default and is the natural alignment. linuxppc is also compatible with the default GCC alignment rules. Table 2 shows alignment values on POWER and x86 along with their data type widths in bytes.

Table 2. Alignment values on POWER and x86 (in bytes)

Data type    POWER ILP32 (size/align)  POWER LP64 (size/align)  x86 ILP32 (size/align)  x86 LP64 (size/align)
long long    8/8                       8/8                      8/8                     8/8
long double  8 or 16/8                 8 or 16/8                12/4                    16/16

The keyword __alignof__ in both GCC and XL C/C++ allows you to inquire about how an object is aligned. Its syntax is just like sizeof. For example, if the target machine requires a double value to be aligned on an 8-byte boundary, then __alignof__ (double) is eight.

As shown in Table 2, a long double variable is aligned to four bytes on 32-bit x86 but to eight bytes on POWER. Structures will, therefore, have a different layout on different platforms. It's important not to hard-code any sizes or offsets. Instead, use the C operator sizeof to inquire about the sizes of both fundamental and complex types, and the macro offsetof to get the offsets of structure members from the beginning of the structure.

Determine which compiler to use: GCC or IBM XL C/C++

There are two C/C++ compilers available for Linux on POWER: GCC and IBM XL C/C++ compiler. GCC offers robust portability of code intended for compilation in Linux, while the IBM XL compilers offer a substantial performance increase over GCC when higher levels of optimization are used. Both compilers offer 32- and 64-bit compilation modes, and the Linux on POWER environment allows both 32- and 64-bit code to be run simultaneously without a performance loss.

Port with the GCC compiler set

For projects developed across multiple platforms where a GCC compiler is the original compiler, GCC is often used to deploy applications for Linux on POWER. This is especially true for applications where performance is not critical, such as small utilities. GCC also accepts some constructs that only GCC understands, such as GCC-specific macros. However, many of these GCC-specific features are now supported by the XL C/C++ compilers.

In general, porting code with GCC compilers should be straightforward; in most cases it's a simple recompile, as easy as typing the make command. Occasionally library version discrepancies arise, but for the most part the code does not care which architecture it runs on. Architecture-specific flags, such as -m486 and -mpowerpc64, are discouraged for compilation on Linux on POWER because GCC does not have extensive processor maps for optimization routines on these architectures. Also, without architecture-specific flags, binary compatibility will be greater across different models of POWER hardware.

64-bit GNU compilers are included in SLES9 and RHEL4, and it's recommended that you use the compiler sets packaged with these Linux distributions.

On all architectures, shared libraries should be compiled with -fPIC. GCC on x86 tolerates shared objects built without it, but on POWER the flag is required; it specifies that the generated code will be used in a shared object. Review the GCC compiler manuals for more information (see Related topics).

Port with the IBM XL C/C++ compiler set

IBM XL C/C++ V7.1 is the follow-on release to IBM VisualAge® V6.0 for Linux. XL C/C++ compilers offer a high-performance alternative to GCC as well as a number of additional features.

Fortunately, XL C/C++ uses the GNU C and C++ headers, and the resulting application is linked with the C and C++ run time libraries provided with GCC. This means that the XL C/C++ compilers produce GNU ELF objects, which are fully compatible with the objects GCC compilers produce. XL C/C++ ships the SMP run time library to support the automatic parallelization and OpenMP features of the XL C/C++ compilers.

Moving from GCC to XL C/C++ for Linux on POWER is straightforward. XL C/C++ assists with the task by providing an option, -qinfo=por, to help you filter the emitted diagnostic messages to show only those that pertain to portability issues. In addition, a subset of the GNU extensions to gcc and gcc-c++ are supported by XL C/C++. See "XL C/C++ for Linux on pSeries Compiler Reference" for a complete list of features that are supported, as well as those that are accepted but have semantics that are ignored.

To use the supported features with your C code, specify either -qlanglvl=extended or -qlanglvl=extc89. In C++, all supported GNU gcc/gcc-c++ features are accepted by default. Furthermore, gxlc and gxlc++ help to minimize the changes to makefiles for existing applications built with the GNU compilers.

XL C/C++ documentation
The following PDF documents are provided when you install XL C/C++:

  • "XL C/C++ for Linux Getting Started" (getstart.pdf).
  • "XL C/C++ for Linux Installation Guide" (install.pdf), which contains instructions for installing the compiler and enabling the man pages.
  • "XL C/C++ for Linux C/C++ Language Reference" (language.pdf), which contains information about the C and C++ language as supported by IBM.
  • "XL C/C++ for Linux Compiler Reference" (compiler.pdf), which contains information about the various compiler options, pragmas, macros, and built-in functions, including those used for parallel processing.
  • "XL C/C++ for Linux Programming Guide" (proguide.pdf), which contains information about programming using XL C/C++ not covered in other publications.

Each of these documents can be found at the following locations:

  • /docs/LANG/pdf directory of the installation CD, where LANG represents the language and location code
  • /opt/ibmcmp/vacpp/7.0/doc/LANG/pdf directory after the compiler is installed

An HTML version of the product documentation is installed in the /opt/ibmcmp/vacpp/7.0/doc/LANG/html directory. From this directory, open the index.html file to view the HTML files.

Optimization options in XL C/C++

XL C/C++ provides a portfolio of optimization options tailored to IBM hardware. For Linux on POWER, many applications compiled with XL C/C++ have shown significant performance improvements over those compiled with GCC. Note that not all optimizations are beneficial for all applications. There's usually a trade-off between the degree of optimization done by the compiler and an increase in compile time accompanied by reduced debugging capability.

Optimization levels
Optimization levels are specified by compiler options. The following table summarizes the compiler behavior at each optimization level:

Table 3. Compiler behavior at each optimization level

Options           Behavior
-qnoopt           Provides fast compilation and full debugging support.
-O2 (same as -O)  Performs optimizations that the compiler developers considered the best combination for compilation speed and run time performance. This setting implies -qstrict and -qstrict_induction, unless explicitly negated by -qnostrict or -qnostrict_induction.
-O3               Performs additional optimizations that are memory intensive, compile-time intensive, or both. These are recommended when the run time improvement outweighs the concern for minimizing compilation resources.
-O4 and -O5       Perform interprocedural optimization, loop optimization, and automatic machine tuning.

Target machine options are options that instruct the compiler to generate code for optimal execution on a given microprocessor or architecture family. By selecting appropriate target machine options, you can optimize to suit the broadest possible selection of target processors, a range of processors within a given family of processor architectures, or a specific processor. The following options control optimizations affecting individual aspects of the target machine:

Table 4. Optimizations affecting individual aspects of target machine

Options Behavior
-qarch   Selects a family of processor architectures for which instruction code should be generated. The default is -qarch=ppc64grsq. The following sub-options are also available: auto, pwr3, pwr4, pwr5, ppc970, ppc64, ppcgr, rs64b, and rs64c.
-qtune   Biases optimization toward execution on a given microprocessor without implying anything about the instruction set architecture to use as a target. The default on Linux is -qtune=pwr3. Available sub-options include: auto, pwr3, pwr4, pwr5, ppc970, rs64b, and rs64c.
-qcache  Defines a specific cache or memory geometry. If -qcache is used, use -qhot or -qsmp along with it.
-qhot    Performs high-order transformations, optimizations that specifically improve the performance of loops through techniques such as interchange, fusion, and unrolling. The option -qhot=vector is the default when -qhot is specified. Try using -qhot along with -O2 and -O3; it is designed to have a neutral effect when no opportunities for transformation exist.
-qsmp    Generates threaded code needed for shared-memory parallel processing. The option -qsmp=auto is the default when -qsmp is specified. Use -qsmp=omp:noauto if you are compiling an OpenMP program and do not want automatic parallelization. Always use the _r compiler invocations when using -qsmp.

To get the most out of target machine options, you should:

  • With -qarch, specify the smallest family of machines possible on which you expect your code to run well.
  • With -qtune, specify the machine on which the performance should be best. For example, if your application will be supported only on POWER5 systems, use -O3 -qarch=pwr5 -qtune=pwr5. If it will run only on POWER4, or on a combination of POWER4 and POWER5, use -qarch=pwr4 -qtune=pwr4. Modifying the cache geometry may be useful in cases where the systems have configurable L2 or L3 cache options, or where the execution mode reduces the effective size of a shared level of cache (for example, two-core-per-chip SMP execution on POWER5).

POWER platforms support machine instructions that are not available on other platforms. XL C/C++ provides a set of built-in functions that map directly to certain POWER instructions. Using these functions eliminates function call and return costs, parameter passing, stack adjustment, and all other costs related to function invocation. For the complete list of supported built-in functions, see the "XL C/C++ for Linux on pSeries Compiler Reference".

However, software originally intended to be compiled with GCC may need more attention when recompiled with the IBM XL C/C++ compilers. You need to edit makefiles to reflect the correct path to the XL C/C++ compilers, which is /opt/ibmcmp/ by default. You also need to set the correct optimization flags for the specific architecture (such as -O3 -qarch=pwr5 -qtune=pwr5 for IBM POWER5 hardware).

In addition to optimization modes for various POWER architecture derivatives, the XL C/C++ compilers can be instructed to compile software in common mode. This guarantees compatibility across all POWER architectures at the expense of performance otherwise gained through architecture-specific optimization. When compiling 64-bit code, the compiler must be given the flag for a 64-bit build (that is -q64), because it defaults to 32-bit mode.

A few tips for compiling GCC-oriented code with the XL C/C++ compiler set are listed below:

  • C++ comments: GCC allows C++-style comments in C files by default, but this is not the case with XLC, the XL C compiler. Because it's not economical to change all of the comments in a group of source files to comply with C-style commenting, XLC provides the -qcpluscmt option to allow these comments. When C code is compiled with this flag, both C and C++ style comments are accepted.
  • Environment variables are often the easiest way to configure build scripts. Rather than hand editing configure scripts and makefiles, set the relevant environment variables, such as CC and CFLAGS, and let the configure script generate the makefile without the need for hand editing.
  • Configure scripts also need to be aware of the proper platform type: either powerpc-unknown-linux-gnu or powerpc64-unknown-linux-gnu. You should set this for compilation with either GCC or XL C/C++ by appending the --target= flag to the configure invocation.
  • Although extensive documentation for the IBM XL C/C++ compilers exists, an argument list is provided at the console by running the compiler with no arguments, for example, $COMPILER_PATH/bin/cc.

Comparison of GCC and XL C/C++ compiler options

The following table compares the commonly used compiler options from GCC and XL C/C++:

Table 5. Commonly used compiler options from GCC and XL C/C++

GCC                              XL C/C++                                             Description
-v                               -v, -V, -#                                           Turn on verbose mode.
-p, -profile                     -p                                                   Set up the object files produced by the compiler for profiling.
n/a                              -q32, -q64, or the OBJECT_MODE environment variable  Create 32- or 64-bit objects. GCC 64-bit compilers are located in /opt/cross/bin.
-fsyntax-only                    -qsyntaxonly                                         Perform syntax checking without generating object files.
-fpic                            -qpic=small                                          Generate position-independent code for use in shared libraries. In XL C/C++, the Global Offset Table is no larger than 64 KB. If -qpic is specified without a suboption, -qpic=small is assumed. The -qpic option is enabled if the -qmkshrobj compiler option is specified.
-fPIC                            -qpic=large                                          Allow the Global Offset Table to be larger than 64 KB.
-pthread                         -qthreaded or an _r invocation mode                  Create programs that run in a multi-threaded environment.
-fno-rtti                        -qnortti                                             Disable generation of run time type identification (RTTI) for exception handling and for use by the typeid and dynamic_cast operators. In XL C/C++, the default is -qnortti.
-static                          -qstaticlink                                         Prevent the generated object from linking with shared libraries.
-static-libgcc                   -qstaticlink=libgcc                                  Instruct the compiler to link with the static version of libgcc.
-shared                          -qmkshrobj                                           Instruct the compiler to produce a shared object.
-shared-libgcc                   -qnostaticlink=libgcc                                Instruct the compiler to link with the shared version of libgcc.
-Wl,-rpath                       -Wl,-rpath or -R                                     Pass a colon-separated list of directories to be searched by the run time linker.
-fno-implicit-templates, -frepo  -qtempinc, -qtemplateregistry, -qtemplaterecompile   Instantiate templates.
-w                               -w                                                   Suppress warning messages.
n/a                              -warn64                                              Enable checking for long-to-integer truncation.
n/a                              -qinfo=<…>                                           Produce informational messages.
-fpack-struct                    -qalign=bit_packed                                   Use bit_packed alignment rules.
n/a                              -qalign=linuxppc                                     Use the default GCC alignment rules to maintain compatibility with GCC objects. This is the default.
-O, -O2, -O3                     -O, -O2, -O3, -O4, -O5                               Set optimization levels.
n/a                              -qarch, -qtune, -qcache                              Set optimization options for a particular processor.

See "How to use IBM XL C/C++ Advanced Edition V7.0 for Linux on POWER: A guide for GCC users" (developerWorks, December 2004) for more information about the IBM XL C/C++ compilers.


After you complete each of the planning steps, you should be ready to perform the port. This section outlines the recommended steps to successfully port your application to Linux on POWER.

  1. Migrate your build system to GNU Make (if necessary)
    In this step, you create one or more makefiles for your application. You can also take advantage of GNU's build automation utilities, such as Autoconf, Automake, and Libtool, to maximize your program's portability across different UNIX® platforms. These utilities are available from the GNU Web site (see Related topics).
  2. Modify architecture-dependent code (if necessary)
    Consider byte ordering (endianness), data length in 32-bit and 64-bit models, and data alignment on the different platforms, which are mentioned in the Understand the differences between the x86 and POWER architectures section.
  3. Build
    After the makefiles are built and all the programs are modified, the build process is as simple as issuing a command, such as make. Errors encountered during the build are usually compiler and linker errors or syntax errors in the program. Modifying compiler options through the makefiles or correcting the offending syntax in the code usually fixes them. The compiler reference manual and programming guide are your best references during this phase. For IBM XL C/C++, refer to "XL C/C++ for Linux Compiler Reference" (compiler.pdf) and "XL C/C++ for Linux Programming Guide" (proguide.pdf). For GCC, both the compiler reference and programming guide are available from the GCC Web site (see Related topics).
  4. Test and troubleshoot
    After the program is successfully built, test it for run time errors. Run time errors are usually related to your program logic in this phase. It's always a good idea to write several test programs to verify that the output of your application is the one you expected.
  5. Tune performance
    Now that the ported code is running on the POWER platform, monitor it to ensure that it performs as expected. If it doesn't, you'll need to complete performance tuning. You can use the following suite of tools to identify performance problems in your application and show how your application interacts with the Linux kernel:

    1. OProfile
      OProfile profiles code based on hardware-related events, such as cache misses or CPU cycles. For example, OProfile can help you determine which of the source routines causes the most cache misses. OProfile utilizes hardware performance counters provided in many CPUs, including IBM POWER4, POWER5 and PowerPC™ 970. For more information about OProfile for Linux on POWER, read "Identify performance bottlenecks with OProfile for Linux on POWER" (developerWorks, May 2005), or visit the OProfile Web site (see Related topics).
    2. Post-Link optimization
      The Post-Link optimization tool optimizes the executable image of a program by collecting information on the behavior of the program while the program is used for some typical workload. It then re-analyzes the program (together with the collected profile), applies global optimizations (including program restructuring), and creates a new version of the program optimized for that workload. The new program generated by the optimizer typically runs faster and uses less real memory than the original program. For more information, visit the Post-Link optimization Web site (see Related topics).
    3. TProf
      TProf is a timer profiler that identifies what code is running on the CPU during a user-specified time interval. It's used to report hot spots in applications as well as the kernel. TProf records which code is running at each system-clock interrupt (100 times per second per CPU).
    4. PTT
      PTT collects per-thread statistics, such as number of CPU cycles, number of interrupts, and number of times the thread was dispatched.
    5. AI
      AI displays CPU utilization statistics during a user-specified interval.
  6. Package
    If your ported application will be a commercial product or you want to distribute the application to third parties to install, you need to package your ported application, including libraries, documentation, and sometimes source code. Linux provides several ways to package your application, such as a tarball, self-installing shell script, and RPM. RPM is the most popular packaging tool for Linux. For more information about RPM, see the Related topics section.
Figure 2. Flow of the porting activities described in steps one through six above


Linux on POWER offers an enterprise-class Linux environment, complete with both 32-bit and 64-bit application environments and toolchains. Its twin compiler sets provide ease of migration through open source code as well as high-performance exploitation of the POWER architecture. Porting your Linux on x86 application to Linux on POWER lets you take advantage of the application performance available on the POWER architecture, now augmented by development tools not previously offered on the Linux operating system. In summary, Linux on POWER is a leading platform for the deployment of high-performance Linux applications.


I would like to acknowledge Linda Kinnunen for her document template and helpful reviews and the IBM Linux on Power team for their technical assistance and reviews of this document.


Related topics

  • The AIX 5L porting guide provides details on the types of problems most likely to be encountered when porting applications from other UNIX-based platforms to the IBM AIX® 5L Operating System.
  • The article "Guide to porting from Solaris to Linux on POWER" (developerWorks, February 2005) highlights differences and recommends possible solutions when porting from Solaris to Linux on POWER.
  • To help you determine which Linux for POWER distribution to use, visit the SUSE LINUX and Red Hat Web sites.
  • In addition to the two supported Linux distributions, SLES and RHEL, several other distributions will run on the Power Architecture, including Yellow Dog, Debian Linux, and Gentoo.
  • Consider migrating to GNU Make to build your applications.
  • RPM is the most popular packaging tool for Linux.
  • For more information about OProfile for Linux on POWER, visit the OProfile Web site.
  • To learn more about the Post-Link optimization tool, visit the Post-Link optimization Web site.
  • For more GCC compiler information, review the GCC compiler manuals.


