Guide to port Linux on x86 applications to Linux on Power

This article describes how to port your Linux® C/C++ applications from the x86 platform (Intel® or AMD) to IBM® PowerLinux™ using the following straightforward, step-by-step process. First, learn what it takes to prepare for the port and then follow the implementation tips to get your 32-bit or 64-bit x86 code running on PowerLinux.


Artis Walker (walkerar@us.ibm.com), AIX/Linux Porting Engineer, IBM

Artis Walker is a porting engineer at IBM. He is based in Austin, TX. His primary role at IBM is to help ISVs port C/C++ code to the IBM Power platform. Artis has been involved in software development and system integration on both Linux and AIX platforms for more than 20 years. He is the co-author of UNIX to Linux Porting: A Comprehensive Reference, ISBN:0131871099.



Steven Frischknecht (sfrisch@us.ibm.com), Technical Enablement Specialist, IBM

Steven Frischknecht is the Program Manager of the IBM Chiphopper offering. He is based in the greater New York area. For over six years, Steven has assisted ISVs worldwide in porting their existing Linux x86 applications to IBM System z and IBM Power platforms through IBM Chiphopper.



11 June 2014


Introduction

In most cases, porting Linux applications from the x86 platform to Linux on Power is simple because both platforms are based on a common Linux version from Novell SUSE or Red Hat. Porting often requires only a GNU Compiler Collection (GCC) recompile with minor changes to some compiler and linker switches.

However, being prepared to tackle unknown issues that may crop up is an important advantage for a successful port. If you focus on porting first and optimizing later, the guidelines, techniques, and tools introduced in this article can greatly reduce, if not eliminate, the difficulty of porting to Linux on Power.

In some cases, where an application has been designed solely for a specific hardware architecture such as x86, additional modifications may be required. This article highlights the differences for Linux applications running on x86 systems moving to IBM POWER® processor-based systems and provides recommendations for making your x86 code ready for the port to Linux on Power. The main topics are:

  • The endianness (byte ordering) of your application
  • 32-bit and 64-bit data types (lengths and alignments)
  • The compiler choices available: GCC (distribution and Advance Toolchain) and the IBM XL C/C++ compiler
  • Using the IBM SDK for Linux and IBM Rational Developer for Power Systems - Building large programs with the GNU C/C++ Compiler.

Advanced tools and technologies are available for application developers on PowerLinux, including the IBM Software Development Kit for PowerLinux (SDK), which is a free, Eclipse-based integrated development environment (IDE). The SDK integrates C/C++ source development with Advance Toolchain, post-link optimization, and classic Linux performance analysis tools, including OProfile, Perf, and Valgrind.


Planning for the port

When you port an application to a new platform, proper planning is essential. To adequately prepare for the port you should:

  1. Visually scan through the code paying particular attention to possible mismatched assignment operations and bit manipulations and comparisons.
  2. Understand the IBM Power® platform architectural highlights along with the differences between the x86 and IBM POWER processor architectures - in particular the endianness of your application.
  3. Decide which Linux on Power distribution to use: Red Hat Enterprise Linux, Novell SUSE Linux, CentOS, or Ubuntu.
  4. Consider using the GNU Make build system.
  5. Determine which compiler to use: GCC or IBM XL C/C++ for Linux.
  6. Acquire an IBM Power server for development. A few choices are listed below:
  7. Look through the programs below and determine whether one of them can help accelerate your port and make it successful.

Power Development Cloud
The fully automated IBM Power Development Cloud is a no-charge, platform as a service (PaaS) that can be used to develop and demonstrate software solutions with IBM platforms. You can take advantage of the IBM Linux on Power development stack that has preconfigured and installed development tools for x86 to Linux on Power.

IBM Hardware Mall – Lease and discount
The Hardware Mall is designed to incent independent software vendors (ISVs) to develop software solutions on IBM platforms by providing them significant purchase discounts and low lease rates on Systems hardware and applicable software.

IBM Systems Application Advantage for Linux (Chiphopper)
The IBM Chiphopper™ offering is an application porting or rehosting program that is designed to help IBM Business Partners enable, test, and support their existing Linux applications running on competitive platforms onto IBM Power Systems™ running Linux and middleware platforms — at no cost.

IBM Innovation Centers
The IBM Innovation Centers provide training and one-to-one guidance, from building to marketing and selling your solution. The IBM Innovation Center team is ready to help you with your development objectives.

You can find and take advantage of all of these services at the Hardware for solution development website.

Understand Power platform differences

The Power hardware platform to which you are porting determines the optimization options that you will use to compile your application. You should consider which processor version (for example, IBM POWER5, IBM POWER6®, IBM POWER7®, or IBM POWER8™) is your base. It is easy to develop code generally for POWER processor-based systems, so that the resulting application runs across all of them. It is equally easy to target your application at one of IBM's later POWER generations. One important choice is whether to use the GCC compiler or the XL C/C++ compiler. More details are discussed later in this article.

Compilers:

Over the years, GCC technology has significantly improved and is capable of delivering tuned applications for IBM POWER7 and POWER8 processor-based servers. In addition, IBM XL compilers can provide additional performance where performance is the ultimate goal.

  • If you're using GCC on Linux on x86, it is recommended you complete the initial port with GCC. GCC technology on Linux on Power has been significantly improved, and familiarity with GCC can make it easier to start with it as a base.
  • The IBM XL C/C++ compilers are available, if required.
  • If you're using Java™ on Linux on x86, it is recommended that you download and use the appropriate IBM Java toolkit available for download from IBM websites.

For more information about virtualization on Power Systems, review Linux on Power: An overview for developers (developerWorks, updated July 2011) or refer to executive briefing on the possibilities of server consolidation with Power Systems.

Compiler flags:

When porting to POWER processor-based servers, you can use different values for the -qarch and -qtune flags with the XL C/C++ compiler, or for -mcpu and -mtune with the GCC compiler, to optimize your application. However, for initial ports and to minimize difficulty, it is best to use the most common Power-based settings, -qtune=ppc and -mcpu=power. Specific flags and when to use them are described later in this article.

Decide which Linux on Power distribution to use

The latest operating system offerings (at the time of this publication) from Novell and Red Hat provide the technologies, support, and reliability required for enterprise-level deployments. Often, the selection of the preferred offering is a consideration done within a company or organization.

  • SUSE Linux Enterprise Server (SLES). For more information on SLES, see the Resources section.
  • Red Hat Enterprise Linux (RHEL). For more information on RHEL or CentOS, see the Resources section.

The decision to use SLES or RHEL does not directly impact the porting process. However, be aware that SUSE and Red Hat have different release and update cycles and different policies on binary compatibility, which might affect your application update decision in the long run. In addition, Red Hat and SUSE distributions release concurrent Intel® and Power support on the same schedule.

Migrate to GNU Make

If you are not currently using GNU Make to build your applications, consider migrating to it. It's a good programming practice to use a tool that controls the generation of executables instead of depending on scripts or direct invocation of the compiler to generate the executable. Most C and C++ programmers use Make as that tool. Switching to GNU Make enables you to be consistent in your build operations across multiple platforms with the same build control, the makefile. GNU Make is distributed with both SLES11 and RHEL6. For more information about GNU Make, see the Resources section.

Assuming your Linux on x86 application is being built with GCC, we recommend the straightforward approach of recompiling on Power with GCC first. Later steps will add easy tunings and optimizations to consider. If required, a subsequent step would be to consider the IBM XL compilers for better performance tuned to the specific IBM Power Architecture® level.

Understand the differences between the x86 and Power Architecture derivatives

There are several architecture-specific differences you should be aware of before porting your x86 Linux applications to Power Systems. The following architectural differences that are described in detail in the next couple of sections are particularly noteworthy:

  • Endianness or byte ordering
  • Data type length in 32- and 64-bit application environments
  • Data alignment differences in the architectures

Later we will introduce several additional considerations when comparing applications between x86 and Power. Examples of optimization considerations include optimizing for throughput with multiple threads, assessing the usage of simple locking schemes, and so on.


Endianness (byte ordering)

Although POWER processors can support both big-endian and little-endian byte ordering, implementations to date have used a big-endian architecture, while x86 processors use a little-endian architecture. This section covers endianness (also known as byte ordering) issues and describes the techniques for handling them. Developers often encounter byte-ordering issues when migrating low-level applications, device drivers, or data files from the x86 architecture to the Power Architecture. High-level applications often do not run into endianness issues when porting, but the issue still needs to be assessed when porting from x86 to Power.

Endian byte ordering affects integer and floating-point data but does not affect character strings, which keep the byte order that the programmer sees and intends. So, a raw ASCII text file is fine, but binary data files can suffer from hardware endianness issues, even when their format is machine independent. For instance, JPEG files are stored in big-endian format while GIF and BMP files are stored in little-endian format.

Big endian and little endian

Endianness refers to how a data element and its individual bytes are stored and addressed in memory. In a multi-digit number, the digit with a higher order of magnitude is considered more significant. For example, in the four-digit number 8472, the 4 is more significant than the 7. Similarly, in multi-byte numerical data, the larger the contribution a byte makes to the overall value, the more significant it is. For example, the hexadecimal value 0x89ABCDEF can be divided into four bytes: 0x89, 0xAB, 0xCD, and 0xEF, with arithmetic values of 0x89000000, 0xAB0000, 0xCD00, and 0xEF. Byte 0x89 contributes the largest value; therefore, it is the most significant byte, while byte 0xEF contributes the smallest and is thus the least significant byte.

When a number is placed in memory, starting from the lowest address, there are only two sensible options:

  • Place the least significant byte (little endian) first.
  • Place the most significant byte (big endian) first.

The following diagram shows how big-endian and little-endian processors place a 32-bit hexadecimal value, such as 0x89ABCDEF, in its memory:

Figure 1. Big-endian and little-endian processors storing hexadecimal values

0x89 is the most significant byte, and 0xEF is the least significant byte. On big-endian systems, the most significant byte is placed at the lowest memory address. On little-endian systems, the least significant byte is placed at the lowest memory address.

Byte ordering is not an issue if a program writes a word to memory and then reads the same location as a word because it sees the same value when a variable is referenced consistently. If a program tries to read the same value one byte at a time (when a word is written), it might get different results depending on whether the processor is big endian or little endian.

IBM POWER processor families are examples of systems that use the big-endian data layout, while the x86 processor families are examples of systems that use the little-endian data layout. Identifying endian-dependent code segments and transforming them into the big-endian equivalent is important during the migration of x86 applications to the Power platform.

Dealing with endianness

This section describes how to identify the endian-dependent areas in your code and methods to convert them to the correct endian format.

Endian-dependent code
Non-uniformity in data referencing is the strength of the C language, making it popular for programming system-level software, including operating systems and device drivers. This strength includes type casting, pointer manipulation, unions, bit fields, structures, and flexible type checking. However, these same features are also sources of endianness portability issues. Consider the following two code listings as examples:

Listing 1. Non-uniform data reference using pointer

#include <stdio.h>
#include <stdlib.h>   /* exit() */
int main(void) {
  unsigned int val;
  unsigned char *ptr;

  ptr = (unsigned char *) &val;   /* view the integer one byte at a time */
  val = 0x89ABCDEF; /* four-byte constant */
  printf("%X.%X.%X.%X\n", ptr[0], ptr[1], ptr[2], ptr[3]);
  exit(0);
}

Listing 2. Non-uniform data reference using union

#include <stdio.h>
#include <stdlib.h>   /* exit() */
union {
  unsigned int val;
  unsigned char c[sizeof(int)];
} u;

int main(void) {
  u.val = 0x89ABCDEF; /* four-byte constant */
  printf("%X.%X.%X.%X\n", u.c[0], u.c[1], u.c[2], u.c[3]);
  exit(0);
}

On an x86 system, the result is:

 EF.CD.AB.89

On a POWER processor-based system, the result is:

 89.AB.CD.EF

The endianness problem surfaces because val is read byte by byte, starting at the lowest address. That address holds the least significant byte on x86 and the most significant byte on POWER.

How to determine endianness of a system

In Linux, the GNU C preprocessor automatically provides a set of common predefined macros whose names start and end with a double underscore. The most useful for determining the endianness of the system in use are __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, and __ORDER_BIG_ENDIAN__.

/* Test for a big-endian machine */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
/* Do big endian processing */
#else
/* Do little endian processing */
#endif
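
The check above is resolved at compile time. As a minimal sketch, it can be paired with a run-time probe that stores a known value and inspects its first byte. The names HOST_IS_BIG_ENDIAN and is_big_endian_at_runtime are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Compile-time answer, taken from GCC's predefined macros.
   HOST_IS_BIG_ENDIAN is a hypothetical name used only in this sketch. */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_IS_BIG_ENDIAN 1
#else
#define HOST_IS_BIG_ENDIAN 0
#endif

/* Run-time answer: store a known 32-bit value and look at the byte
   at the lowest address. */
static int is_big_endian_at_runtime(void) {
  const uint32_t probe = 0x01020304u;
  const unsigned char *bytes = (const unsigned char *)&probe;
  return bytes[0] == 0x01;  /* most significant byte stored first */
}
```

On any given build the two answers should agree. The run-time form is mainly useful when a single binary has to report or log the byte order of the host it runs on.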

Write endian-neutral code

A program module is considered endian neutral if it retains its functionality while being ported across platforms of different endianness. In other words, there is no relation between its functionality and the endianness of the platform it is running on. Here are a few recommendations for writing endian-neutral code:

  • Use macros and directives
    To make the code portable, you can use macros and conditional compile directives as shown in Listing 3 and Listing 4.

Listing 3. Use directives to neutralize endianness effect

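#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <endian.h>   /* glibc: BYTE_ORDER, BIG_ENDIAN, and LITTLE_ENDIAN */

#ifndef BYTE_ORDER    /* strict ISO mode hides the unprefixed names */
#define BIG_ENDIAN    __BIG_ENDIAN
#define LITTLE_ENDIAN __LITTLE_ENDIAN
#define BYTE_ORDER    __BYTE_ORDER
#endif

union {
  unsigned int val;
  unsigned char c[sizeof(int)];
} u;

int main(void) {
  u.val = 0x89ABCDEF;
  /* #if needs a constant expression; a run-time call such as htonl() cannot be tested here */
  #if (BYTE_ORDER == BIG_ENDIAN)
  printf("%X.%X.%X.%X\n", u.c[0], u.c[1], u.c[2], u.c[3]);
  #else /*! BYTE_ORDER == BIG_ENDIAN*/
  printf("%X.%X.%X.%X\n", u.c[3], u.c[2], u.c[1], u.c[0]);
  #endif /*BYTE_ORDER == BIG_ENDIAN*/
  exit(0);
}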

Listing 4. Use macros to swap bytes (useful for determining endianness at run time)

//  Useful Endian Neutral Macros

#include <endian.h>        

#if __BYTE_ORDER == __BIG_ENDIAN
// No translation needed for big endian system
#define sw2Bytes(val) (val)
#define sw4Bytes(val) (val)
#define sw8Bytes(val) (val)
#else
//  Little Endian:  Translate
// Swap 2 byte, 16 bit values:

#define sw2Bytes(val) \
 ( (((val) >> 8) & 0x00FF) | (((val) << 8) & 0xFF00) )

// Swap 4 byte, 32 bit values:

#define sw4Bytes(val) \
 ( (((val) >> 24) & 0x000000FF) | (((val) >>  8) & 0x0000FF00) | \
   (((val) <<  8) & 0x00FF0000) | (((val) << 24) & 0xFF000000) )

// Swap 8 byte, 64 bit values:

#define sw8Bytes(val) \
 ( (((val) >> 56) & 0x00000000000000FF) | (((val) >> 40) & 0x000000000000FF00) | \
   (((val) >> 24) & 0x0000000000FF0000) | (((val) >>  8) & 0x00000000FF000000) | \
   (((val) <<  8) & 0x000000FF00000000) | (((val) << 24) & 0x0000FF0000000000) | \
   (((val) << 40) & 0x00FF000000000000) | (((val) << 56) & 0xFF00000000000000) )
#endif
int main(void) {
  unsigned int a = 0x11121314;
  unsigned int b;
  b = sw4Bytes(a);      // b is 0x11121314 on BE and 0x14131211 on LE
  (void)b;              // silence the unused-variable warning
  return 0;
}

  • Use a compile-time option
    Another way to implement this is to define the value of BYTE_ORDER on the compiler command line, -DBYTE_ORDER=BIG_ENDIAN. This removes the need to edit every file in a device driver or application when compiling on a new platform with a different byte order. Instead, you may have to edit only the makefiles used to build the driver or application.

Byte order: Linux specific

The Linux kernel provides a set of system macros that perform 16-, 32-, and 64-bit swaps between little-endian and big-endian byte order. These macros are handy both for porting from Linux on x86 to Linux on Power and for making your code endian neutral without scattering #ifdef __LITTLE_ENDIAN conditional directives throughout your code.

For example:

cpu_to_le16(u16); // converts a 2-byte value from CPU endianness to little-endian

le16_to_cpu(u16); // converts a 2-byte value from little-endian to CPU endianness

These two macros convert a value from whatever endianness the CPU uses to an unsigned, little-endian, 16-bit quantity, or the other way around. For instance, on a big-endian Linux on Power system, le16_to_cpu(u16) converts from little-endian to big-endian and cpu_to_le16(u16) converts from big-endian to little-endian. If no conversion is needed on the system where the code runs, the macros simply return the original value.

These macros, along with many others, are located in /usr/include/linux/byteorder/big_endian.h and /usr/include/linux/byteorder/little_endian.h.

Take a look at these headers to find the macro you need. You will be able to deduce their usage by following the pattern in the names.

You can include the correct header file by using the predefined GNU C preprocessor macros as follows:

#include <endian.h>

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
   #include <linux/byteorder/big_endian.h>
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   #include <linux/byteorder/little_endian.h>
#else
   //  …user defined endian header
#endif
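
For user-space code, glibc's <endian.h> offers a similar family of conversions (htole16(), le16toh(), htobe32(), and so on). The sketch below assumes glibc and uses two hypothetical helper names, to_le16_and_back and first_byte_of_le16, to show that the stored form is little-endian regardless of the host.

```c
#define _DEFAULT_SOURCE   /* expose htole16() and le16toh() under strict -std= modes */
#include <endian.h>
#include <stdint.h>

/* Convert a 16-bit host value to little-endian storage order and back.
   The round trip returns the original value on any host. */
static uint16_t to_le16_and_back(uint16_t host_value) {
  uint16_t stored = htole16(host_value);  /* host order -> little endian */
  return le16toh(stored);                 /* little endian -> host order */
}

/* Return the byte at the lowest address of the little-endian form.
   This is always the least significant byte of the original value. */
static unsigned char first_byte_of_le16(uint16_t host_value) {
  uint16_t stored = htole16(host_value);
  return *(const unsigned char *)&stored;
}
```

On a little-endian host these conversions do no work, and on a big-endian host they swap the two bytes, which matches the behavior described for the kernel macros.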

Other endianness considerations

When porting from x86 to Power, high-level applications that do not depend on outside data files and that follow strict ANSI C programming standards will probably migrate to Linux on Power without a single endianness issue. However, endianness is an important concept to understand, and the ability to recognize these vulnerabilities in your code can make migrating to Linux on Power easier.

In addition, in the next section we highlight additional code analysis tools available within the IBM Linux SDK that can help in identifying endianness issues.
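
For applications that do read or write external binary data, one endian-neutral approach is to assemble multi-byte values from individual bytes with shifts instead of casting a buffer pointer to an integer pointer. The following is a minimal sketch for a 32-bit big-endian field. The function names read_be32 and write_be32 are hypothetical.

```c
#include <stdint.h>

/* Decode a 32-bit big-endian field from a byte buffer.
   The result is the same on big-endian and little-endian hosts
   because only arithmetic on byte values is used. */
static uint32_t read_be32(const unsigned char *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Encode a 32-bit value as four big-endian bytes. */
static void write_be32(unsigned char *p, uint32_t v) {
  p[0] = (unsigned char)(v >> 24);
  p[1] = (unsigned char)(v >> 16);
  p[2] = (unsigned char)(v >> 8);
  p[3] = (unsigned char)v;
}
```

Because the byte positions are spelled out explicitly, the same source handles a big-endian file format on both x86 and Power without conditional compilation.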

Choose which migration tools to use

Similar to mechanics, porting engineers must have an assortment of tools in their "technical" toolbox, and knowing which one to use minimizes the difficulty of porting. The IBM Linux on Power SDK and IBM Rational Developer for Power Systems are all-in-one integrated development environments (IDEs) allowing easy porting and development on Power servers and are excellent technical tools for developers porting to Linux on Power.

IBM SDK for PowerLinux with Migration Advisor

The IBM Software Development Kit for PowerLinux (SDK) is a free, Eclipse-based IDE. The SDK integrates C/C++ source development with the Advance Toolchain, post-link optimization tools (such as FDPR), and classic Linux performance analysis tools, including OProfile, Perf, and Valgrind. In addition, IBM SDK for PowerLinux includes a tool called the Migration Advisor. One of its features is a code checker and code fix capability: the tool analyzes your code for Linux/x86 vulnerabilities and either offers to fix them in the code quickly or suggests how to fix them manually. For instance, the cast below is a typical potential endianness problem when porting from x86 Linux to Linux on Power. The "Cast with endianness issue" checker flags such vulnerabilities.

#include <stdio.h>

void foo() {
    short int val = 0xFF00;
    char *char_pointer = (char *) &val;
    //This prints 0 in x86 (or x86_64) machines and ff in ppc (or ppc64) machines.
    printf("%x\n", *char_pointer);
}

Migration Advisor uses the Eclipse CDT Code Analyzer (Codan) to locate potential migration problems in C/C++ source code, such as code blocks that might produce different results when running on x86 or POWER servers. To look for problems, Codan analyzes the abstract syntax tree (AST) of the C/C++ source code to find excerpts that might not be compatible with the Power Architecture. When Migration Advisor finds a potential problem, it saves the location of the problem and adds a warning at that specific position in the source code. Currently, the Migration Advisor is compatible with C/C++ code only.

Here's a screen capture of using the Migration Advisor within the IBM SDK for PowerLinux.

Figure 2. Running the Migration Advisor 'Union with Endianness Issues' checker

Migration Advisor: Quick fix feature

The Migration Advisor also provides quick fixes for some migration issues, replacing architecture-dependent code blocks with instructions compatible with the POWER processor. There are two ways to trigger the quick fixes: right-click the warning in the source code and select the quick fix option; or open Migration Advisor Eclipse View, right-click a specific problem and then select the quick fix option.

The quick fixes provided by Migration Advisor apply to:

  • Usage of x86-specific compiler built-ins
  • Usage of inline assembly
  • Performance degradation

You can select the checkers that would be activated or deactivated. Migration Advisor will recheck the code and update the results each time you modify the source, giving you the opportunity to identify and solve problems quickly before going through a complete rebuild of your project. Figure 3 depicts the Migration Advisor window, with all checkers activated.

Figure 3. Activating all checkers

After activating the Migration Advisor checkers, right-click the project and click Run Migration Advisor. The Migration Advisor then analyzes the code using the enabled checkers and shows the result in the Eclipse view identifying the potential problems found (see Figure 4).

Figure 4. Migration Advisor view

Figure 5 shows an example of a real migration issue caused by a built-in that is not supported on the Power platform. To fix the problem, double-click it in the Migration Advisor Eclipse view. The source code view opens with the exact location of the problem highlighted. If you move the mouse over the highlighted code and press Ctrl+1, another pop-up window opens, as shown in Figure 5.

Figure 5. Using Migration Advisor quick fix

Select the first option, Builtin quick fix. The resulting code, after applying the quick fix, is shown in Figure 6.

Figure 6. Code after applying Migration Advisor quick fix

Other useful Migration Advisor checks are:

  • x86 specific assembly checker
  • Struct with bitfields checker
  • Long double usage checker
  • Performance degradation checker
  • Usage of x86-specific compiler builtin

For a complete list of handy checks, refer to the IBM Knowledge Center.
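
When a checker flags x86-specific assembly or built-ins, one common way to isolate such code by hand is to guard it with the architecture macros that GCC predefines. The sketch below only selects a label per architecture; the function name target_arch is hypothetical, and a real port would place the architecture-specific implementation in each branch.

```c
/* Return a label for the architecture this file was compiled for.
   The macros tested here are predefined by GCC for each target. */
static const char *target_arch(void) {
#if defined(__powerpc64__)
  return "ppc64";
#elif defined(__powerpc__)
  return "ppc";
#elif defined(__x86_64__)
  return "x86_64";
#elif defined(__i386__)
  return "x86";
#else
  return "other";   /* portable fallback path */
#endif
}
```

Keeping a portable fallback in the final branch means the code still builds on a platform that none of the guards anticipate.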

Linux Cross-platform Tool (LCT)
The SDK for PowerLinux also provides another handy tool that runs on any existing Intel x86 system. It is a command-line tool that identifies potential portability issues with an application. That way, you will be able to quickly get an idea of potential problems with source before porting to Linux on Power. After you install the IBM SDK for PowerLinux, you can find documentation about LCT in the README section for the tool.

IBM Rational Developer for Power Systems
IBM Rational Developer for Power Systems also provides a rich desktop environment for integrated development, porting, and optimization across multiple platforms (IBM i, AIX, or Linux on Power). Its features include various porting tools such as the Code Analysis tool, the Performance Advisor tool, and the Migration Assistant, which, similar to the IBM SDK for PowerLinux, can detect 32- to 64-bit migration issues, endianness issues, and other operating system issues that could impede portability.

JVM
The Java virtual machine (JVM) operates in big-endian mode on all platforms and thus is often immune from processor architecture effects.


Data types and alignments

An important but subtle application binary interface (ABI) difference concerns whether char is signed or unsigned.

Under x86/x86_64, plain char defaults to signed char, while on Power Systems it defaults to unsigned char. On Power Systems, you can override that default with the GCC option -fsigned-char. The equivalent IBM XL option is -qchars=signed.
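
The effect of this default is easiest to see with byte values of 0x80 and above. The following sketch uses hypothetical helper names and contrasts a conversion through plain char, whose result depends on the platform default, with a conversion through unsigned char, which does not.

```c
#include <limits.h>

/* Nonzero when plain char is signed for this compiler and target. */
static int plain_char_is_signed(void) {
  return CHAR_MIN < 0;
}

/* Pitfall: the result depends on the platform default.
   With GCC, 0xFF becomes -1 where char is signed and 255 where it is unsigned. */
static int byte_as_plain_char(unsigned char raw) {
  char c = (char)raw;
  return (int)c;
}

/* Portable: go through unsigned char whenever the byte value matters. */
static int byte_as_unsigned(unsigned char raw) {
  return (int)raw;
}
```

Code that compares a plain char against negative values, or uses it as an array index, is a typical place where this difference shows up after a port.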

Both GCC and XL C/C++ compilers on the Linux operating system offer two different programming models: ILP32 and LP64. ILP32, which stands for Integer Long Pointer 32, is the native 32-bit programming environment on Linux. The ILP32 data model provides a 32-bit address space, with a theoretical memory limit of 4 GB. LP64, which stands for Long Pointer 64, is the 64-bit programming environment on Linux.

Table 1 shows the width in bits of base data types in ILP32 and LP64 models on POWER and x86.

Table 1. Base data types of ILP32 and LP64 on POWER and x86 (in bits)

                      POWER               x86
Base data type    ILP32     LP64      ILP32     LP64
char              8         8         8         8
short             16        16        16        16
int               32        32        32        32
float             32        32        32        32
long              32        64        32        64
long long         64        64        64        64
double            64        64        64        64
long double       64/128*   64/128*   96        128
pointer           32        64        32        64

Note: char is signed by default on x86 and unsigned by default on POWER.

*The default size for long double in Linux on Power is now 128 bits. It can be reduced to 64 bits if you use the compiler option -qnoldbl128 with the XL C/C++ compiler.

All definitions for numeric values can be found in /usr/include/limits.h on both Power and x86 platforms.
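
Rather than relying on a table, a program can query these widths directly with sizeof and CHAR_BIT. The sketch below classifies the data model of the current build. The function names data_model and long_bits are hypothetical.

```c
#include <limits.h>

/* Classify the programming model of the current build from type sizes. */
static const char *data_model(void) {
  if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
    return "ILP32";
  if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
    return "LP64";
  return "other";
}

/* Width of long in bits: 32 under ILP32, 64 under LP64. */
static int long_bits(void) {
  return (int)(sizeof(long) * CHAR_BIT);
}
```

Building the same file in 32-bit and 64-bit mode on each platform is a quick way to confirm the values in Table 1 for your compiler.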

Many legacy Linux x86 applications run in 32 bits. With the latest x86 architectures, which support and encourage 64-bit applications, more x86 applications are being updated or written natively in 64 bit. In the exercise of porting x86 applications to Power Systems, you can target your Linux on Power environment to match your source environment. In other words, we recommend completing your initial port first, then later consider moving to a 64-bit programming model. If you want to port a 32-bit x86 application to the 64-bit Power Systems programming model, treat the migration as two steps:

  1. Port to the Linux on Power 32-bit environment (including testing and verifying it).
  2. Then migrate to a 64-bit environment.

Engineers should consider porting an application to 64 bit if the application can:

  • Benefit from more than 4 GB of virtual address space
  • Benefit from more physical memory (greater than 4 GB), and if its users are likely to deploy it on a system with more than 4 GB of physical memory
  • Benefit from 64-bit size long integers
  • Benefit from full 64-bit registers to do efficient 64-bit arithmetic
  • Use files larger than 2 GB

Some examples of applications that can benefit from being migrated to 64 bit include:

  • Database applications, especially those that perform data mining
  • Web caches and web search engines
  • Components of computer-aided design (CAD)/computer-aided engineering (CAE) simulation and modeling tools
  • Scientific and technical computing applications, such as computational fluid dynamics, genetic simulation, and so on

An application can remain 32-bit and still run on the 64-bit Linux on Power kernel without requiring any code changes. IBM Power processor-based servers support both 32-bit and 64-bit applications running simultaneously on the 64-bit architecture.

When porting applications between different platforms (from x86 to Power) or programming models (from ILP32 to LP64), you need to take into account the differences between data width and alignment settings available in the different environments to avoid possible performance degradation and data corruption.
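
A frequent source of data corruption when moving from ILP32 to LP64 is code that stores a pointer in an int, which works while both are 32 bits wide and truncates the pointer once it grows to 64 bits. As a minimal sketch, the uintptr_t type from <stdint.h> is wide enough in both models. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Unsafe pattern often found in 32-bit code (shown only as a comment):
     unsigned int addr = (unsigned int)ptr;   // truncates under LP64
*/

/* Safe in both ILP32 and LP64: uintptr_t can hold any object pointer. */
static uintptr_t pointer_to_integer(const void *ptr) {
  return (uintptr_t)ptr;
}

/* Convert the integer form back to the original pointer. */
static const void *integer_to_pointer(uintptr_t value) {
  return (const void *)value;
}
```

The same reasoning applies to long: code that assumes long and int have the same width holds under ILP32 but not under LP64.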

When porting from x86 ILP32 to POWER ILP32 or from x86 LP64 to POWER LP64, notice in Table 1 that the width of every basic data type remains the same except long double, which changes from 96 bits to 64/128 bits for ILP32 and from 128 bits to 64/128 bits for LP64. This means that you should examine the portions of the code that use the long double data type. If you plan to use the XL C/C++ compiler, use the -qlongdouble compilation flag to get maximum compatibility on the long double data type.

Data alignment

When porting applications between platforms, or between 32-bit and 64-bit models you will need to consider the differences between alignment settings available in different environments to avoid possible performance degradation and data corruption. The best practice is to enforce natural alignment of the data items. Natural alignment means storing data items at an address that is a multiple of their sizes (for instance, 8-byte items go in an address multiple of eight). For the XL C/C++ compiler, each data type within aggregates (C/C++ structures/unions and C++ classes) will be aligned along byte boundaries according to either the linuxppc or bit-packed rules, where linuxppc is the default and is the natural alignment. linuxppc is also compatible with the default GCC alignment rules. Table 2 shows alignment values on POWER and x86 along with their data type widths in bytes.

Table 2. Alignment values on POWER and x86 (in bytes)

                          POWER                          x86
                  ILP32          LP64           ILP32          LP64
Data type     Width  Align   Width  Align   Width  Align   Width  Align
char          1      1       1      1       1      1       1      1
short         2      2       2      2       2      2       2      2
int           4      4       4      4       4      4       4      4
float         4      4       4      4       4      4       4      4
long          4      4       8      8       4      4       8      8
long long     8      8       8      8       8      8       8      8
double        8      8       8      8       8      8       8      8
long double   8/16   8       8/16   8       12     4       16     16
pointer       4      4       8      8       4      4       8      8

The keyword __alignof__ in both GCC and XL C/C++ allows you to inquire about how an object is aligned. Its syntax is similar to that of sizeof. For example, if the target system requires a double value to be aligned on an 8-byte boundary, then __alignof__ (double) is eight.

As shown in Table 2, a long double variable is aligned to four bytes on x86, but aligned to eight bytes on Power Systems. Structures will, therefore, have a different layout on different platforms. It's important not to hard code any sizes and offsets. Instead, use the C operator sizeof to inquire about the sizes of both fundamental and complex types. The macro offsetof is available to get the offsets of structure members from the beginning of the structure.
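
As a minimal sketch of this advice, the helpers below query the layout of a structure instead of hard coding it. The structure record and the function names are hypothetical; the point is that the values come from sizeof, offsetof, and __alignof__ and therefore follow the platform.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical structure whose layout differs between platforms
   because of the long double member. */
struct record {
  char tag;
  long double value;
  int count;
};

/* Offset of the long double member from the start of the structure. */
static size_t value_offset(void) {
  return offsetof(struct record, value);
}

/* Alignment the compiler reports for long double on this target. */
static size_t value_alignment(void) {
  return __alignof__(long double);
}

/* Total size of the structure, including any padding. */
static size_t record_size(void) {
  return sizeof(struct record);
}
```

Code that writes such a structure to a file or sends it over a network should serialize the members individually rather than copy the raw structure, because the padding differs between x86 and Power.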

Determine which compiler to use: GCC or IBM XL C/C++

There are two C/C++ compilers available for Linux on Power: GCC and IBM XL C/C++ compiler. GCC offers robust portability of code intended for compilation in Linux, while the IBM XL compilers offer a substantial performance increase over GCC when higher levels of optimization are used. Both compilers offer 32- and 64-bit compilation modes, and the Linux on Power environment allows both 32- and 64-bit code to be run simultaneously without performance loss.

Porting with the GCC compiler set

For projects developed across multiple platforms where a GCC compiler is the original compiler, the GCC compiler is often used to deploy applications for Linux on Power. This is especially true for applications where performance is not critical, such as small utilities. GCC also accepts some constructs that only GCC understands, such as GCC-specific macros. Note, however, that many of these GCC-specific features are also supported by the XL C/C++ compilers.

In general, porting code with GCC compilers should be straightforward. In most cases, it's a simple recompile and is as easy as typing the make command. Architectures may vary, and occasionally library version discrepancies may arise, but for the most part it doesn't matter which architecture the code runs on. Architecture-specific flags, such as -m486 and -mpowerpc64, are discouraged for compilation on PowerLinux because GCC does not have extensive processor maps for optimization routines on these architectures. Also, without architecture-specific flags, binary compatibility will be greater across different models of POWER hardware.

The GCC compilers that are included with SLES 11 and RHEL 6 support both 32-bit and 64-bit applications. With the SLES 11 and RHEL 6 versions, the default compilation mode is 64-bit.

On all architectures, code that goes into shared libraries must be compiled with -fPIC; x86 has this flag applied by default with GCC. On POWER, this flag specifies that the generated code will be used in a shared object. Review the GCC compiler manuals for more information (see Resources).

Porting with the IBM XL C/C++ compiler

XL C/C++ uses the GNU C and C++ headers, and the resulting application is linked with the C and C++ runtime libraries provided with GCC. This means that the XL C/C++ compilers produce GNU Executable and Linking Format (ELF) objects, which are fully compatible with the objects GCC compilers produce. XL C/C++ includes the symmetrical multiprocessing (SMP) runtime library to support the automatic parallelization and OpenMP features of the XL C/C++ compilers.

Moving from GCC to XL C/C++ for Linux on Power is straightforward. XL C/C++ assists with the task by providing an option, -qinfo=por, to help you filter the emitted diagnostic messages to show only those that pertain to portability issues. In addition, a subset of the GNU extensions to gcc and gcc-c++ is supported by XL C/C++.

See "XL C/C++ for Linux on pSeries Compiler Reference" for a complete list of features that are supported, as well as those that are accepted but have semantics that are ignored.

To use supported features with your C code, specify either -qlanglvl=extended or -qlanglvl=extc89. In C++, all supported GNU gcc/gcc-c++ features are accepted by default. Furthermore, the gxlc and gxlc++ utilities help to minimize the changes to makefiles for existing applications built with the GNU compilers.

Optimization options in XL C/C++

XL C/C++ provides a portfolio of optimization options tailored to IBM hardware. For Linux on Power, many applications compiled with XL C/C++ utilizing the correct combination of optimization flags have shown significant performance improvements over those compiled with GCC. Note that not all optimizations are beneficial for all applications. There's usually a trade-off between the degree of optimization done by the compiler and an increase in compile time accompanied by reduced debugging capability.

Optimization levels
Optimization levels are specified by compiler options. The following table summarizes the compiler behavior at each optimization level.

Table 3. Compiler behavior at each optimization level

Option Behavior
-qnoopt Provides fast compilation and full debugging support.
-O2 (same as -O) Performs optimizations that the compiler developers considered the best combination for compilation speed and runtime performance. This setting implies -qstrict and -qstrict_induction, unless explicitly negated by -qnostrict_induction or -qnostrict.
-O3 Performs additional optimizations that are memory intensive, compile-time intensive, or both. They are recommended when the runtime improvement outweighs the concern for minimizing compilation resources.
-O4 and -O5 Perform interprocedural optimization, loop optimization, and automatic machine tuning.

Older Power options that should be avoided

Some options carried over from builds for older Power Architecture systems should be avoided because they can be harmful when porting from x86 to Linux on Power.

  • In PPC64 builds, replace -mminimal-toc or -mfull-toc with -mcmodel=medium (which is the default).
  • Do not use -ffloat-store on Power.

Target machine options are options that instruct the compiler to generate code for optimal execution on a given microprocessor or architecture family. By selecting appropriate target machine options, you can optimize to suit the broadest possible selection of target processors, a range of processors within a given family of processor architectures, or a specific processor. The following options control optimizations affecting individual aspects of the target machine:

Table 4. Optimization options affecting individual aspects of a target machine

Option Behavior
-qarch Selects a family of processor architectures for which instruction code should be generated. The default is -qarch=auto. The following suboptions are also available: ppc64grsq, pwr3, pwr4, pwr5, ppc970, ppc64, ppcgr, rs64b, and rs64c.
-qtune Biases optimization toward execution on a given microprocessor without implying anything about the instruction set architecture to use as a target. The default on Linux is -qtune=auto. Available suboptions include pwr3, pwr4, pwr5, pwr6, pwr7, pwr8, ppc970, rs64b, and rs64c. However, this setting is often set automatically based on the setting of -qarch.
-qcache Defines a specific cache or memory geometry. If -qcache is used, use -qhot or -qsmp along with it.
-qhot High-order transformations are optimizations that specifically improve the performance of loops through techniques such as interchange, fusion, and unrolling. The option -qhot=vector is the default when -qhot is specified. Try using -qhot along with -O2 and -O3. It is designed to have a neutral effect when no opportunities for transformation exist.
-qsmp Generates threaded code needed for shared-memory parallel processing. The option -qsmp=auto is the default when -qsmp is specified. Use -qsmp=omp:noauto if you are compiling an OpenMP program and do not want automatic parallelization. Always use the _r compiler invocations when using -qsmp.

To get the most out of target machine options, you should:

  • With -qarch, specify the smallest family of machines possible on which you expect your code to run well.
  • With -qtune, specify the machine on which the performance should be best. For example, if your application will be targeted at Power servers starting with POWER8 processor-based systems, use -O3 -qarch=pwr8 -qtune=pwr8.

IBM Power platforms support machine instructions that are not available on other platforms. XL C/C++ provides a set of built-in functions that directly map to certain POWER processor instructions. Using these functions eliminates function call-return costs, parameter passing, stack adjustment, and all additional costs related to function invocations. For the complete list of the supported built-in functions, refer to the XL C/C++ for Linux on pSeries Compiler Reference.
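The idea behind built-in functions can be sketched in C. This example does not use an XL C/C++ built-in; it uses GCC's __builtin_popcountll as a stand-in, on the assumption that a GCC-compatible compiler is available, and falls back to a portable loop otherwise. The wrapper name count_set_bits is hypothetical. For the XL-specific built-in names, consult the compiler reference rather than this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the set bits in a 64-bit word.
 * Assumption: GCC-compatible compilers provide __builtin_popcountll,
 * which the compiler may lower to a single population-count
 * instruction when the target processor has one. Other compilers
 * take the portable loop below. */
unsigned count_set_bits(uint64_t word)
{
#if defined(__GNUC__)
    return (unsigned)__builtin_popcountll(word);
#else
    unsigned count = 0;
    while (word != 0) {
        word &= word - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
#endif
}
```

Keeping the built-in behind a small wrapper like this means that the rest of the code base does not change when the same source is built with a different compiler.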

However, software originally intended to be compiled with GCC compilers might need more attention when recompiled with IBM XL C/C++ compilers. You need to edit makefiles to reflect the correct path to the XL C/C++ compilers, which is by default /opt/ibmcmp/. You also need to set the correct optimization flags for the specific architecture (such as -O3 -qarch=pwr8 -qtune=pwr8 for IBM POWER8 processors).

In addition to the optimization modes for various Power Architecture derivatives, the XL C/C++ compilers can be instructed to compile software in common mode. This guarantees compatibility across all Power Architecture derivatives at the expense of the performance otherwise gained through architecture-specific optimization. When compiling 64-bit code, the compiler must be given the flag for a 64-bit build (that is, -q64), because it defaults to the 32-bit mode.
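Because the default mode differs between compilers, it can be useful to confirm which data model a build actually used. The following C sketch is one way to do that without relying on compiler-specific macros; the function names are hypothetical and the checks use only sizeof and CHAR_BIT.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */

/* Returns the width of a pointer in bits: 32 for an ILP32 build,
 * 64 for an LP64 build (for example, one made with -m64 or -q64). */
int pointer_width_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Returns nonzero when long is wider than int, as in the LP64 model
 * shown in Table 2, and zero when both have the same width (ILP32). */
int long_is_wider_than_int(void)
{
    return sizeof(long) > sizeof(int);
}
```

Logging these two values at program startup, or checking them in a unit test, catches the case where a makefile silently built 32-bit objects when 64-bit objects were intended.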

A few tips for compiling GCC-oriented code with the XL C/C++ compiler set are listed below:

  • GCC allows C++ style commenting to be used in C files by default, but this is not the case with XLC, the XL C compiler. Because it's not economical to change all of the comments in a group of source files to comply with C style commenting, XLC provides an option to allow these comments: -qcpluscmt. When C code is compiled with this flag, both C and C++ style comments are accepted.
  • Environment variables are often the easiest way to configure build scripts. As opposed to hand editing configure scripts and makefiles, you should set relevant environment variables, such as $CC and $CFLAGS, so that the configure script can generate the makefile without the need for hand editing.
  • Configure scripts also need to be aware of the proper platform type. This will be either powerpc-unknown-linux-gnu or powerpc64-unknown-linux-gnu. You should set this for compilation with either GCC or XL C/C++ by appending the -target= flag to the configure script.
  • Although extensive documentation for the IBM XL C/C++ compilers exists, an argument list is provided at the console by running the compiler with no arguments, for example, $COMPILER_PATH/bin/cc.

Comparison of compiler options

The following table compares the commonly used compiler options from GCC and XL C/C++.

Table 5. Commonly used compiler options from GCC and XL C/C++

GCC                              XL C/C++                                             Description
-v                               -v, -V, -#                                           Turn on verbose mode.
-p/-profile                      -p                                                   Set up the object files produced by the compiler for profiling.
-m32, -m64                       -q32, -q64, or set OBJECT_MODE environment variable  Create 32- or 64-bit objects.
-fsyntax-only                    -qsyntaxonly                                         Perform syntax checking without generating object files.
-fpic                            -qpic=small                                          Generate position-independent code for use in shared libraries. In XL C/C++, the size of the global offset table is no larger than 64 KB. If -qpic is specified without any suboptions, -qpic=small is assumed. The -qpic option is enabled if the -qmkshrobj compiler option is specified.
-fPIC                            -qpic=large                                          Allow the global offset table to be larger than 64 KB.
-pthread                         -qthreaded or _r invocation mode                     Create programs running in a multithreaded environment.
-fno-rtti                        -qnortti                                             Disable generation of runtime type identification (RTTI) information for exception handling and for use by the typeid and dynamic_cast operators. On XL C/C++, the default is -qnortti.
-static                          -qstaticlink                                         Prevent an object generated with this option from linking with shared libraries.
-static-libgcc                   -qstaticlink=libgcc                                  Instruct a compiler to link with the static version of libgcc.
-shared                          -qmkshrobj                                           Instruct a compiler to produce a shared object.
-shared-libgcc                   -qnostaticlink=libgcc                                Instruct a compiler to link with the shared version of libgcc.
-Wl,-rpath                       -Wl,-rpath or -R                                     Pass a colon-separated list of directories for the runtime linker to search.
-fno-implicit-templates, -frepo  -qtempinc, -qtemplateregistry, -qtemplaterecompile   Instantiate templates.
-w                               -w                                                   Suppress warning messages.
(none)                           -qwarn64                                             Enable checking for long-to-integer truncation.
(none)                           -qinfo=< >                                           Produce informational messages.
-fpack-struct                    -qalign=bit_packed                                   Use bit_packed alignment rules.
(none)                           -qalign=linuxppc                                     Use default GCC alignment rules to maintain compatibility with GCC objects. This is the default.
-O, -O2, -O3                     -O, -O2, -O3, -O4, -O5                               Use to set optimization levels.
-Ofast, -Og                      (none)                                               -Ofast optimizes for speed without complete compliance to standards. -Og optimizes while still allowing effective debugging if needed.
-mcpu, -mtune                    -qarch, -qtune, -qcache                              Use to set optimization options for a particular processor.


Building large programs with GCC

Customers occasionally need to build a very large executable file, for example one that includes code generated at build time. A common error message in this situation is shown in Listing 5.

Listing 5. Error message while building huge executables

modelfile.cxx.o:(.text+0x212012):
       relocation truncated to fit: R_PPC64_TOC16_DS against `.toc'+10000

This is a common problem with large programs and older GCC compilers. The older GCC compilers optimistically assumed that the table of contents (TOC) would not overflow the 64 KB reach of a single load instruction. In this case, the program is large enough that the total size of the required TOC entries exceeds what a 16-bit offset (TOC16_DS) can address.

There are two solutions.

  1. With the Advance Toolchain 3.0 and older compilers, you can recompile with -mminimal-toc.
  2. With Advance Toolchain 4.0 and later compilers, you can recompile with -mcmodel=medium. The -mcmodel=medium option should give better performance but requires the upgrade.

For more details, see GCC PowerPC options.


Porting steps

After you complete each of the planning steps, you should be ready to perform the port. This section outlines the recommended steps to successfully port your application to Linux on Power.

  1. Migrate your build system to GNU Make (if necessary)
    This step consists of creating one or more makefiles. You can also take advantage of GNU's program build automation utilities, such as Autoconf, Automake, and Buildtool to maximize your program's portability across different UNIX® platforms. GNU's program build automation utilities are at directory.fsf.org/devel/build/.
  2. Modify architecture-dependent code (if necessary)
    Consider byte ordering (endianness), data length in 32-bit and 64-bit models, and data alignment on the different platforms, which are mentioned in the Understand the differences between the x86 and Power Architecture derivatives section.
  3. Build
    After the makefiles are built and all the programs are modified, the build process is as simple as issuing a command, such as make. If you encounter errors during the build process, they will usually be compiler errors, linker errors, or syntax errors in the program. Modifying the compiler options through the makefiles or correcting the offending syntax in the code usually fixes the errors. The compiler reference manual and programming guide are your best references during this phase. For IBM XL C/C++, refer to XL C/C++ for Linux Compiler Reference (compiler.pdf) and XL C/C++ for Linux Programming Guide (proguide.pdf). For GCC, both the compiler reference and programming guide are at gcc.gnu.org/onlinedocs.
  4. Test and troubleshoot
    After the program is successfully built, test it for runtime errors. Runtime errors are usually related to your program logic in this phase. It's always a good idea to write several test programs to verify that the output of your application is the one you expected.
  5. Tune performance
    Now that the ported code is running on the Power platform, monitor it to ensure that it performs as expected. If it doesn't, you'll need to complete performance tuning. You can use the following suite of tools to identify performance problems in your application and show how your application interacts with the Linux kernel.
    • OProfile
      The OProfile tool profiles code that is based on hardware-related events, such as cache misses or processor cycles. For example, OProfile can help you determine the source routines that cause the most cache misses. OProfile uses hardware performance counters provided in many processors, including IBM POWER6 and POWER7. For more information about OProfile for Power Linux, visit the OProfile website (see Resources).
    • Post-link optimization - also known as FDPRpro
      The post-link optimization tool optimizes the executable image of a program by collecting information on the behavior of the program while the program is used for some typical workload. It then re-analyzes the program (together with the collected profile), applies global optimizations (including program restructuring), and creates a new version of the program optimized for that workload. The new program generated by the optimizer typically runs faster and uses less real memory than the original program. For more information, visit the post-link optimization website (see Resources).
  6. Package
    If your ported application will be a commercial product or you want to distribute the application to third parties to install, you need to package your ported application, including libraries, documentation, and sometimes source code. Linux provides several ways to package your application, such as a .tar file, self-installing shell script, and RPM. RPM is one of the most popular packaging tools for Linux.
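Step 2 above mentions byte ordering as one of the architecture-dependent areas to review. x86 is little-endian, while Power processors can also run big-endian, so code that writes multi-byte integers to files or sockets by copying raw memory may produce different bytes on each platform. The following C sketch shows one common way to avoid the dependency: detect the host order only for diagnostics, and read and write values one byte at a time with shifts. The function names are hypothetical, and the choice of big-endian as the external format is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* lowest-addressed byte */
    return first_byte == 1;
}

/* Stores value in big-endian order, regardless of host byte order. */
void store_be32(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)(value);
}

/* Reads a big-endian 32-bit value, regardless of host byte order. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because store_be32 and load_be32 use only shifts and masks, they produce the same external bytes on both platforms, and no conditional compilation on the host byte order is needed.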

Figure 7. Flow of the porting activities described in steps one through six above


Summary

Linux on Power offers an enterprise-class Linux environment, complete with both 32-bit and 64-bit application environments and toolchains. Linux on Power offers twin compiler sets that provide ease of migration with open source code and high-performance exploitation of the award-winning Power Architecture. Porting your Linux on x86 application to Linux on Power allows you to take advantage of the application performance available on the Power Architecture, now augmented by development tools never before offered on the Linux operating system. In summary, Linux on Power is a leading platform for the deployment of high-performance Linux applications.


Acknowledgments

Thanks to Calvin Sze for the original article and thanks to Linda Kinnunen for the original article template and helpful reviews and to the IBM Linux on Power team for their technical assistance and reviews of this article.

Thanks to Otavio Busatto Pontes for the updates in this article on the new IBM SDK for Linux on Power. Check out the Linux on Power Community for additional pointers to the SDK and some blog postings on the SDK from Wainer dos Santos Moschetta.

Thanks to Roberto Guimaraes Dutra de Oliveira and Leonardo Rangel Augusto for the updates on the Migration Advisor in the IBM SDK for PowerLinux.


Resources

For related information

Recommended tools and documentation

  • For more information about the IBM SDK for Linux on Power, which is based on Eclipse technology, see IBM's SDK for Linux on Power.
  • To take advantage of the latest GCC and POWER processor-tuned libraries, refer to IBM SDK for PowerLinux
  • For more information about OProfile for Linux on Power, visit the OProfile website.
  • To learn more about the post-link optimization tool (also known as FDPRpro), visit the post-link optimization community.
  • For more GCC compiler information, review the GCC compiler manuals.
