GNU C/C++ toolchain for Linux on POWER

Learn about the GNU toolchain for Linux™ on POWER™. This paper highlights the general options available for using the GNU compiler, linker, and loader with Linux on POWER and discusses the GNU binutils, focusing on Linux on POWER-specific considerations and the new features provided in SUSE Linux Enterprise Server, Version 9, and Red Hat Enterprise Linux, Version 4.

Matt Davis, Linux Power Technical Consultant, IBM, Software Group

Matt Davis is a Linux on POWER consultant for the IBM eServer Solutions Enablement organization at IBM living in Austin, TX. He has authored more than a dozen research reports and papers on Linux, Linux on POWER, and UNIX competitive analysis since coming to IBM in May 2000.



Gary Hook (ghook@us.ibm.com), Senior Technical Consultant, IBM, Software Group

Gary R. Hook is a senior technical consultant in the Solutions Development Group at IBM, providing application development, porting, and technical assistance to independent software vendors. Mr. Hook's professional experience focuses on Unix-based application development. Upon joining IBM in 1990, he worked with the AIX Technical Support center in Southlake, Texas, providing consulting and technical support services to customers, with an emphasis upon AIX application architecture. Now residing in Austin, Mr. Hook was a member of the AIX Kernel Development team from 1995 through 2000, specializing in the AIX linker, loader, and general application development tools.



04 May 2005

Introduction

One noted advantage of GNU software is its portability, the product of an exacting developer community. The GNU development toolchain refers to the GNU Compiler Collection, the GNU libc, and the standard GNU binutils used to build, test, and analyze software. These tools conform to the PowerOpen ABI and the 64-bit PowerPC® ELF ABI Supplement guidelines to ensure binary compatibility with other conforming tools. In addition, they are the default development toolchain for Linux on POWER.

Though the GNU tools emphasize compatibility, there are inherent differences in the POWER architecture as compared to other common development platforms. The POWER instruction set differs from other architectures, and the GNU toolchain accounts for these differences. Developers should understand the specifics of using the GNU toolchain for Linux on POWER. For example, there are differences between the Linux on POWER ABIs and other common ABIs, and developers must know how these differences affect their code when developing and porting software.

It is worth mentioning that the IBM XL C/C++ and Fortran compilers also use the GNU toolchain to manufacture binaries. Where relevant, the GNU toolchain is discussed with regard to XL C/C++. This document addresses the software development and portability requirements of the GNU toolchain for Linux on POWER by addressing the basic operation, POWER-specific operation, potential pitfalls, and newly developed features of the GNU toolchain, piece by piece. The GCC C/C++ compilers, the GNU linker and assembler, and some other GNU binutils are also explored. While a number of platform agnostic topics are covered, GNU manuals are often cited for complete coverage of these topics. This paper does not discuss the specifics of other GCC compilers, IBM XL C/C++, Fortran, or Java development for Linux on POWER.

We begin with the GCC compiler and compile driver, and then progress to the GNU linker, the GNU assembler, and other binutils. The GNU C Library is generally considered part of the GNU toolchain, but it is not explicitly discussed here, because its POWER-specific variations are subtle and should not concern the user. Where applicable, we identify the subtle differences found with Linux on POWER to assist developers familiar with either Linux running on other platforms or AIX running on the POWER architecture. (For example, the differences between x86 and POWER in the GNU assembler for the former and the differences between the ELF and XCOFF ABIs for the latter.)


The GNU Compiler Collection

Historically, GCC referred to the GNU C Compiler, but now refers to the GNU Compiler Collection. GCC is a collection of integrated compilers for the C, C++, Objective-C, Java™, Fortran, and Ada programming languages. However, this document limits its discussion to the C and C++ compilers. Specifically addressed are GCC's basic operation and options, the operation of GCC features specific to POWER architecture, and the most recent features supported by GCC, as packaged with the two leading distributions of Linux for POWER architecture: Red Hat Enterprise Linux AS and SUSE LINUX Enterprise Server.

Basic Operation of GCC

Basic GCC operation drives preprocessing, compiling, assembly, and linking. Most options passed to GCC are actually directed to one of these other components of the toolchain. Some options, such as platform selection, debugging, and optimization flags provide arguments to both the compiler and other components.

Input options
The compiler must know what type of input is to be processed. A C source file is processed differently than a C++ file, for example. An implicit option fundamental to the compiler is the file extension of the source file. This determines which of the GCC compilers is invoked. For example, file.c invokes the C compiler, while foo.C or foo.cxx invokes the C++ compiler. A full list of accepted source filename extensions is available in the GCC manual.

The following diagram illustrates the steps taken by the GNU Toolchain to produce an executable program or shared object. The GCC driver accepts options to control the toolchain indirectly.

Figure 1. Steps to produce an executable program or shared object

Output options
The compiler must also know what type of output is expected. The GCC driver can produce instructions for the rest of the toolchain to manufacture a complete executable, or it can stop at a number of intermediates. You can control the output of the compiler with either source filename extensions, or with command line options.

Table 1 summarizes output file selection:

Table 1. Output file selection
Flag   Action                            Output format
-c     assemble, but do not link         .o file
-S     compile, but do not assemble      .s file
-E     preprocess, but do not compile    preprocessed .c file

For example, the following command will preprocess and compile the source file hello.c into the file hello.s:

$ gcc -S hello.c

The resulting file, hello.s, is ready to be assembled into object code.
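
As a small illustration of the stages in Table 1, consider the following translation unit. The file name stages.c and the names GREETING and greeting() are hypothetical, chosen only for this sketch. Because -E, -S, and -c all stop before the link step, the file does not need a main() function.

```c
/* stages.c: hypothetical file used to observe each stage of the driver.
 *
 *   gcc -E stages.c   preprocess only; GREETING is expanded in the output
 *   gcc -S stages.c   stop after compilation; writes stages.s
 *   gcc -c stages.c   stop after assembly; writes stages.o
 */
#define GREETING "Hello, POWER"

/* Return the greeting string. The preprocessor replaces GREETING
 * with the string literal before the compiler proper sees this line. */
const char *greeting(void)
{
    return GREETING;
}
```

Comparing the output of -E with the original source is a quick way to see how much of a file is produced by macro expansion and header inclusion.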

Another set of output options affects the compiler's verbosity rather than the format of the output file. The -v option shows verbose details of the compile process, and the -### option shows these details without executing the commands. The latter option is especially valuable for creating build scripts.

Dialect and Standards options
As previously stated, GCC can compile binaries for a number of languages. These languages have specified variations that outline accepted conventions of the language, known as dialects. GCC accepts options to specify the dialect used by the compiler. These include standard options, as well as more specific flags to modify standard restrictions. GCC supports the ANSI C Standard published in 1990, the extensions added in 1995, as well as partial support for the C99 corrected standard.

The -ansi and -std flags can be used to force restrictions on the compiler. The -ansi flag disables use of GNU extensions (see Table 2 below), such as inline and asm. Note that alternate GNU keywords, such as __inline__ and __asm__, can continue to be used, but their use would forgo compliance to language standards. The -ansi option also disables the use of C++ style commenting in C programs.

The -std= options provide the ability to specify a level of the standard or the GNU extension of that standard. These options are summarized in the table below. A full list of the available specifications is available in section 3.4 of the GCC manual. (See Resources.)

Table 2. Standard and standard extension options
-std= flag                                  Standard or standard extension
c89 or iso9899:1990                         ISO C90 (same as -ansi)
iso9899:199409                              ISO C90, 1994 amendment
c99, c9x, iso9899:1999, or iso9899:199x     ISO C99 (not fully supported)
gnu89                                       ISO C90 with GNU extensions (default)
gnu99 or gnu9x                              ISO C99 with GNU extensions
c++98                                       Amended 1998 ISO C++ standard
gnu++98                                     Amended 1998 ISO C++ standard, with GNU extensions (default)

Features of newer standards can still be used if they do not conflict with previous C standards. This is true even if a -std= flag is not supplied. For example, you may use __restrict__ even when -std=c99 is not specified, as it does not explicitly contradict the older standard specified.
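
The following is a minimal sketch of that point. The function name add_arrays() is hypothetical; the only GCC-specific element is the __restrict__ spelling, which should be accepted without any -std= flag.

```c
/* Add n elements of src into dst. The __restrict__ qualifier promises
 * the compiler that dst and src do not overlap. GCC accepts this
 * spelling even in its default gnu89 mode. */
void add_arrays(int *__restrict__ dst, const int *__restrict__ src, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```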

More specific flags, such as -fno-asm, are available to force specific details of compliance. These types of flags provide enforcement of keywords, the universal signedness of data types, and show where data elements are stored in the binary, among other things. See sections 3.4 and 3.5 respectively of the GCC manual for full coverage of C and C++ dialect options (see Resources).

In addition to ANSI standards, GCC supports a number of GNU extensions to the C and C++ languages that are not supported by the standards. The majority of these extensions are not discussed here; however it is highly recommended that developers be aware of them if they intend their code to be portable to and from other GNU platforms. One important GNU extension is covered in Listing 1 below, as it is both commonly used and a good example of an extension that can be made portable.

GNU extensions can be identified by their leading and trailing double underscores. For example, __asm__ is a GNU extension for asm, which allows specification of the operands of the instruction using C expressions.

Consider this example, using GNU extensions from the Linux kernel source:

Listing 1. GNU extensions
static __inline__ void atomic_sub(int a, atomic_t *v)
{
        int t;
        __asm__ __volatile__(
"1:     lwarx   %0,0,%3         # atomic_sub\n\
        subf    %0,%2,%0\n"
        PPC405_ERR77(0,%3)
"       stwcx.  %0,0,%3 \n\
        bne-    1b"
        : "=&r" (t), "=m" (v->counter)
        : "r" (a), "r" (&v->counter), "m" (v->counter)
        : "cc");	
}

Notice the alternative use of the inline, asm, and volatile keywords as offset by double underscores.

The __GNUC__ macro definition is always predefined for GCC, and can be used to check whether your code is compiling with supported GNU extensions. Think of this as a way to ask the compiler if it is GCC or not. It is recommended that GNU extension keywords accommodate other compilers for portability by checking for the presence of GNU extensions. For example:

#ifndef __GNUC__
#define __asm__ asm
#endif

This will check to see if GNU extensions are available, and, if not, define the keywords that will appear in the code later to something that another compiler will recognize.

When the GNU compiler is invoked with pedantic warning messages, all GNU extensions will be reported as warnings. This can be avoided by using the __extension__ keyword before any expression using a GNU extension keyword.
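
A brief sketch of __extension__ in use follows. The names wide_int and square_wide() are hypothetical. The long long type is not part of ISO C90, so a compile with -std=c89 -pedantic would normally be expected to warn about it.

```c
/* __extension__ marks the use of long long as intentional, so a
 * pedantic C90 compile does not report it as a warning. */
__extension__ typedef long long wide_int;

/* Square a value using the wider type. */
wide_int square_wide(wide_int x)
{
    return x * x;
}
```

Confining the extension to a single typedef keeps the rest of the code free of GNU-specific keywords, which simplifies a later port to another compiler.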

Keep in mind that this is an example of one GNU extension and its usage. There are dozens more, some of which are similar to non-standard features of other compilers. A prudent development plan includes understanding extensions in terms of their portability and performance implications. A full description of all available extensions is documented in the GCC manual. (See Resources.)

Warning options
GCC allows much granularity in the reporting of error messages. Warning and error options are listed in detail in section 3.8 of the GCC manual (see Resources). In addition to the ability to turn each type of warning on or off, there are a few meta-options to control classes of warning and error messages. These allow variability in how exacting the compiler is.

The -fsyntax-only option instructs the compiler to only be concerned with actual violations of syntax. This is the most lax state for the compiler, as opposed to the -pedantic option.

The -pedantic option issues all warnings demanded by the selected ISO standard. This flag also disallows extensions, except the __keywords__ and expressions preceded by __extension__ identifiers. The -pedantic-errors option will cause the compiler to halt on each warning as though it were an error.

The -Wall option enables many of the warnings, but not all of them. Which warnings are issued by -Wall is a rule-of-thumb matter. As noted in the GCC manual, this option "enables all the warnings about constructions that some users consider questionable, and that are easy to avoid" (see Resources). The -Wextra option instructs the compiler to report additional warnings that -Wall does not enable. The full list of warnings that are caught by each of these flags is available in section 3.8 of the GCC manual. (See Resources.)

It may be tempting to specify a standard with the -std= option in conjunction with the -pedantic option as a means of conformance testing code. However, this is bad practice. The -pedantic option will indeed warn about each construction for which the standard demands a warning be issued, but this does not address every condition of the standard and is, therefore, not a replacement for real conformance testing.

Debugging options
An interactive program debugger can greatly speed the resolution of a problem with code. The GNU Project Debugger, or gdb, can stop at given points in a program, trace the execution of the program, examine what has happened if the program fails to complete properly, and even change the execution of the program.

In order to use gdb with Linux on POWER, the code must be compiled with debugging enabled. The compiler produces debugging information in the DWARF and DWARF 2 formats when invoked with the -g or -ggdb options. The -g flag produces DWARF information for gdb to use, and, unlike most compilers, GCC allows -g to be combined with -O, the basic optimization flag. This is a distinctive feature, since the organization and execution of the code can change considerably after optimization. Be prepared, however: using -O in conjunction with -g will likely rearrange the code, and this is reflected in what the debugger shows. gdb can use the more expressive DWARF 2 format if GCC is passed the -ggdb option. This enables gdb extensions as well.

Both the -g and -ggdb options accept level arguments (for example, -ggdb1 for level one). Intuitively, the higher the level, one through three, the more debugging information is made available to the debugger. There is a trade-off: with additional debugging information comes a larger and slower executable.

A number of gcc flags are used for profiling. These include options to collect data used by the utilities gprof and gcov, as well as information reporting flags for memory usage, optimization details, and more. Many of these options are more commonly used to debug the compiler, rather than the code being compiled.

A final set of debugging flags is concerned with the build environment. Options like -print-file-name= are used to verify that the build environment provided is indeed the intended one.

For a full list of debugging flags, see section 3.9 of the GCC manual. (See Resources.)

Optimization options
GCC provides a back-end optimizer for C/C++ code and other supported languages. Since Linux on POWER is a high performance platform, it is important for developers to understand the operation of this optimizer. Note that not all optimization routines are documented in detail, only the ones that can be invoked using a command line option. The options described here are not discussed in terms of their performance implications, but in terms of their porting and debugging implications. For a complete list of optimization options, refer to section 3.10 of the GCC manual. (See Resources.)

Grouped optimization flags allow the compiler to select a group of optimization options with one flag. These are used to optimize for either varying levels of performance or for code size. The -O or -O1 flag is a first pass group optimization flag to include optimizations that do not take a great deal of compile time to implement. -O2 implements optimizations expected to help performance without increasing program size, and -O3 selects even more performance optimizations than -O2, regardless of any effect on program size. -Os optimizes for program size, even at the expense of performance. Note that there is no group option for selecting all performance optimizations available to the compiler. It is also worth mentioning that -O is the same as -O1 with GCC. This is in contrast to some other compilers, including the IBM XL C/C++ compiler, which is also available for Linux on POWER.

Be aware that some optimization options are used judiciously by the compiler, as they may impact other aspects of development, such as debugging. One such option is -fomit-frame-pointer, enabled at all -O levels, which removes the frame pointer where it is not needed. This option can improve performance, but specification of the flag will not guarantee that the optimization is implemented. In the case of -fomit-frame-pointer, this is because the stack pointer on POWER acts as a frame pointer, except when memory is dynamically allocated on the stack with a mechanism, such as alloca(). In these cases, a frame pointer must be preserved to provide back traces. This is an example of the optimization heuristic being detrimental, and thus ignored by the compiler.
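
To make the alloca() case concrete, here is a minimal sketch of a function that allocates a run-time-sized buffer on the stack. The function name count_upper() is hypothetical, and the scratch copy exists only to force a dynamic stack allocation. Per the discussion above, this is the kind of function for which the compiler would be expected to keep a frame pointer.

```c
#include <alloca.h>
#include <string.h>

/* Count the uppercase ASCII letters in s, working on a scratch copy
 * allocated on the stack. The size of the copy is known only at run
 * time, so the stack pointer alone cannot describe the frame. */
int count_upper(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = alloca(len);   /* dynamic stack allocation */
    size_t i;
    int n = 0;

    memcpy(copy, s, len);
    for (i = 0; copy[i] != '\0'; i++)
        if (copy[i] >= 'A' && copy[i] <= 'Z')
            n++;
    return n;
}
```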

The POWER architecture features a count register, and GCC has an option to take advantage of it, -fbranch-count-reg. This option uses the architecture's ability to decrement and branch on the count register in a single instruction, instead of decrementing, comparing to zero, and conditionally branching. Those porting from SPARC to Linux on POWER will want to ensure that this flag is enabled, because SPARC does not support this feature. The -fbranch-count-reg flag is enabled by default, if -O2 is enabled. Again, there are contexts for which the compiler does not apply this optimization. In this case, loops without function calls will benefit from the option, but the overhead of copying the count register before a function call outweighs the benefit of using the count register operation.
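
A loop of the following shape is a plausible candidate for this optimization. The function name sum_ints() is hypothetical. Whether the compiler actually uses the count register depends on the GCC version and flags; compiling with -O2 -S and looking for a decrement-and-branch instruction such as bdnz in the assembly output is one way to check.

```c
/* Sum the first n elements of v. The trip count is known on entry and
 * the loop body makes no function calls, so the compiler is free to
 * keep the loop counter in the count register. */
long sum_ints(const int *v, int n)
{
    long total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```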

Preprocessing options
The GNU preprocessor works with the compiler to parse source files by converting text to lexical instructions for the compiler. The GNU preprocessor operates automatically under the control of the GCC driver, and there are several flags available to the GCC driver to direct the behavior of the preprocessor.

As for other toolchain utilities driven by GCC, the preprocessor can use the -Wp option to pass preordained options to the preprocessor or the -Xpreprocessor option to pass options verbatim. Linux on POWER has no preprocessor options that are not available by use of a GCC driver flag. That is, there is no reason to use the -Xpreprocessor flag instead of the -Wp flag.

The -undef option undefines any GCC or target specific macros. This may be useful when planning a port because it can aid in identifying porting issues.
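
As a sketch of how target-specific macros typically appear in portable code, the function below reports the target based on __powerpc__ and __powerpc64__, which GCC predefines for PowerPC targets. The function name target_name() is hypothetical. Compiling with -undef removes these definitions, which exposes any code that silently depends on them.

```c
/* Report which target the compiler is generating code for, based on
 * macros that GCC predefines for PowerPC targets. */
const char *target_name(void)
{
#if defined(__powerpc64__)
    return "64-bit PowerPC";
#elif defined(__powerpc__)
    return "32-bit PowerPC";
#else
    return "not PowerPC";
#endif
}
```

The full set of predefined macros for a given installation can be listed with gcc -dM -E - < /dev/null.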

The -M option is likewise useful because it can be used to display all of the dependencies required by the source files. This list of dependencies can then be checked for availability on the source and target platforms.

The -I option adds a directory to the search path for header files. If a header's directory is specified directly with the -I option and the header still cannot be found, make sure the system has had all updates and service packs applied. On older versions of Linux on POWER, defects in the installer left some system libraries and headers incompletely installed. Specifically, Red Hat Enterprise Linux, Version 3, exhibits this problem with 64-bit standard and compatibility libraries. The problem was corrected in RHEL3, Update 3.

Assembler options
Like the preprocessor, the GNU assembler can accept both options that GCC can and cannot recognize: the -Wa and -Xassembler flags, respectively.

These options are covered in more depth under "The GNU assembler."

Linker options
Options similar to -Wa and -Xassembler are available for the linker: -Wl and -Xlinker. However, unlike the assembler options, the GCC driver can accept options that directly affect the linker behavior.

Two sets of compilers are available for Linux on POWER: GCC and XL C/C++ for Linux on POWER. Because both of these compilers use the GNU toolchain, their objects are completely compatible. Any object filename given as an argument to the GCC driver is passed to the linker for inclusion in the executable. This is very valuable for linking objects produced by separate compilers into the same executable. For example, consider an application with some processor-intensive components and some components with portability concerns. The performance-critical pieces could be compiled with XL C/C++ for better performance, and the remainder of the application could be compiled with GCC. This method leverages the performance of one compiler, the flexibility of the GNU toolchain, and the ability to combine objects compiled with these two compilers.

The GCC driver also accepts arguments to specify the libraries to include when linking. The -l option accepts the library name either immediately following the flag or as a separate argument, the latter for POSIX compliance. The ordering of these arguments is important: the linker searches libraries and object files in the order in which they are listed, so a library must appear after the objects and libraries that depend on it.

Though it is, in theory, possible to link 32- and 64-bit objects together, this is not currently done by the GNU toolchain. Architecturally, there are reasons why this should never be done. Performance on POWER is not likely to suffer by compiling the entire application as 64-bit.

Code generation options
Some machine independent options affect the way code is generated and have specific implications for Linux on POWER. One common change when porting from Linux on Intel to Linux on POWER comes in accounting for position independent code. The -fpic option is used to generate position independent code for use in a shared library with GCC. On many platforms, this must be specified, but on POWER architecture, it depends upon whether the application is built in 32-bit or 64-bit mode. In 64-bit mode, all objects are position independent; in 32-bit mode, you must specify the -fpic option. Makefiles and build scripts should take this into account when configuring compile flags for Linux on POWER.

Stack backtraces can be used to investigate how a program is executing. If the program is failing, backtraces can provide insight into which function calls are responsible for the failure. Even if the program is executing successfully, backtraces may be used to explore whether the program is executing the way the programmer envisioned. Backtraces rely on frame pointers existing in the objects comprising the program.
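
A backtrace can also be captured from inside a running program. The sketch below assumes the GNU C Library, which declares backtrace() and backtrace_symbols_fd() in <execinfo.h>; the function name dump_backtrace() and the MAX_FRAMES limit are hypothetical.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 32   /* arbitrary limit chosen for this sketch */

/* Write the current call stack to standard error and return the
 * number of frames captured. */
int dump_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Function names appear in the output only when the symbols are available to the dynamic linker, for example when the program is linked with -rdynamic; otherwise only addresses are printed.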

Neither the ABI for Linux on POWER 32-bit applications nor the ABI for Linux on POWER 64-bit applications requires the generation of frame pointers for objects. This makes debugging, especially establishing a backtrace, complicated. Frame information can be generated by GCC with the -fexceptions flag. This option will tell GCC to implement the backtrace information required to unwind the stack with a debugger. This will increase data size overhead, but should not affect execution. The -fexceptions flag is enabled by default for C++ programs and disabled for C programs (which may need to interact with exception handlers in C++ objects). So, if C code is expected to handle exceptions thrown by C++ applications, the -fexceptions flag should be added to the C compile flags.

On some architectures, exception handling must be managed with Call Frame Information directives. These directives are used in the assembly to direct exception handling. These directives are available on Linux on POWER, if, for any reason (portability of the code base, for example), the GCC generated exception handling information is not sufficient.

POWER-specific options
While the GNU toolchain does place an emphasis on portability, each architecture has its particular strengths, and developers must understand how to leverage them to extract maximal performance. This section reviews architecture specific options accepted by GCC for Linux on POWER.

Power Architecture families and GCC

POWER and PowerPC processors have traveled a long road from the IBM 801 to the POWER5 and have been implemented pervasively from wristwatches to enterprise servers. POWER is the name of the original architecture, and PowerPC originated from the Apple, IBM, and Motorola alliance in the 1990s. The two families have developed with separate, yet connected visions since, with the POWER5 and PowerPC 970FX flaunting the results of more than a decade of industry leading technology. Despite deployment in an amazing range of products, the common instructions between the two architectural families provide for the interoperability of most code. Though IBM continues to brand recent chips as POWER architecture, PowerPC subsumes all recent processors with respect to GCC.

Architecture flags
Two classes of extended instructions are available to GCC for POWER architecture. The first set, which was intended for early RS/6000 architecture, is enabled with the -mpower flag. This flag is not for use with recent POWER or PowerPC hardware. Instead, use the -mpowerpc option, or its 64-bit sibling, -mpowerpc64, to use instructions common to modern POWER and PowerPC hardware. For developers who need to retain support for legacy POWER hardware, the -mpower and -mpowerpc flags can be used together, because each only enables the extensions for its own processor family. If neither flag is used, only the instructions common to both architectures are generated. However, for optimal performance, it is recommended that CPU-specific flags be used.

CPU specific architecture flags
CPU specific optimization flags can result in higher performance than processor family flags. These flags instruct the compiler to generate code optimized for a specific CPU, but the code may not run at all on other models. The -mtune= flag is used to specify the scheduling parameters for a given CPU type, but it does not set the architecture type, register usage, or mnemonic variants. These are controlled by the -mcpu= flag.

The -mtune= flag is used like this:

$ gcc -O3 -mtune=power5 -o foo foo.c foo2.c

This example directs the compiler to compile the source files foo.c and foo2.c into a single executable with level three optimization and with scheduling parameters customized for a POWER5 CPU.

The -mcpu= flag is used like this:

$ gcc -O3 -mcpu=power5 -o foo foo.c foo2.c

This example directs the compiler to compile the source files foo.c and foo2.c into a single executable with level three optimization, and with scheduling parameters, mnemonics, architecture type, and register usage customized for a POWER5 CPU.

Developers familiar with IBM XL C/C++ will notice a difference here. With XL C/C++, the corresponding flags are -qtune= and -qarch=, for -mtune= and -mcpu=, respectively.

Both -mtune= and -mcpu= can be used with the common option, which selects a generic instruction set that runs on any Power or PowerPC architecture processor. For example:

$ gcc -O3 -mtune=common -mcpu=common -o foo foo.c foo2.c

This example will produce a level three optimized binary from source files foo.c and foo2.c and will run on any POWER or PowerPC processor using only the instructions common to both architecture sets. However, the binary may not perform as well as one customized to CPU specific flags. Additionally, the powerpc, powerpc64, and power options can be used to specify their respective sets of CPU attributes.

The full range of options for these two flags includes support for older and newer machines. However, only the common, power3, power4, power5, 970, powerpc, and powerpc64 options are supported by enterprise Linux distributions for POWER architecture because these are the relevant instruction sets for enterprise hardware. Refer to the GCC manual for the complete list of supported instruction sets and processors. (See Resources.)

Vector Multimedia eXtension
Vector Multimedia eXtension, or VMX, is the Single Instruction Multiple Data (SIMD) architecture extension of POWER processors. VMX was developed by the Apple, IBM, Motorola PowerPC alliance, and each member markets this technology under a different name. IBM refers to it as VMX, which is the original code name of the technology, while Apple prefers Velocity Engine®, and Motorola uses the name AltiVec®. GCC does not recognize the trade names for the technology, but it has implemented support for the additional processor instructions, using the name AltiVec for its flag names.

VMX provides 32 additional 128-bit registers to hold vector data, each capable of holding sixteen 8-bit values, eight 16-bit values, or four 32-bit values. (Note that VMX registers cannot operate on 64-bit values.) These extra wide registers and the additional 162 processor instructions to operate them are why VMX can provide a significant performance advantage when working with certain types of data. Vectors can be used, for example, to process four ints in parallel, provided that the code is written for this ordering.

VMX can provide a tremendous benefit for some algorithms, but the code must be written for vectorization. Unfortunately, there is no magic "autovectorization" back end for the compiler. There are C intrinsics to provide control down to the level of register selection. These intrinsics, prototypes, and macros are defined in the GNU altivec.h file and are available to help with this task. Note that with IBM XL C/C++, altivec.h does not have to be explicitly included in the source, whereas GCC does require an explicit include. GCC built-in functions for AltiVec can be reviewed in section 5.4 of the GCC manual. (See Resources.)
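
The following sketch shows the general shape of code written against the altivec.h intrinsics. The function name add4() is hypothetical. The AltiVec path is compiled only when GCC defines __ALTIVEC__ (that is, when -maltivec is in effect); otherwise a scalar fallback is used so that the same source builds on other platforms. The input and output arrays are assumed to be 16-byte aligned.

```c
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Add four ints from a and b into out. All three arrays are assumed
 * to be 16-byte aligned when the AltiVec path is compiled in. */
void add4(const int *a, const int *b, int *out)
{
#ifdef __ALTIVEC__
    vector signed int va = vec_ld(0, a);   /* load four ints from a */
    vector signed int vb = vec_ld(0, b);   /* load four ints from b */

    vec_st(vec_add(va, vb), 0, out);       /* one add, four results */
#else
    int i;                                 /* scalar fallback */

    for (i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Keeping the scalar path alongside the vector path also provides a reference result against which the vectorized version can be checked.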

Vectorization often requires a novel approach to an algorithm, and only the developer can do this for the code with careful planning. In fact, poorly written vector code is often slower than normal code because of inefficient alignment. Data processed in VMX registers must be quad-word aligned (128 bits), and it is up to the developer to ensure that data buffers processed by these instructions are aligned to appropriate addresses. Pay particular attention to buffers created through dynamic memory allocation in 32-bit mode, as they are only guaranteed to be double-word aligned. In 64-bit mode, however, the malloc subsystem returns quad-word aligned addresses.
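
One way to obtain a quad-word aligned buffer regardless of mode is the POSIX function posix_memalign(). The sketch below is one possible approach, not the only one; the wrapper name alloc_quadword_aligned() is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request the posix_memalign() prototype */
#endif
#include <stdlib.h>

/* Allocate nbytes aligned on a 16-byte (quad-word) boundary.
 * Returns NULL on failure. Release the buffer with free(). */
void *alloc_quadword_aligned(size_t nbytes)
{
    void *p = NULL;

    if (posix_memalign(&p, 16, nbytes) != 0)
        return NULL;
    return p;
}
```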

A very thorough presentation on vectorizing code has been made available online: "Introduction to Altivec - Ten Easy Ways to Vectorize Your Code" (PDF). (See Resources.) This presentation discusses the significant advantage vectorization can bring to loop unrolling and pixel operations, among other things.

VMX capability is enabled in GCC with the -maltivec and -mabi=altivec flags.

It is worth mentioning that GCC for Apple's Mac OS X operating system also supports the implementation of their AltiVec extensions, but with subtle differences. One such difference is the syntax in vector declarations, which could impact a code migration between these platforms. Vector literals are placed within braces { } with GCC for Linux on POWER, but in parentheses ( ) with GCC for OS X. IBM XL C/C++ supports both of these methods. For more information on this and other differences between the implementations of VMX, refer to the About Compilers with VMX Support Web page. (See Resources.)

Table 3. Vector literals
VMX vector syntax  | GCC for Linux | IBM XL C/C++ | GCC for Mac OS X
Braces {...}       | Supported     | Supported    | Not supported
Parentheses (...)  | Not supported | Supported    | Supported

Alignment options
Data alignment options are available with the GNU toolchain for Linux on POWER to accommodate the wide range of POWER- and PowerPC-based implementations (for example, embedded processors), as well as interoperability with other architectures. The default alignment scheme is optimized for 64-bit POWER architecture, but a natural alignment option is also available. Natural alignment forgoes the advantages of long double alignment on the platform and is, therefore, not the most efficient memory alignment. However, natural alignment may be required for interoperability with compiled objects on other platforms. The -malign-power flag specifies optimal alignment for POWER architecture and is the default for GCC. The -malign-natural flag specifies natural alignment.

TOC options
POWER processors utilize, in executable modules, a concept called the Table of Contents, or TOC. The TOC is a per-module anchor that acts as a reference point for finding global data, static data, and other position independent code within that module. Each module (the main executable or a shared library, for example) contains its own TOC. This system-specific feature is defined by the 64-bit PowerPC ELF ABI to be accessible using the r2 register. The GCC driver accepts options to control the behavior of the linker when constructing the TOC entries. These options are discussed below, under "The GNU linker."


The GNU Binutils

The GNU binutils include a host of utilities for building and working with binaries. The two most critical binutils are the GNU linker, ld, and the GNU assembler, as. These two are integral parts of the GNU toolchain and are typically driven by the GCC front end. However, it is useful to know how to invoke and control them directly. This section addresses how to control these and select other binutils as they pertain to porting, POWER-specific functionality, and common pitfalls or misunderstandings.

The GNU linker

Linking is the last step in creating an executable. The GNU linker executable, or link-editor, is called ld, and its role is to combine object files into executables, while specifying how the program will be executed at runtime. The GNU linker uses a command language script to control the linking process; by default ld is controlled by an internal set of commands, which can be extended or overridden. The emphasis on portability and flexibility is evident in GCC's ability to generate linker scripts for many different compile environments and pass the customized linker script to ld without manual intervention. In contrast, link-editors in some other operating systems approach this differently. The AIX linker, for example, handles customization itself, not in the pre-linking compile step.

This section describes the interaction of the GCC driver with the linker, provides some details of the 64-bit PowerPC ELF ABI, and explores some exceptions with the Linux on POWER implementation of the linker.

Since the role of the linker is to collect object code and emit an executable module, the compiler driver passes the expected arguments to ld. These include application object files, start-up code (object code that runs when the process starts, but before main() is called), lists of potentially useful archives and dependent shared modules, library search paths (used both at link-time and runtime), and any implementation-specific options required by the platform. ld traverses the command line from left to right and uses this ordering information to determine the manner in which symbols are found and/or referenced.

Linux on POWER object files
Linux on POWER uses the ELF object file format. This flexible and extensible format nicely accommodates the requirements of the platform, with the addition of several special sections. Table 4 lists and describes these special sections:

Table 4. Special sections
Section | Description
.glink | Contains support for global linkage code. Function calls between modules (the main executable and libc.so, for example) require a function descriptor to be loaded, where the descriptor contains the target program address and TOC value for the target module. (More on the TOC below.) This mechanism is implemented by the procedure linkage table (.plt section) and the .glink section.
.toc | This is part of the TOC, a per-module section that is a dictionary for locating global pieces of information. One of several portions of the TOC, the .toc section contains initialized information.
.tocbss | This functions like a .bss section, but holds an uninitialized data area destined for the TOC.
.got | The Global Offset Table is stored within the .got section and provides access to data items that have global scope (visible within and outside of a module). Note that while the data may be located in the .toc or .tocbss sections, global data is always addressed through the GOT.
.plt | Contains support for inter-module calls. Each module (main executable, shared objects) contains its own TOC, and thus its own TOC anchor. The contents of this section are filled in by the dynamic linker and are simply a set (special case) of function descriptors intended to support lazy symbol resolution.

From the "64-bit PowerPC ELF ABI Supplement" (see Resources):

ELF processor-specific supplements normally define a GOT ('Global Offset Table') section used to hold addresses for position independent code. Some ELF processor-specific supplements, including the 32-bit PowerPC Processor Supplement, define a small data section. The same register is sometimes used to address both the GOT and the small data section.
The 64-bit PowerOpen ABI defines a TOC ('Table of Contents') section. The TOC combines the functions of the GOT and the small data section.

You can see that both 32-bit and 64-bit modules provide the same function, but are organized differently. Since 64-bit mode introduces this TOC concept (which is taken from AIX) to ELF, consider some of the following details.

The Table of Contents
As stated above, every 64-bit module contains a TOC. This implies that your average "hello, world!" will be made up of at least two modules: the main program and libc, and will, therefore, comprise two TOCs. Each TOC will have a "well-known" TOC anchor, which is always found in register two while a process is running; the TOC register value changes as execution jumps from module to module. This anchor point supports the mechanism of accessing various data global to a module (such as global externs, global statics, and function descriptors).

Using the TOC implies double-indirect addressing. For example, to access a global variable, your program will use the TOC anchor (r2) to find the location of a pointer to the variable. The same operation occurs when finding a function descriptor for an out-of-module call. But it is possible to store data within the TOC (rather than a pointer to data) and thus avoid the additional layer of indirection. Keep in mind that every datum stored in the TOC must be eight bytes or less (in 64-bit mode), whether it's a pointer or the datum proper. Now consider the impact this has on the size of the TOC.

TOC-relative addressing uses an instruction that is limited to 16-bit offset values, so a single TOC can hold 65,536 (2^16) bytes. At eight bytes per entry, that is enough for 8,192 GOT entries in 64-bit mode. You can see that this might not be enough space for large applications. The GNU linker has several options for dealing with applications that exceed the maximum TOC size. Starting in version 2.15.90 of the GNU Binutils, a TOC that overflows is automatically split into multiple TOCs at link time, but the options listed in Table 5 can be used for a specific result. While discussed here, these would commonly be specified as arguments to the GCC driver.

Table 5. TOC options
TOC option | Description
-mfull-toc | This option is the default for dealing with the TOC. It causes the linker to allocate one TOC for an executable or shared object (a module, in other words). If the available 64K space is inadequate to accomplish the link-edit, the linker will issue an error message that the TOC space has been overflowed.
-mno-fp-in-toc | This flag can reduce the space used in the TOC. Ordinarily the compilers will place floating point values directly within the TOC. (AIX/GCC developers will likely recognize this option from their 32-bit development experiences.) This flag prevents the storage of these values in the TOC, making TOC space available for other entries. A related flag is -mno-sum-in-toc.
-mno-sum-in-toc | This flag instructs GCC to generate code to calculate the sum of an address and a constant at run-time, instead of putting that sum into the TOC. This option (along with the -mno-fp-in-toc option) can be used to conserve TOC space at the expense of producing larger and slightly slower code.
-mminimal-toc | If the -mno-fp-in-toc and -mno-sum-in-toc flags cannot free enough TOC space, then the -mminimal-toc flag can be used to generate a separate TOC for every (object) file. This will generate many very small TOCs. While this solves the problem of TOC overflow (because the number of TOCs is now effectively unlimited, each up to 64K in size), it will produce larger, slower code. The advantage of using the TOC to specify the addresses of functions is essentially forsaken.

Longcall

The POWER instruction set provides the bl, or branch and link, instruction for making intra-module subroutine calls. The form of this instruction encodes a relative address within a 64MB (2^26-byte) window around the calling location, that is, roughly 32MB in either direction. There are times, however, when this limit is reached and it is necessary to use an alternative mechanism for getting from point A to point B. The -mlongcall option causes the function pointer mechanism (which is used for inter-module calls) to be used. In other words, every function call looks like an out-of-module call. This circumvents the 64MB limit at the cost of a slight increase in overhead for every function call. A longcall pragma is also available, which takes precedence over the -mlongcall option: #pragma longcall (1) applies the longcall attribute to all subsequent function declarations; #pragma longcall (0) prevents the attribute from being applied to subsequent functions.

Fortunately for you, the developer, the GNU linker for Linux on POWER can generate, on the fly, the code needed to implement this work-around. As with the AIX linker, there is no need to worry about using -mlongcall when working in 64-bit mode. This feature is not available for the 32-bit GNU linker included with SLES9 or RHEL4. However, it has been included in the freely available GCC source. If you have a 32-bit application that makes calls to targets beyond the 64MB limit on these distributions, then you must either rework your code or compile it as 64-bit. Since 32- and 64-bit applications coexist in the Linux on POWER runtime, it is recommended that you recompile to 64-bit in this case.

Linker Scripts

The GNU linker provides a command language that can be used to control link-edit operations. While this will likely come as no surprise to those already familiar with the GCC development tools, AIX developers should understand the difference in behavior between AIX and Linux on POWER. Whereas the (XCOFF) object file definition and AIX linker behavior are rather automatic, the GNU ld allows for a greater degree of flexibility in how and where object file sections are combined. Let's consider some of the basic attributes of scripts.

GNU ld operates automatically based upon an internal script. You can add to or replace this set of internal commands. Under certain conditions (using specific commands that can only appear once, for example) it is necessary to provide a complete set of commands by using a custom linker script; otherwise, scripts are used to add to the normal operations of the linker. When adding customized operations, you can simply name the linker script on your command line (the linker presumes that anything not recognized as an object file is a linker script). When replacing the default (internal) commands of the linker, you will use either the -T or --script= option.

Those interested in examining the default linker script can have the linker print it through the verbose option. By using a simple "hello, world" program, you can capture the internal commands. The redirection below collects both stdout and stderr, so the script is captured whichever stream the linker writes it to:

$ cc -o hello hello.c -Wl,--verbose > hello.ldscript 2>&1

This creates a 256-line script that is capable of handling most common linking situations (as would be expected). AIX developers should note that this is the counterpart to the -bbindcmds: option, although the AIX linker, as already noted, is more automatic.

So why, you ask, would one be interested in this function? Some development efforts require "linker tricks," which is another way of saying that the specifics of the project create constraints upon how an application is built and/or laid out in memory. These requirements may be addressed by clever link-edit operations, and the GNU linker's command language provides the mechanism.

The GNU Assembler

The job of the assembler is to take input files written in a human-readable form and produce files containing machine-level instructions. The object code in one or more files is then fed to the linker, which builds executable modules. The GNU assembler supports many platforms; many command line options are applicable to every platform, but options also exist to support system-specific features.

As on other platforms, the GCC driver accepts input from files written in high-level languages -- it emits assembler code, which is then turned into object code by the GNU assembler. Fortunately, most developers have little need to work directly in assembly language, and the GCC driver insulates the developer from specific implementation details. Should the need arise, though, it can be useful to understand how to interact with the assembler; GCC supports this task by accepting comma-delimited options for the assembler using the -Wa,option syntax. For example:

$ gcc -O -Wa,-Z,-v foo.c -o foo

This command uses the GCC driver to compile file foo.c into the binary foo. The -v option makes the assembler print its version, and the -Z option makes it produce an object file even if errors are encountered.

While you will find that the use of GCC, the linker, the assembler, and so on, for Linux on POWER is very familiar, the assembly language itself may seem a bit different. Note to the detail-oriented folks: RISC architectures usually require more individual instructions to accomplish a specific task, relative to CISC. POWER is no exception. If you examine the assembler code generated by GCC, you will note a larger number of loads and stores, along with other classes of instructions. One useful technique for becoming familiar with assembly language is to write your code in a high-level language, such as C, and either use GCC's -S option to save the assembler file for you, or use the utility objdump to disassemble the object code.

Organization of the binary
Previously, in the section on ld, "The GNU linker," some of the ELF object file sections were mentioned; those sections were of primary interest to the linker and pertained to executable modules. The sections of most interest, however, will be seen in almost every object file emitted by the compiler, and they organize, as explained in Table 6, the pieces of your program (in object code format), so that the linker knows how to combine them to create a program or shared module.

Table 6. Organization of the binary
Section | Description
.text | Executable instructions (what your assembly code turns into) are usually stored in this section. Also read-only constants (for example, string constants) will be found here. The text section on this platform is read-only, which means that the program may not alter itself; this attribute allows the text regions of executables to be shared amongst all instances of the running program. For example, every running copy of the bash shell would share a single copy of the bash file's .text region.
.data | This is where initialized data for the executable is kept. By definition, the data region is read-write, and the process has complete control over how the content of this region (in memory) is manipulated. By way of example, global variables in C that are initialized with a compile-time value would be found in this section.
.bss | The bss section contains uninitialized data; this section doesn't require any space in the object file, but does define a region of memory that is zeroed out at runtime. Global variables in C that are not given an initialized value are defined to contain zero at program start time; these variables would be found in the bss section.

The link-editor collects all .text sections from the specified object files and combines them to create the executable program code; all .data sections are collected to create the final data section, and all .bss sections are combined to create the final bss section. Note, also, that every executable module contains its own set of sections; a shared module will have separate .text and .data sections, for example.

Assembly version of Hello, World!
Consider this 32-bit, non-PIC assembly version of "Hello, World!" first published by Hollis Blanchard in the article "PowerPC assembly: Introduction to assembly on the PowerPC" (developerWorks, July 2002):

Listing 2. PowerPC assembly
.data                  # section declaration - variables only
.align    3		# make double-word aligned
msg:
	.string "Hello, world!\n"
	len = . - msg       # length of our dear string

.text                  # section declaration - begin code
.align    3            # ensure alignment
	.global _start
_start:

# write our string to stdout

	li      0,4         # syscall number (sys_write)
	li      3,1         # first arg: file descriptor (stdout)
	                    # second arg: pointer to message to write
	lis     4,msg@ha    # load top 16 bits of &msg
	addi    4,4,msg@l   # load bottom 16 bits
	li      5,len       # third argument: message length
	sc                  # call kernel

# and exit

	li      0,1         # syscall number (sys_exit)
	li      3,1         # first argument: exit code
	sc                  # call kernel

Note the data section at the top, where the message string is defined -- it could just as easily be in the .text section, since it is treated as a constant.

The text section is made up of just a few instructions because this program makes system calls (rather than calling libc routines, such as printf) to accomplish the work. The write system call is specified in register zero, the target file descriptor in register three, the pointer to the string in register four, and the length of the string in register five. After the string is written out, register zero is set up to invoke the exit system call, passing a value of "1" as the exit code.

This example would be passed through the assembler to create an object file, and the object file is passed to the linker, which then builds an executable image. This series of steps will be familiar to developers coming from both AIX and Linux backgrounds.

CFI directives in assembly
If you find yourself in the position of combining C++ with assembly language routines, you can incorporate exception handling support by using the Call Frame Information directives. These directives generate stack unwinding information, such that any routines written in assembler will integrate nicely with C++ and other high-level languages.

The CFI directives supported by the GNU assembler are listed on the CFI support for GNU assembler (GAS) Web page (see Resources). For Linux on POWER, the CFI directives of interest are .cfi_startproc and .cfi_endproc. These directives, when placed on each side of a function in the assembly code, will generate .eh_frame information.

More Binutils
In addition to the linker and assembler, some developers will find the remaining binutils very useful. Rather than discuss the function of each tool, which is done in detail at the GNU Binary Utilities Web site (see Resources), the following discussion is limited to two similar tools. Either objdump or readelf may feel more or less natural, depending on the platform you are used to. AIX developers will recognize the style of objdump, and ELF veterans will already be familiar with the readelf tool. Both are available for Linux on POWER.

objdump
AIX developers familiar with the dump command will want to know about objdump. This tool provides the capability to which you are accustomed, and more. One useful feature of objdump is the ability to disassemble an object file and examine the machine code. Note, though, that this listing is not suitable for feeding back to the assembler.
readelf
Another helpful tool, readelf, can display symbols, section information, binary file format info, and more. It can be useful in analyzing how the compiler has created the binary from your code, especially with shared objects to which you are linking. AIX developers will find this corresponds to using the dump command to view loader section information.

Summary

Development for Linux on POWER varies subtly from Linux on x86 architecture and from AIX running on POWER architecture. Generally speaking, these differences are not salient to developers, as GNU software caters to portability. However, in some contexts more information is needed, and we have tried to provide this information as it pertains to the GNU development toolchain, including the GCC compiler, the GNU linker and assembler, and other GNU binutils.

We have reviewed the functionality of the GNU Compiler and its ability to not only compile code, but to also control the other binutils. Details of the internal format and operations adhered to by the linker and assembler were reviewed with emphasis for an audience familiar with other architectures and operating systems. Additional information on these topics has been referred to throughout the document and is also listed under Resources. We have referenced the most recent GCC manual to date for this version of the document. It is possible that a more recent version is available as you read this. Check the references for the most recent version of the GCC manual.


Acknowledgments

We would like to thank Steven Munroe, Alan Modra, Hollis Blanchard, and David Edelsohn, as well as all of the other skilled developers that contributed to our understanding of this topic. Special thanks to Hollis for the use of his Hello, World example.

Resources
