
Using Advance Toolchain for Linux on Power

Notes and tips for using Advance Toolchain for Linux on Power

IBM Advance Toolchain for Linux on Power 16.0-5 is now available!

Basic

After the installation steps are done, the Advance Toolchain is ready for usage. Call the program directly, for example /opt/atX.X/bin/gcc.

However, some applications have complex build systems (for example: autotools, make, cmake) for which the PATH environment variable must be set correctly, for example:

PATH=/opt/atX.X/bin:/opt/atX.X/sbin:$PATH make

Alternatively you can use Environment modules.

On cmake build systems, it is also necessary to set the CMAKE_PREFIX_PATH with AT's path, for example:

CMAKE_PREFIX_PATH=/opt/atX.X/
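
The two settings above can be collected in one place, as in the following minimal sketch. The function name at_build_env and the version 16.0 passed to it are illustrative assumptions; substitute the version you installed. The function only prints the assignments, so it can be run on a machine without the Advance Toolchain.

```shell
# Print the environment assignments for one Advance Toolchain version.
# at_build_env is a hypothetical helper, not part of the Advance Toolchain.
at_build_env() (
    at_root="/opt/at$1"
    # AT directories go first so its gcc and ld are found before the system ones.
    printf 'PATH=%s/bin:%s/sbin:%s\n' "$at_root" "$at_root" "$PATH"
    # Only needed for cmake-based builds.
    printf 'CMAKE_PREFIX_PATH=%s/\n' "$at_root"
)

at_build_env 16.0
```

The printed assignments can be placed in front of a make or cmake invocation in the same way as the PATH example above.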

Package descriptions

In most cases, you do not need to install all of the packages that are provided with the Advance Toolchain. The following list describes when they are needed:

  • advance-toolchain-atX.X-runtime

    Provides base functionality to run Advance Toolchain applications. This package is always required.

  • advance-toolchain-atX.X-runtime-compat

    Substitutes for the runtime package on older distribution versions. For example, it provides the Advance Toolchain 11.0 runtime libraries on a RHEL6 machine. The runtime compatibility package does not include optimized libraries.

  • advance-toolchain-atX.X-devel

    Provides development tools. This package is only needed to develop applications.

  • advance-toolchain-atX.X-perf

    Provides tools for measuring performance. It's only useful on some development environments.

  • advance-toolchain-atX.X-mcore-libs

    Provides libraries for multi-thread development, like Boost, SPHDE, and Threading Building Blocks. This package is also required on servers running the applications developed with those libraries.

  • advance-toolchain-atX.X-runtime-atZZ-compat

    Install this package only if you need to run an application built with the previous version of the Advance Toolchain on top of the current version. For more information, see Runtime compatibility between Advance Toolchain versions.

  • advance-toolchain-atX.X-selinux

    Provides SELinux settings. This package is required to develop or run Advance Toolchain applications on a SELinux enabled environment.

  • advance-toolchain-atX.X-cross-ppc64le

    Provides a cross compiler for little endian (ppc64le). These packages are available for x86 (i386) or x86-64 (amd64) in order to generate binaries for POWER.

  • advance-toolchain-atX.X-cross-common

    Provides files common to cross compiler packages. This package is mandatory for cross compiler installation starting from version 8.0.

  • advance-toolchain-atX.X-cross-ppc64le-runtime-extras

    Provides extra libraries to the cross compiler packages.

  • advance-toolchain-atX.X-cross-ppc64le-mcore-libs

    Provides the libraries for multi-thread development to the cross compiler.

  • advance-toolchain-atX.X-<package_name>-debuginfo or advance-toolchain-atX.X-<package_name>-dbg

    Provides the .debug files that contain the DWARF debuginfo for the files in <package_name>. Those files are useful to debug and profile the applications built with AT.

Runtime compatibility between Advance Toolchain versions

If you are running applications built with an older version of the Advance Toolchain, install the compatibility rpm advance-toolchain-atX.X-runtime-atZ.Z-compat-X.X-X in order to run these applications on top of a newer version of the Advance Toolchain. For example, to use AT 10.0 to run applications built with AT 9.0:

Install the compatibility package:

rpm advance-toolchain-at10.0-runtime-at9.0-compat-10.0-0

Then, run:

/etc/rc.d/init.d/at10.0-runtime-at9.0-compat start

Manual pages

In order for the system man application to pick up Advance Toolchain installed manual pages, you must export the location of the AT manual pages in the MANPATH variable prior to invoking man, as described with the following commands:

unset MANPATH
export MANPATH="/opt/atX.X/share/man:`manpath`"
man <topic>

Alternatively, you can override MANPATH for a single command, as demonstrated in the following example:

MANPATH="/opt/atX.X/share/man:`manpath`" man lsauxv

Optimization selection

Directing gcc to build an application for a particular CPU can take advantage of processor-specific instruction selection. In some cases, it can significantly improve performance. Building without selecting a particular CPU simply causes gcc to select the default (lowest common denominator) instruction set.

  • -mcpu=power6
  • -mcpu=power6x
  • -mcpu=power7
  • -mcpu=power8
  • -mcpu=power9

Notes:

  • On Advance Toolchain 15.0 and 16.0 the compiler defaults to -mcpu=power8 -mtune=power10
  • On Advance Toolchain 10.0, 11.0, 12.0, 13.0, and 14.0 the compiler defaults to -mcpu=power8 -mtune=power9
  • On Advance Toolchain 7.x, 8.0, and 9.0, the compiler defaults to -mcpu=power7 -mtune=power8
  • On Advance Toolchain 6.0, the compiler defaults to -mcpu=power6 -mtune=power7
  • When you are using -mcpu=power7, DO NOT disable Altivec (for example, -mno-altivec) without also disabling VSX (for example, -mno-vsx). The following combination is invalid:
    -mcpu=power7 -mno-altivec
    

Common GCC options

  • -fpeel-loops

    Peels loops for which there is enough information that they do not roll much (from profile feedback). This option also turns on complete loop peeling, that is, complete removal of loops with a small constant number of iterations.

  • -funroll-loops

    Unrolls loops whose number of iterations can be determined at compile time or upon entry to the loop. The -funroll-loops option implies -frerun-cse-after-loop. This option makes the code larger, and might or might not make it run faster.

  • -ftree-vectorize

    Perform loop vectorization on trees. This flag is enabled by default at -O3, starting at the GCC 4.3 time frame.

  • -ffast-math

    Sets the -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, and -fcx-limited-range values. This option causes the preprocessor macro __FAST_MATH__ to be defined. This option is not turned on by any -O option because it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules or specifications for math functions. However, it might yield faster code for programs that do not require the guarantees of these specifications.

Other GCC options for older Linux distributions

The Advance Toolchain shows significant gains over the older (SLES10 SP3) Enterprise Linux Distributions because these releases use an older GCC 4.1 compiler that is not enabled for POWER7. In addition, these releases are restricted to POWER6 compatibility mode and cannot leverage the new instructions enabled for POWER7 for applications or runtime libraries. However, there are some best practices that you can use to maximize your performance.

Best practices for the -mcpu and -mtune values

  • If your program will be running on the same POWER systems for the foreseeable future, build with the matching -mcpu value.
  • If your program needs to run on multiple systems, the best strategy is to build with the -mcpu value set for the oldest supported system. If your program mostly runs on a newer system, with the older system kept for accommodation or backup, then use the -mcpu value for the oldest system combined with the -mtune value for the system that you want to optimize for performance.
  • If your program is running in POWER6 compatibility mode on a POWER7 system, use the -mcpu=power6 and -mtune=power7 values.
  • If your code is organized in dynamic libraries, you can compile and build your libraries multiple times by using the -mcpu value for the specific Power platform and then install those libraries into the matching /lib64/power directory. Then the dynamic linker is able to automatically select the dynamic library optimized for the specific system.
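
The first three practices can be sketched as a small helper that composes the two flags. The function name at_cpu_flags is hypothetical and used only for illustration; the processor names are the -mcpu values listed under Optimization selection.

```shell
# Compose -mcpu/-mtune flags: -mcpu names the oldest system that must run
# the program, -mtune names the system to optimize for.
# at_cpu_flags is a hypothetical helper, not an Advance Toolchain tool.
at_cpu_flags() (
    oldest=$1
    tuned=${2:-$1}   # with one argument, tune for the same processor
    if [ "$oldest" = "$tuned" ]; then
        printf '%s\n' "-mcpu=$oldest"
    else
        printf '%s\n' "-mcpu=$oldest -mtune=$tuned"
    fi
)

at_cpu_flags power7          # program stays on the same POWER7 systems
at_cpu_flags power6 power7   # POWER6 compatibility mode on a POWER7 system
```

The output can be added to CFLAGS, for example CFLAGS="-O2 $(at_cpu_flags power6 power7)".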

Best practice for large programs

  • Avoid compiling with the -mminimal-toc value as this option adds extra levels of indirection for static data accesses.
  • Compile with the -mcmodel=medium value to optimize static data access and allow the linker to perform extra optimizations on the final program or library image.
  • For programs with static data exceeding 2 GB, you might need to use the -mcmodel=large value.
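
As a rough illustration of the last two points, the following sketch picks a code model from the size of a program's static data. The function name pick_cmodel is hypothetical, the size in bytes is assumed to be known already, and 2 GB is taken here as 2 * 1024^3 bytes. The text above says only that larger programs might need -mcmodel=large, so treat the result as a starting point rather than a rule.

```shell
# Suggest a code model from the static data size given in bytes.
# pick_cmodel is a hypothetical helper written for this example.
pick_cmodel() (
    bytes=$1
    limit=$((2 * 1024 * 1024 * 1024))   # 2 GB, assumed to mean 2 * 1024^3 bytes
    if [ "$bytes" -gt "$limit" ]; then
        printf '%s\n' "-mcmodel=large"
    else
        printf '%s\n' "-mcmodel=medium"
    fi
)

pick_cmodel 524288000     # about 500 MB of static data
pick_cmodel 3221225472    # 3 GB of static data
```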

Platform and hardware capabilities determination

Starting with Advance Toolchain 9.0, glibc includes the function getauxval, which can be used at runtime to query the capabilities of the hardware. If you are using Advance Toolchain 9.0 or newer, use getauxval instead of libauxv. For more information about getauxval, see its manual page. The libauxv library might be removed in a future major release of the Advance Toolchain.

The following example tests whether the runtime CPU has POWER7 Vector extensions:

#include <sys/auxv.h>
#include <stdbool.h>
#include <stdio.h>

bool
has_vector(void)
{
   unsigned long int hwcap_mask = (unsigned long int) getauxval (AT_HWCAP);
   bool has_vec = (hwcap_mask & PPC_FEATURE_HAS_VSX) != 0;
   if (has_vec)
       printf("CPU has POWER7 vector extensions\n");
   return has_vec;
}

Advance Toolchain 10.0 and earlier versions include libauxv and lsauxv, a system library and application, respectively, that provide a mechanism for querying the system platform information from the kernel's auxiliary vector. The system hardware capabilities (hwcap) can be queried through the auxiliary vector as well. For example, the platform can be queried dynamically with the following function:

char * platform = (char *) query_auxv (AT_PLATFORM);

Information from the hwcap can be queried in the following manner:

unsigned long int hwcap_mask = (unsigned long int) query_auxv (AT_HWCAP);
if (hwcap_mask & PPC_FEATURE_HAS_FPU)
   printf(" HAS_FPU\n");

More information about using libauxv and lsauxv can be found in the auxv and lsauxv manual pages provided by the Advance Toolchain (see Manual Pages).

Relinking a pre-built application with the Advance Toolchain

Locate all of the application's .o files. You can also link .a files to pick them all up at once. These files are needed for the relink.

Locate the paths to all of the necessary linked shared-object files, for example:

/usr/X11R6/lib for libXrender
/opt/gnome/lib for libgtk-x11-2.0

Edit /opt/atX.X/etc/ld.so.conf and add the directories that contain the shared object files to the end of this file. If applicable, don't forget the lib64 directories for the 64-bit equivalent libraries, for example:

/opt/gnome/lib/
/opt/gnome/lib64/
/usr/X11R6/lib
/usr/X11R6/lib64/

Run the Advance Toolchain ldconfig application to regenerate /opt/atX.X/etc/ld.so.cache, for example:

sudo /opt/atX.X/sbin/ldconfig

The loader uses /opt/atX.X/etc/ld.so.cache to find the libraries the application was linked against.
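
The edit to ld.so.conf can be made repeatable with a small helper that appends a directory only when it is not already listed. This is a sketch under stated assumptions: add_lib_dir is a hypothetical function name, and the demonstration writes to a scratch file instead of /opt/atX.X/etc/ld.so.conf. Entries are compared as exact strings, so /opt/gnome/lib and /opt/gnome/lib/ count as different entries.

```shell
# Append a directory to an ld.so.conf-style file unless it is already listed.
# add_lib_dir is a hypothetical helper; $1 is the conf file, $2 the directory.
add_lib_dir() (
    conf=$1
    dir=$2
    # -x matches whole lines only, -F treats the directory as a fixed string.
    if ! grep -qxF -- "$dir" "$conf" 2>/dev/null; then
        printf '%s\n' "$dir" >> "$conf"
    fi
)

# Demonstration on a scratch file rather than the real configuration.
demo_conf=$(mktemp)
add_lib_dir "$demo_conf" /opt/gnome/lib64
add_lib_dir "$demo_conf" /usr/X11R6/lib64
add_lib_dir "$demo_conf" /opt/gnome/lib64   # duplicate, not added again
cat "$demo_conf"
rm -f "$demo_conf"
```

After changing the real file, re-run sudo /opt/atX.X/sbin/ldconfig so that the cache is regenerated.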

Relink by using the Advance Toolchain's compiler:

/opt/atX.X/bin/gcc -g -O2 -o <application_name> <list_of_dot_o_files> \
   <list_of_dot_a_files> -L<path_to_libraries> \
   -l<one_for_each_library_needed_for_the_link>

A real life example:

/opt/at5.0/bin/gcc -g -O2 -o mandelbrot callbacks.o interface.o \
   main.o quadmand.o support.o mandel_internals.a \
   -L/usr/X11R6/lib -L/usr/X11R6/lib64 -L/opt/gnome/lib -lgtk-x11-2.0 \
   -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 \
   -lpangocairo-1.0 -lpango-1.0 -lcairo -lgobject-2.0 -lgmodule-2.0 -ldl \
   -lglib-2.0 -lfreetype -lfontconfig \
   -lXrender -lX11 -lXext -lpng12 -lz -lglitz -lm -lstdc++ -lpthread \
   -lgthread-2.0

If ld reports an error like the following, the path to that library is missing from the link step. Add it with -L<path to library>. For example:

/opt/at5.0/bin/ld: cannot find -lgtk-x11-2.0

In this case, add -L/opt/gnome/lib/ to the gcc command line. The linker must be told where to find every library.

When you are running the relinked application, you might get an error like the following:

./mandelbrot: error while loading shared libraries: libglib-2.0.so.0: cannot open
shared object file: No such file or directory.

Then, you need to add the path to the library in question to /opt/atX.X/etc/ld.so.conf and re-run /opt/atX.X/sbin/ldconfig. The Advance Toolchain loader needs to know where to find the libraries and uses the generated /opt/atX.X/etc/ld.so.cache to find them.

You can verify that the Advance Toolchain libraries were picked up by running the application prefaced with LD_DEBUG=libs, for example:

LD_DEBUG=libs ./mandelbrot

Caution: if your applications are not relinked with the Advance Toolchain, do NOT use LD_LIBRARY_PATH to point to the Advance Toolchain libraries. Doing so can result in ld.so and libc.so version mismatch and cause runtime failures.

Library search paths

The file /opt/atX.X/etc/ld.so.conf already includes /etc/ld.so.conf in the search order, but you might need to re-run /opt/atX.X/sbin/ldconfig in order to populate /opt/atX.X/etc/ld.so.cache with the specialized search paths you have added to /etc/ld.so.conf after an Advance Toolchain installation.

If you run ldd against your binary and it shows that some libraries are not found, you might need to re-run /opt/atX.X/sbin/ldconfig to fix that, even though Advance Toolchain has a daemon (atxx-x-cachemanager.service) that monitors the system ld.so.cache and updates /opt/atX.X/etc/ld.so.cache when needed.

The environment variable LD_LIBRARY_PATH is a colon-separated set of directories where libraries are searched first before the standard set of directories. This list is used in preference to any runtime or default system linker path.

As the compiled-in path (see "man ld" for "rpath") is reliable, it is not recommended to use the LD_LIBRARY_PATH. However, if you have to use it (for example, your build system uses LD_LIBRARY_PATH or has it as part of a wrapper), be sure that a system directory never appears before the Advance Toolchain directory.

Example:

LD_LIBRARY_PATH=/opt/atX.X/lib64:/lib64:$LD_LIBRARY_PATH
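
The ordering rule can be checked mechanically before a build or run. The following is a minimal sketch: check_lib_path_order is a hypothetical function name, and the set of directories treated as "system directories" (/lib, /lib64, /usr/lib, /usr/lib64) is an assumption made for this example.

```shell
# Succeed if no system library directory precedes the first Advance
# Toolchain directory in a colon-separated search path.
# check_lib_path_order is a hypothetical helper; $1 is the AT install
# prefix, $2 the path to check. The system directory list is an assumption.
check_lib_path_order() (
    at_root=$1
    path=$2
    seen_system=no
    set -f      # no globbing while the path is split
    IFS=:
    for dir in $path; do
        case ${dir%/} in
            "$at_root"/*)
                # First AT directory reached: fine only if no system
                # directory was seen before it.
                [ "$seen_system" = no ] && exit 0
                exit 1 ;;
            /lib|/lib64|/usr/lib|/usr/lib64)
                seen_system=yes ;;
        esac
    done
    # No Advance Toolchain directory in the path, so nothing is out of order.
    exit 0
)

check_lib_path_order /opt/at16.0 "/opt/at16.0/lib64:/lib64" && echo "order ok"
check_lib_path_order /opt/at16.0 "/lib64:/opt/at16.0/lib64" || echo "system directory comes first"
```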

Lock Elision support in glibc

Transactional Lock Elision (TLE) is a technique, implemented on top of the Hardware Transactional Memory, that allows critical sections of code to be speculatively executed by multiple threads, potentially without serializing. In the ideal case, TLE allows multiple threads to execute such a section of code in parallel. When a data race does occur, each thread makes several attempts to elide the lock before falling back to traditional locking for a period of time. No source modifications are needed to enable this feature. However, with profiling, source code can be altered to change things such as structure padding and member placement to minimize false sharing conflicts. Such changes are likely beneficial even without TLE.

It should be noted that TLE might not benefit all applications, and the benefits depend largely on how mutexes are used. In essence, TLE lets the hardware track any data races that occur while it's executing a critical section. POWER8 transactional resources are shared among hardware threads per core, thus enabling more threads per core might result in lower performance depending on how much memory is touched within a critical section. TLE is not recommended for applications that make system calls inside the critical section as the kernel does not support system calls from inside a memory transaction.

Advance Toolchain 9.0 and later versions support Transactional Lock Elision (TLE) in glibc (since Advance Toolchain 9.0-3). TLE is disabled by default; a script to enable it is provided as /opt/atX.X/scripts/tle_on.sh. Note that the internals of how the script works are subject to change at any time.

The following sample program reports whether a mutex lock was elided, and can be used to try out the tle_on.sh script:

#include <stdio.h>
#include <pthread.h>
#include <htmintrin.h>

int main()
{
   pthread_mutex_t t = PTHREAD_MUTEX_INITIALIZER;
   int elided = 0;

   pthread_mutex_lock (&t);
   if (_HTM_STATE (__builtin_tcheck ()) == _HTM_TRANSACTIONAL)
       elided = 1;
   pthread_mutex_unlock (&t);

   if (elided)
       puts ("Hurray! We are elided!");
   else
       puts ("Shucks! We are not elided!");

   return 0;
}

Compile this program with the following command:

/opt/at9.0/bin/gcc tle.c -mhtm -o tle -O2 -pthread

It can be run as follows:

/opt/at9.0/scripts/tle_on.sh -e yes ./tle

or

/opt/at9.0/scripts/tle_on.sh ./tle

To print the help information for the script, run:

/opt/at9.0/scripts/tle_on.sh -h

Advance Toolchain and libhugetlbfs

The Advance Toolchain provides its own 32-bit and 64-bit versions of libhugetlbfs 2.x. For more information about using libhugetlbfs 2.0, see the Advance Toolchain's libhugetlbfs man page:

unset MANPATH
export MANPATH="/opt/atX.X/share/man:`manpath`"
man libhugetlbfs

Note: libhugetlbfs 1.0 is deprecated for new users. Use libhugetlbfs 2.0 (or later) that is provided directly by the Advance Toolchain. If you must use libhugetlbfs 1.0, follow these instructions.

The /opt/atX.X/scripts/createldhuge.sh script is provided (until Advance Toolchain 8.0). It copies /opt/atX.X/bin/ld to /opt/atX.X/bin/ld.orig and creates a wrapper script in /opt/atX.X/bin/ld. You need to run this script only if you want the Advance Toolchain to work with libhugetlbfs.

The new /opt/atX.X/bin/ld is a wrapper script that detects whether the --hugetlbfs-link or --hugetlbfs-align switches have been passed to the linker. If so, it sets a script-local LD environment variable to /opt/atX.X/bin/ld.orig and invokes the system's ld.hugetlbfs, for example:

LD="/opt/atX.X/bin/ld.orig" /usr/share/libhugetlbfs/ld.hugetlbfs *switches*

If it doesn't detect the --hugetlbfs-link or --hugetlbfs-align switch, then it simply forwards the linker invocation to /opt/atX.X/bin/ld.orig directly.
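
The dispatch logic described above can be sketched as follows. This is not the script that createldhuge.sh generates: the function name choose_linker is hypothetical, the paths are the ones used in the examples in this section, and the function only prints what would be run instead of invoking a linker.

```shell
# Print the command a wrapper like the one described above would run.
# choose_linker is a hypothetical helper; the paths are assumptions taken
# from the examples in the text.
choose_linker() (
    real_ld=/opt/atX.X/bin/ld.orig
    huge_ld=/usr/share/libhugetlbfs/ld.hugetlbfs
    for arg in "$@"; do
        case $arg in
            --hugetlbfs-link*|--hugetlbfs-align*)
                # Hand over to ld.hugetlbfs, with LD naming the original linker.
                printf 'LD=%s %s\n' "$real_ld" "$huge_ld"
                exit 0 ;;
        esac
    done
    # No hugetlbfs switch: forward to the original linker directly.
    printf '%s\n' "$real_ld"
)

choose_linker -o temp temp.o
choose_linker --hugetlbfs-link=BDT -o temp temp.o
```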

If libhugetlbfs support is desired, first back up the original Advance Toolchain linker in case there are problems and you need to restore it manually.

cp -p /opt/atX.X/bin/ld /opt/atX.X/bin/ld.backup

The scripts in /opt/atX.X/scripts/ do the rest of the work for you:

createldhuge.sh restoreld.sh

Invoke createldhuge.sh to create the wrapper ld:

sudo sh createldhuge.sh \
   /<prefix-to-libhugetlbfs>/share/libhugetlbfs/ld.hugetlbfs /opt/atX.X

This MUST be executed with sudo (or as root) for the ld wrapper script to be created properly.

If or when you want to restore the original Advance Toolchain linker, simply run sudo sh restoreld.sh.

The Advance Toolchain gcc always ignores the -B/<prefix-to-libhugetlbfs>/share/libhugetlbfs directive because it has been built to always invoke /opt/atX.X/bin/ld directly. You can use the gcc invocation you have always used, for example:

/opt/atX.X/bin/gcc temp.c -v -o temp \
   -B/<prefix-to-libhugetlbfs>/share/libhugetlbfs/ \
   -Wl,--hugetlbfs-link=BDT

Note: If you invoke /opt/atX.X/bin/ld --hugetlbfs-link=BDT directly, you need to supply an -m* flag that is normally provided by gcc directly (see man ld for supported emulations).

Packaging an application built with the Advance Toolchain

Applications built with Advance Toolchain can be packaged with RPM. However, due to a limitation in the way that RPM generates dependency lists, the symbols from the Advance Toolchain might collide with the ones provided by the Linux distribution. There are two ways to prevent such collision:

Manually set your dependencies

Use this method for versions of RPM older than 4.8.

  1. Change your spec file, adding the entry Autoreq: 0 to disable the auto requirements check.
  2. Perform the build of your application as usual and then check every component manually for its dependencies (usually by using ldd to find out shared library dependencies).
  3. Use the shared library list obtained in the previous step and replace the Advance Toolchain entries found by a single advance-toolchain-atX.X-runtime, where X.X is the Advance Toolchain version being used for the build.

    Note: If any libraries from the Advance Toolchain mcore-libs package have been used, their entries must be replaced by a single advance-toolchain-atX.X-mcore-libs entry instead.

  4. Add a Requires: <list-found> entry to your spec file, replacing "<list-found>" with the list assembled in the previous step.

Note: These steps are required only for the first build, to adjust your spec file for packaging. If the build system or the packaging spec changes later, revise the dependency list to make sure it is still correct.
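
Step 3 can be sketched as a small filter. This is an illustration only: collapse_at_deps is a hypothetical function name, the input is assumed to be one library path per line (for example, extracted from ldd output), and the sketch does not distinguish libraries that belong to the mcore-libs package. Entries outside the Advance Toolchain prefix are passed through unchanged and still have to be expressed in whatever form your spec file uses.

```shell
# Read library paths on standard input (one per line) and print a list in
# which every library under the Advance Toolchain prefix is collapsed into
# a single runtime package entry.
# collapse_at_deps is a hypothetical helper; $1 is the AT version (X.X).
collapse_at_deps() (
    ver=$1
    while IFS= read -r lib; do
        case $lib in
            "/opt/at$ver"/*) printf '%s\n' "advance-toolchain-at$ver-runtime" ;;
            "") ;;                          # skip empty lines
            *) printf '%s\n' "$lib" ;;      # non-AT entries pass through
        esac
    done | sort -u                          # one line per distinct entry
)

printf '%s\n' \
    /opt/at16.0/lib64/libc.so.6 \
    /opt/at16.0/lib64/libm.so.6 \
    /usr/lib64/libz.so.1 |
    collapse_at_deps 16.0
```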

Use a script to get personalized require dependency filters

Use this method with RPM version 4.8 or later.

  1. The /opt/atX.X/scripts/find_dependencies.sh script is provided (since Advance Toolchain 9.0-2) to help set the dependencies. Since RPM version 4.8, you can replace the macro that checks for dependencies. This script accepts a list of files as input and provides a list of found dependencies as output. It looks for standard dependencies found inside the Advance Toolchain install path and replaces them by a single output of advance-toolchain-atX.X-runtime, or advance-toolchain-atX.X-mcore-libs if the found dependency is a multi core library provided by Advance Toolchain.
  2. By default, RPM points the %__find_requires macro to its standard script, located at /usr/lib/rpm/find_requires. To use the find_dependencies.sh script instead, change your spec file to redefine the macro by using the following entries:
    %define __find_requires /opt/atX.X/scripts/find_dependencies.sh </full/path/to/files>
    %define _use_internal_dependency_generator 0

    For more information about RPM dependencies, see the RPM documentation.

Advance Toolchain with IBM XLC and XLF

When you compile binaries with XLC or XLF, you must add the -F <path_to_cfg_file> option to the compiler command line. The Advance Toolchain provides a script that creates those files in /opt/atX.X/scripts. This script is run automatically during installation. If you need to re-run it later (for example, because you installed XLC/XLF after the Advance Toolchain installation), execute the following command:

/opt/atX.X/scripts/at-create-ibmcmp-cfg.sh

Notice the absolute path when you are calling the script. DO NOT call it using a relative path. The script creates the config files in /opt/atX.X/scripts.

This procedure does not affect the default XLC/XLF configuration.

MASS libraries and Advance Toolchain

IBM's Mathematical Acceleration Subsystem (MASS) libraries consist of a set of mathematical functions for C, C++, and Fortran-language applications that are tuned for specific POWER architectures. The libraries are available for ppc64le.

To use MASS with Advance Toolchain, you need to pass the option -mveclibabi=mass to GCC.

Debug options for Advance Toolchain

Using its printf command, gdb can be asked to output _Decimal[32|64|128] formatted floating point registers by default.

When you are using objdump to inspect POWER Decimal code, make sure to use the -Mpower7 or -Mpower8 flag. For example:

/opt/atX.X/bin/objdump -d -Mpower8 <your_file>

The same applies to POWER7 code.

Environment modules

Environment Modules is a tool that simplifies shell initialization and lets users easily modify their environment during the session with modulefiles.

Since AT 11.0, modulefiles are provided in the devel and cross-common packages to help easily set the environment for using AT.

To use it, run the command:

module load atX.X

To stop using it, run:

module unload atX.X

Note: The environment-modules package needs to be installed and configured in the system.