Building packages and applications to interface with WML CE

Find information about building packages and applications, including CUDA Toolkit packages and C++ applications that interface with WML CE.

CUDA Toolkit packages

IBM® provides CUDA Toolkit conda packages to accompany WML CE.

One benefit of this is that the correct CUDA version for each WML CE release is installed automatically. Another is that different CUDA versions can coexist on the same machine (even for the same user), each in its own conda environment.
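
For example, two environments could hold different toolkit versions side by side (the environment names and version pins here are purely illustrative):

$ conda create -n my-cuda101-env cudatoolkit=10.1
$ conda create -n my-cuda92-env cudatoolkit=9.2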

One drawback is that the CUDA Toolkit files are not installed in the standard /usr/local/cuda* directories where some application build scripts may expect to find them.

The CUDA Toolkit is split between two conda packages:

  • cudatoolkit - includes CUDA runtime support
  • cudatoolkit-dev - includes the CUDA compiler, headers, etc. needed for application development

By default, a full WML CE installation includes the cudatoolkit runtime package, but not the cudatoolkit-dev development package.
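
You can check which of the two packages is present in an environment with conda list:

(my-pai-env) $ conda list cudatoolkit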

Building applications

We recommend building applications with the Anaconda v7 GCC compiler (see C++ compiler compatibility below). Install the CUDA Toolkit development components and the Anaconda compiler with:

(my-pai-env) $ conda install cudatoolkit-dev gxx_linux-ppc64le=7  # on Power
(my-pai-env) $ conda install cudatoolkit-dev gxx_linux-64=7       # on x86

The various CUDA Toolkit components are installed in the conda environment at:

  • $CONDA_PREFIX/bin - CUDA executables: nvcc, cuda-memcheck, cuda-gdb, etc.
  • $CONDA_PREFIX/lib - libraries for runtime and building applications
  • $CONDA_PREFIX/include - header files for building applications

This is the same directory layout as a standard CUDA Toolkit installation. Setting a CUDA_HOME or CUDA_PATH build variable to $CONDA_PREFIX will allow most applications to find the components correctly.
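
For example, before running an application's build scripts:

(my-pai-env) $ export CUDA_HOME=$CONDA_PREFIX
(my-pai-env) $ export CUDA_PATH=$CONDA_PREFIX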

Using a non-default compiler with nvcc

If you are using the Anaconda C++ compiler (or other non-default C++ compiler) then you must use nvcc's -ccbin flag to tell it where to find the compiler.

If using the Anaconda compiler, then (depending on how your environment was created) the CXX and GXX variables should contain the path to the compiler.

So you may need to ensure that nvcc is invoked as, for example:

nvcc -ccbin $CXX

or

nvcc -ccbin $CONDA_PREFIX/bin/powerpc64le-conda_cos7-linux-gnu-c++

Some applications may have existing build variables (for example HOST_COMPILER) that can be set to take care of this automatically.
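
For example, the CUDA samples Makefiles honor HOST_COMPILER (as used in the build example below), so it can be supplied directly on the make command line:

(my-pai-env) $ make HOST_COMPILER=$CXX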

Running applications

The CUDA runtime libraries are also linked into the directory:

$CONDA_PREFIX/cuda/lib

That directory is automatically added to LD_LIBRARY_PATH when a WML CE conda environment is activated. This ensures applications needing the CUDA libraries can find them, without interference from other conda-provided libraries.
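
You can confirm this after activating the environment (the path shown assumes the install location used in the build example below; yours may differ):

(my-pai-env) $ echo $LD_LIBRARY_PATH
/opt/anaconda/envs/my-pai-env/cuda/lib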

Build example: CUDA Toolkit samples

Here's an example console session showing how to build the standard CUDA Toolkit Samples in a WML CE conda environment:

$ conda create -y -n my-pai-env python=3.6 powerai

$ conda activate my-pai-env

(my-pai-env) $ conda install -y cudatoolkit-dev gxx_linux-ppc64le=7

(my-pai-env) $ cuda-install-samples-10.1.sh .
Copying samples to ./NVIDIA_CUDA-10.1_Samples now...
Finished copying samples.

(my-pai-env) $ cd NVIDIA_CUDA-10.1_Samples

(my-pai-env) $ export CUDA_PATH=$CONDA_PREFIX

(my-pai-env) $ echo $GXX
/opt/anaconda/envs/my-pai-env/bin/powerpc64le-conda_cos7-linux-gnu-g++

(my-pai-env) $ export HOST_COMPILER=$GXX
(my-pai-env) $ export OMPI_CXX=$GXX

(my-pai-env) $ make
WARNING - GCC variable has been deprecated
WARNING - please use HOST_COMPILER=/opt/anaconda/envs/my-pai-env/bin/powerpc64le-conda_cos7-linux-gnu-gcc instead
make[1]: Entering directory `/home/builder/NVIDIA_CUDA-10.1_Samples/0_Simple/simpleCallback'
...
cp conjugateGradient ../../bin/ppc64le/linux/release
make[1]: Leaving directory `/home/builder/NVIDIA_CUDA-10.1_Samples/7_CUDALibraries/conjugateGradient'
Finished building CUDA samples

(my-pai-env) $ bin/ppc64le/linux/release/deviceQuery
bin/ppc64le/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)
...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 4
Result = PASS

Build example: Python application with native code: Horovod

Python applications that include native code can be built similarly, and again may require setting a build variable to provide the CUDA path.

For a full description of how to build Horovod with PowerAI DDL, see Distributed deep learning with Horovod and PowerAI DDL.

Note the steps include installing the Anaconda compiler and cudatoolkit-dev, and then setting HOROVOD_CUDA_HOME to indicate the CUDA path:

(my-pai-env) $ conda install gxx_linux-ppc64le cffi cudatoolkit-dev

(my-pai-env) $ HOROVOD_CUDA_HOME=$CONDA_PREFIX HOROVOD_GPU_ALLREDUCE=DDL pip install horovod --no-cache-dir
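
As a quick sanity check that the build succeeded and the module loads:

(my-pai-env) $ python -c "import horovod; print(horovod.__version__)"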

C++ compiler compatibility

There is an incompatibility between GCC versions 4 and 5, involving name mangling for functions using some C++11 features, such as the newer version of std::string.

Newer compilers will (by default) tag such functions with an ABI indicator that includes the string "cxx11". Older compilers lack support for the tag, so they will neither generate nor expect it.

Using the example of a Protobuf function, a newer compiler would generate a reference to:

_ZNK6google8protobuf7Message11GetTypeNameB5cxx11Ev

While an older compiler would instead expect the same function to be named:

_ZNK6google8protobuf7Message11GetTypeNameEv
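
Either name can be passed through c++filt to make the difference readable; recent binutils render the tag as [abi:cxx11]:

$ echo _ZNK6google8protobuf7Message11GetTypeNameB5cxx11Ev | c++filt
google::protobuf::Message::GetTypeName[abi:cxx11]() const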

Objects and libraries built with an older GCC toolchain may be incompatible with those built with a newer GCC toolchain. The typical symptom would be a link or load failure with a complaint of unresolved or undefined symbols, even though the library that should contain the symbols is present and visible. Inspection (for example, with objdump -T ... or similar) shows a near-match of symbols, but with the cxx11 ABI tagging difference as previously shown.
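
For example, to look for the tagging difference in a library's dynamic symbol table (libfoo.so here is a placeholder for the library in question):

$ objdump -T libfoo.so | grep GetTypeName | c++filt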

More information can be found at https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html

WML CE is built with a recent Anaconda toolchain (version 7.x), with C++11 support, and generally with the compiler defaults (ABI tagging enabled).

You must use a recent g++ compiler when building any C++ code to interface to WML CE libraries. The default distribution compiler on RHEL 7.6 (g++ 4.8.5) will be incompatible.

We recommend the Anaconda v7 toolchain (conda install gxx_linux-ppc64le=7 or conda install gxx_linux-64=7), since that is what is used to build WML CE. However, the Red Hat Developer Toolset or the Ubuntu 18.04 default toolchain may also work.
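
After installing, you can confirm the toolchain version through the conda-provided CXX variable (it should report a 7.x release):

(my-pai-env) $ $CXX --version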