Building packages and applications to interface with WML CE
Find information about building packages and applications, including CUDA Toolkit packages and C++ applications that interface with WML CE.
CUDA Toolkit packages
IBM® provides CUDA Toolkit conda packages to accompany WML CE.
One benefit is that the correct CUDA version for each WML CE release is installed automatically. Another is that different versions can coexist on the same machine (even for the same user), in different conda environments.
One drawback is that the CUDA Toolkit files are not installed in the standard /usr/local/cuda* directories where some application build scripts may expect to find them.
The CUDA Toolkit is split between two conda packages:
cudatoolkit - includes CUDA runtime support
cudatoolkit-dev - includes the CUDA compiler, headers, etc. needed for application development
By default, a full WML CE installation includes the cudatoolkit runtime package, but not the cudatoolkit-dev development package.
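To check whether the development package is already present in an environment, the output of conda list can be filtered by package name. The following is a minimal sketch; the has_package function name is hypothetical and is not part of conda or WML CE.

```shell
# Hypothetical helper (not part of conda): read `conda list` output on stdin
# and succeed if a package with exactly the given name is listed.
has_package() {
    awk -v pkg="$1" '$1 == pkg { found = 1 } END { exit !found }'
}

# Example use in an activated environment:
#   conda list | has_package cudatoolkit-dev || echo "cudatoolkit-dev is not installed"
```

The match is on the whole package name, so cudatoolkit does not count as cudatoolkit-dev.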
Building applications
We recommend building applications using the Anaconda v7 GCC compiler (see below for compatibility information). Install the CUDA Toolkit development components and Anaconda compiler with:
(my-pai-env) $ conda install cudatoolkit-dev gxx_linux-ppc64le=7 # on Power
(my-pai-env) $ conda install cudatoolkit-dev gxx_linux-64=7 # on x86
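The two commands differ only in the compiler package name. As a sketch, the name could be derived from the machine architecture reported by uname -m. The gxx_package function is a hypothetical helper, and only the two architectures listed above are handled.

```shell
# Hypothetical helper: print the Anaconda GCC 7 package spec for an
# architecture string as reported by `uname -m`.
gxx_package() {
    case "$1" in
        ppc64le) echo "gxx_linux-ppc64le=7" ;;
        x86_64)  echo "gxx_linux-64=7" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Example use in an activated environment:
#   conda install cudatoolkit-dev "$(gxx_package "$(uname -m)")"
```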
The various CUDA Toolkit components are installed in the conda environment at:
$CONDA_PREFIX/bin - CUDA executables: nvcc, cuda-memcheck, cuda-gdb, etc.
$CONDA_PREFIX/lib64 - libraries for runtime and building applications
$CONDA_PREFIX/include - header files for building applications
This is the same directory layout as a standard CUDA Toolkit installation. Setting a CUDA_HOME or CUDA_PATH build variable to $CONDA_PREFIX will allow most applications to find the components correctly.
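A minimal sketch of setting both variables from an activated environment follows. The use_conda_cuda function name is hypothetical, and which of the two variables a given build actually reads depends on the application.

```shell
# Hypothetical helper: point common CUDA build variables at the active
# conda environment. Returns non-zero if no environment is active.
use_conda_cuda() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "CONDA_PREFIX is not set; activate a conda environment first" >&2
        return 1
    fi
    export CUDA_HOME="$CONDA_PREFIX"
    export CUDA_PATH="$CONDA_PREFIX"
}

# Example use before starting a build:
#   use_conda_cuda && make
```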
Using a non-default compiler with nvcc
If you are using the Anaconda C++ compiler (or another non-default C++ compiler), you must use nvcc's -ccbin flag to tell it where to find the compiler.
If using the Anaconda compiler, then (depending on how your environment was created) the CXX and GXX variables should contain the path to the compiler. So you may need to ensure that nvcc is invoked as, for example:
nvcc -ccbin $CXX
or
nvcc -ccbin $CONDA_PREFIX/bin/powerpc64le-conda_cos7-linux-gnu-c++
Some applications may have existing build variables (for example HOST_COMPILER) that can be set to take care of this automatically.
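Because the CXX and GXX variables may or may not be set, a build script could choose the host compiler with a small fallback. This is only a sketch: the host_compiler function name and the kernel.cu file name are hypothetical.

```shell
# Hypothetical helper: print the host compiler to pass to nvcc -ccbin.
# Prefers $CXX, then $GXX; prints nothing and returns 1 if neither is set.
host_compiler() {
    if [ -n "${CXX:-}" ]; then
        printf '%s\n' "$CXX"
    elif [ -n "${GXX:-}" ]; then
        printf '%s\n' "$GXX"
    else
        return 1
    fi
}

# Example use (kernel.cu is a placeholder file name):
#   nvcc -ccbin "$(host_compiler)" -o kernel kernel.cu
```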
Running applications
The CUDA runtime libraries are also linked into the directory:
$CONDA_PREFIX/cuda/lib
That directory is automatically added to LD_LIBRARY_PATH when a WML CE conda environment is activated. This ensures applications needing the CUDA libraries can find them, without interference from other conda-provided libraries.
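If an application cannot find the CUDA libraries at run time, one quick check is whether that directory is actually on LD_LIBRARY_PATH. The sketch below assumes the directory is $CONDA_PREFIX/cuda/lib; the path_contains function name is hypothetical.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated list $2 (defaults to the current LD_LIBRARY_PATH).
path_contains() {
    case ":${2-${LD_LIBRARY_PATH:-}}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use in an activated environment:
#   path_contains "$CONDA_PREFIX/cuda/lib" || echo "CUDA runtime directory is not on LD_LIBRARY_PATH"
```

Wrapping the list in colons makes the comparison match whole entries only, so a parent directory of an entry is not reported as present.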
Build example: CUDA Toolkit samples
Here's an example console session showing how to build the standard CUDA Toolkit Samples in a WML CE conda environment:
$ conda create -y -n my-pai-env python=3.6 powerai
$ conda activate my-pai-env
(my-pai-env) $ conda install -y cudatoolkit-dev gxx_linux-ppc64le=7
(my-pai-env) $ cuda-install-samples-10.1.sh .
Copying samples to ./NVIDIA_CUDA-10.1_Samples now...
Finished copying samples.
(my-pai-env) $ cd NVIDIA_CUDA-10.1_Samples
(my-pai-env) $ export CUDA_PATH=$CONDA_PREFIX
(my-pai-env) $ echo $GXX
/opt/anaconda/envs/my-pai-env/bin/powerpc64le-conda_cos7-linux-gnu-g++
(my-pai-env) $ export HOST_COMPILER=$GXX
(my-pai-env) $ export OMPI_CXX=$GXX
(my-pai-env) $ make
WARNING - GCC variable has been deprecated
WARNING - please use HOST_COMPILER=/opt/anaconda/envs/my-pai-env/bin/powerpc64le-conda_cos7-linux-gnu-gcc instead
make[1]: Entering directory `/home/builder/NVIDIA_CUDA-10.1_Samples/0_Simple/simpleCallback'
...
cp conjugateGradient ../../bin/ppc64le/linux/release
make[1]: Leaving directory `/home/builder/NVIDIA_CUDA-10.1_Samples/7_CUDALibraries/conjugateGradient'
Finished building CUDA samples
(my-pai-env) $ bin/ppc64le/linux/release/deviceQuery
bin/ppc64le/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 4 CUDA Capable device(s)
...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 4
Result = PASS
Build example: Python application with native code: Horovod
Python applications (possibly including native code) can be built similarly and again may require setting a build variable to provide the CUDA path.
For a full description of how to build Horovod with PowerAI DDL, see Distributed deep learning with Horovod and PowerAI DDL.
Note that the steps include installing the Anaconda compiler and cudatoolkit-dev, and then setting HOROVOD_CUDA_HOME to indicate the CUDA path:
(my-pai-env) $ conda install gxx_linux-ppc64le cffi cudatoolkit-dev
(my-pai-env) $ HOROVOD_CUDA_HOME=$CONDA_PREFIX HOROVOD_GPU_ALLREDUCE=DDL pip install horovod --no-cache-dir
C++ compiler compatibility
There is an incompatibility between GCC versions 4 and 5, involving name mangling for functions using some C++11 features, such as the newer version of std::string.
Newer compilers will (by default) tag such functions with an ABI indicator including the string "cxx11". Older compilers lack support for the tag, so will not generate or expect it.
Using the example of a Protobuf function, a newer compiler would generate a reference to:
_ZNK6google8protobuf7Message11GetTypeNameB5cxx11Ev
While an older compiler would instead expect the same function to be named:
_ZNK6google8protobuf7Message11GetTypeNameEv
Objects and libraries built with an older GCC toolchain may be incompatible with those built with a newer GCC toolchain. The typical symptom would be a link or load failure with a complaint of unresolved or undefined symbols, even though the library that should contain the symbols is present and visible. Inspection (for example, with objdump -T ... or similar) shows a near-match of symbols, but with the cxx11 ABI tagging difference as previously shown.
More information can be found at https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html
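The inspection described above can be narrowed to the tagged symbols with a simple filter. This is a sketch only: the cxx11_tagged function name and the libexample.so file name are hypothetical, and the filter relies on the tag appearing as B5cxx11 in the mangled name, as in the Protobuf example.

```shell
# Hypothetical helper: read symbol listings on stdin and print only the
# lines that carry the cxx11 ABI tag ("B5cxx11" in the mangled name).
# Returns non-zero when no tagged symbol is found (standard grep behavior).
cxx11_tagged() {
    grep -F 'B5cxx11'
}

# Example use (libexample.so is a placeholder file name):
#   objdump -T libexample.so | cxx11_tagged
```

A library that prints tagged symbols was built with the newer ABI, so code linking against it needs a compiler that generates the same tags.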
WML CE is built with a recent Anaconda toolchain (version 7.x), with C++11 support, and generally with the compiler defaults (ABI tagging enabled).
You must use a recent g++ compiler when building any C++ code that interfaces with WML CE libraries. The default distribution compiler on RHEL 7.6 (g++ 4.8.5) is incompatible.
We recommend the Anaconda v7 toolchain (conda install gxx_linux-ppc64le=7 or conda install gxx_linux-64=7) since that is what is used to build WML CE. However, Red Hat Developer Toolset or the Ubuntu 18.04 default toolchain may also work.
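One way to see which ABI a given toolchain uses by default is to look at the _GLIBCXX_USE_CXX11_ABI macro that libstdc++ defines (1 for the newer ABI, 0 for the older one; compilers that predate the dual ABI do not define it). The following sketch assumes a GCC-compatible compiler in $CXX; the cxx11_abi_setting function name is hypothetical.

```shell
# Hypothetical helper: read preprocessor macro definitions on stdin and
# print the value of _GLIBCXX_USE_CXX11_ABI. Returns non-zero if the macro
# is not defined at all.
cxx11_abi_setting() {
    awk '$1 == "#define" && $2 == "_GLIBCXX_USE_CXX11_ABI" { print $3; found = 1 }
         END { exit !found }'
}

# Example use with the compiler under test:
#   echo '#include <string>' | "$CXX" -x c++ -dM -E - | cxx11_abi_setting
```

A result of 1 matches the compiler defaults described above; a result of 0, or no result, suggests the toolchain will not generate the cxx11 tags by default.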