IBM Support

An introduction to NVIDIA CUDA on IBM POWER8

Technical Blog Post


Abstract

An introduction to NVIDIA CUDA on IBM POWER8

Body

A wide range of applications in technical areas such as computer vision, chemistry, bioinformatics, molecular biology, engineering and financial analysis use heterogeneous computing systems with general-purpose GPU (Graphics Processing Unit) hardware as their high-performance platform of choice.

The recently launched IBM Power System S824L combines the NVIDIA Tesla K40 GPU with the latest IBM POWER8 processor, providing a unique platform for heterogeneous high-performance computing.

The Power S824L system comes with up to two Tesla K40 GPU cards (based on the Kepler(TM) architecture), each able to deliver 4.29 and 1.43 Tflops of peak performance on single- and double-precision floating-point operations, respectively. The Tesla K40 GPU features:

  • 15 SMX (Streaming Multiprocessors), each of which:
    • Simultaneously executes 4 warps (groups of 32 parallel threads)
    • Has ALUs fully compliant with the IEEE 754-2008 standard
  • 64 KB of configurable shared memory and L1 cache per multiprocessor
  • 48 KB read-only data cache per multiprocessor
  • 1536 KB L2 cache
  • 12 GB DRAM (GDDR5)
  • GPU Boost Clock
  • 2880 CUDA cores (192 per multiprocessor)
  • Supports CUDA compute capability 3.5
  • Dynamic parallelism (ability to launch nested CUDA kernels)
  • Hyper-Q (allows several CPU threads/processes to dispatch CUDA kernels concurrently)
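One way to confirm these characteristics on an installed system is to query the device through the CUDA runtime API. The sketch below uses cudaGetDeviceProperties and cudaDeviceGetAttribute; the helper peak_gflops is a hypothetical name introduced for illustration, and the figures of 192 single-precision cores and 64 double-precision units per SMX are assumptions that hold for the Kepler GK110 chip used in the K40, not for other GPUs. Counting a fused multiply-add as two operations, at the K40's 745 MHz base clock this gives 15 × 192 × 745 MHz × 2 ≈ 4291 GFLOPS (4.29 Tflops) in single precision and 15 × 64 × 745 MHz × 2 ≈ 1430 GFLOPS (1.43 Tflops) in double precision. The clock attribute may report the maximum boost clock, so the printed estimate can be higher than these base-clock figures.

```cuda
// device_query.cu: print the properties discussed above for each CUDA device.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: theoretical peak in GFLOPS for sm_count * units_per_sm
// arithmetic units, each completing flops_per_cycle operations per clock
// (2 for a fused multiply-add).
double peak_gflops(int sm_count, int units_per_sm, double clock_mhz,
                   int flops_per_cycle) {
    return sm_count * units_per_sm * clock_mhz * flops_per_cycle / 1000.0;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA-capable device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, dev);
        int clock_khz = 0;  // clock rate reported by the runtime, in kHz
        cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, dev);

        std::printf("Device %d: %s\n", dev, p.name);
        std::printf("  Compute capability : %d.%d\n", p.major, p.minor);
        std::printf("  Multiprocessors    : %d\n", p.multiProcessorCount);
        std::printf("  Warp size          : %d threads\n", p.warpSize);
        std::printf("  L2 cache           : %d KB\n", p.l2CacheSize / 1024);
        std::printf("  Global memory      : %zu MB\n",
                    (size_t)(p.totalGlobalMem >> 20));
        std::printf("  Concurrent kernels : %s\n",
                    p.concurrentKernels ? "yes" : "no");

        // Assumption: 192 FP32 cores and 64 FP64 units per SMX, as on the
        // GK110 chip of the Tesla K40; other chips use different numbers.
        if (p.major == 3 && p.minor == 5) {
            double mhz = clock_khz / 1000.0;
            std::printf("  Peak FP32 (est.)   : %.1f GFLOPS\n",
                        peak_gflops(p.multiProcessorCount, 192, mhz, 2));
            std::printf("  Peak FP64 (est.)   : %.1f GFLOPS\n",
                        peak_gflops(p.multiProcessorCount, 64, mhz, 2));
        }
    }
    return 0;
}
```

It can be built with, for example, nvcc -o device_query device_query.cu. The code samples shipped with the toolkit include a more complete deviceQuery program.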

C/C++ CUDA programming support for POWER8 was first introduced with CUDA Toolkit 5.5 for Ubuntu 14.10 ppc64le. As of this writing, CUDA Toolkit 7.0 is the latest release, and it also supports Ubuntu 14.04 ppc64le. The toolkit comes with the following tools and libraries for developing CUDA applications on Power:

  • NVCC (NVIDIA CUDA Compiler) - front-end compiler driver
  • CUDA GDB - command line GDB-based debugger
  • CUDA Memcheck - command line memory and race checker tool
  • nvprof - command line profiling tool
  • Binary utilities - including cuobjdump and nvdisasm
  • Code samples
  • POWER cross-compilation support (new in CUDA Toolkit 7.0)
  • GPU-accelerated libraries - many libraries and APIs such as cuBLAS, cuFFT, cuSPARSE and Thrust
  • Nsight Eclipse Edition - Eclipse-based Integrated Development Environment (IDE)
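As an illustration of the GPU-accelerated libraries, the following sketch sorts an integer array with Thrust, which ships with the toolkit and provides STL-like containers and algorithms. The function name gpu_sort is hypothetical; thrust::device_vector, thrust::sort and thrust::copy are the library's own container and algorithms.

```cuda
// thrust_sort.cu: sort an integer array on the GPU with Thrust.
#include <cstdio>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical helper: copy the input to GPU memory, sort it there, copy it back.
std::vector<int> gpu_sort(const std::vector<int>& input) {
    thrust::device_vector<int> d(input.begin(), input.end());  // host -> device
    thrust::sort(d.begin(), d.end());                           // sorted on the GPU
    std::vector<int> out(input.size());
    thrust::copy(d.begin(), d.end(), out.begin());              // device -> host
    return out;
}

int main() {
    int raw[] = {42, 7, 19, 3, 88, 1};
    std::vector<int> data(raw, raw + sizeof(raw) / sizeof(raw[0]));
    std::vector<int> sorted = gpu_sort(data);
    for (size_t i = 0; i < sorted.size(); ++i) std::printf("%d ", sorted[i]);
    std::printf("\n");
    return 0;
}
```

Thrust is header-only, so a plain nvcc -o thrust_sort thrust_sort.cu is enough; no extra library has to be linked.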

Because a CUDA application has portions of code that run exclusively on either the host or the device processor, NVCC acts as a front-end compiler driver that simplifies compiling such C/C++ code. Either the distribution's GCC or the IBM XL C/C++ compiler 13.1.1 (or newer) can be used as the back-end compiler: it generates the objects that run on the host processor, while NVCC compiles the portions of code targeting the GPU device.
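The split between host and device code can be seen in a minimal vector addition. In this sketch (file and function names are hypothetical), the __global__ kernel is compiled by NVCC for the GPU, while add_on_gpu and main are ordinary C++ handed to the host compiler. The build commands in the comments assume nvcc is on the PATH; -ccbin is the NVCC option that selects the host compiler, and passing xlC to it assumes the IBM XL C/C++ driver is installed and on the PATH.

```cuda
// vector_add.cu: host (CPU) and device (GPU) code in a single source file.
// Build with the distribution's GCC as host compiler (the default):
//   nvcc -o vector_add vector_add.cu
// Build with IBM XL C/C++ as host compiler (assumes xlC is on the PATH):
//   nvcc -ccbin xlC -o vector_add vector_add.cu
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Device code, compiled by NVCC for the GPU: one thread per element.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Host code, compiled by the back-end compiler. Assumes a.size() == b.size().
std::vector<float> add_on_gpu(const std::vector<float>& a,
                              const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;  // a launch with zero blocks would be invalid
    size_t bytes = n * sizeof(float);

    float *da = NULL, *db = NULL, *dc = NULL;
    check(cudaMalloc(&da, bytes), "cudaMalloc a");
    check(cudaMalloc(&db, bytes), "cudaMalloc b");
    check(cudaMalloc(&dc, bytes), "cudaMalloc c");
    check(cudaMemcpy(da, &a[0], bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(db, &b[0], bytes, cudaMemcpyHostToDevice), "copy b");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    vector_add<<<blocks, threads>>>(da, db, dc, n);
    check(cudaGetLastError(), "kernel launch");
    // cudaMemcpy waits for the kernel to finish before copying.
    check(cudaMemcpy(&c[0], dc, bytes, cudaMemcpyDeviceToHost), "copy c");

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return c;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f);
    std::vector<float> c = add_on_gpu(a, b);
    std::printf("c[0] = %.1f, c[%d] = %.1f\n", c[0], n - 1, c[n - 1]);
    return 0;
}
```

Keeping the kernel and its launcher in one .cu file lets NVCC do the splitting; the host compiler never sees the __global__ function body.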

The CUDA Toolkit for Linux on POWER8 can be downloaded free of charge from https://developer.nvidia.com/cuda-downloads#linux-power8

Java applications can also exploit GPU-accelerated operations, starting with IBM Java SDK versions 7.1 and 8.0, which make the following packages available to applications:

  • com.ibm.gpu - provides classes with GPU-offloaded operations (e.g. array sorting)
  • com.ibm.cuda - enables low-level access to CUDA devices. For example, the API can load and unload CUDA modules on the GPU device in order to execute kernel functions.

Read more about NVIDIA CUDA on IBM POWER8 in the following IBM Redpapers:


UID

ibm16170595