Offloading computations to the NVIDIA GPUs

The combination of the IBM® POWER® processors and the NVIDIA GPUs provides a platform for heterogeneous high-performance computing that can run several technical computing workloads efficiently. The computational capability is built on top of massively parallel and multithreaded cores within the NVIDIA GPUs and the IBM POWER processors. You can offload parallel operations within applications, such as data analysis or high-performance computing workloads, to GPUs.

System prerequisites

To compile and link programs that contain code to be offloaded to the NVIDIA GPUs with IBM XL C/C++ for Linux, you must ensure the following operating system, hardware, and software requirements are met.

Use any IBM Power Systems™ server that has one or more NVIDIA GPUs installed and is supported by your Linux operating system distribution and the NVIDIA CUDA Toolkit.
Use a supported little-endian Linux operating system.
Install NVIDIA CUDA Toolkit 8.0.

For more information, see System prerequisites to offload computations to the NVIDIA GPUs in the XL C/C++ Installation Guide.

Programming with supported OpenMP 4.5 device constructs

IBM XL C/C++ for Linux, V13.1.5 partially supports the OpenMP Application Program Interface Version 4.5 specification. You can offload compute-intensive parts of an application and associated data to the NVIDIA GPUs by using the following supported device constructs.

omp target data
omp target enter data
omp target exit data
omp target
omp target update
omp declare target
omp teams
omp distribute
omp distribute parallel for

For example, you can use the omp target directive to define a target region, which is a block of computation that operates within a distinct data environment and is intended to be offloaded onto a parallel computation device during execution. For more information about the OpenMP directives, see Pragma directives for parallel processing in the XL C/C++ Compiler Reference.
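The following minimal sketch shows what a target region can look like in C. The function name scale_on_device is hypothetical. The map(tofrom: x[0:n]) clause copies the array section to the device before the region and back to the host afterward. Under OpenMP 4.5 rules, the scalars a and n are firstprivate by default. Without further constructs such as omp teams or omp distribute parallel for, the loop in this region is executed by a single device thread.

```c
#include <stddef.h>

/* Multiplies each of the n elements of x by a inside a target region.
 * map(tofrom: x[0:n]) copies x to the device on entry and back to the
 * host on exit; a and n are firstprivate by default in OpenMP 4.5. */
void scale_on_device(double *x, size_t n, double a)
{
    /* No parallel constructs: one device thread runs the whole loop. */
    #pragma omp target map(tofrom: x[0:n])
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}
```

If no device is available, or if the code is built without OpenMP support, the loop simply runs on the host and produces the same result.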

You can also use other OpenMP constructs with these OpenMP device constructs to exert finer control over parallelization, such as the combined constructs that are listed in Combined constructs in the XL C/C++ Compiler Reference.
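As an illustration, the following sketch parallelizes a SAXPY loop (y = a*x + y) with the OpenMP 4.5 combined construct target teams distribute parallel for. It assumes that this combined construct is among those listed in Combined constructs. The function name saxpy_on_device is hypothetical. Because x is only read, it is mapped with map(to:); y is both read and written, so it uses map(tofrom:).

```c
#include <stddef.h>

/* Computes y = a*x + y on the offload device.
 * teams + distribute split the iterations across a league of teams
 * (typically thread blocks on a GPU); parallel for then splits each
 * team's share of iterations across the threads of that team. */
void saxpy_on_device(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Combining the constructs in one directive lets the compiler create the teams and threads for the whole loop nest in a single offloaded region, instead of one device thread executing the loop.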

You must specify the -qoffload option to enable support for offloading OpenMP target regions to NVIDIA GPUs. For -qoffload to take effect, you must also specify the -qsmp option, which enables support for OpenMP target regions. For more information, see -qoffload in the XL C/C++ Compiler Reference.

You can also set the XLSMPOPTS=target={mandatory | optional | disable} environment variable to control the device on which target regions are executed. For more information, see XLSMPOPTS in the XL C/C++ Compiler Reference.
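One way to check whether a program was built and run with offloading in effect is to call omp_is_initial_device from inside a target region. The following is a minimal sketch. The function name target_ran_on_device and the file name prog.c are hypothetical. The _OPENMP guard, which the OpenMP specification requires compilers to define when OpenMP support is enabled, lets the same file also compile without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Build with offloading enabled, for example:
 *     xlc -qsmp -qoffload prog.c
 * (prog.c is a placeholder file name.)
 *
 * Returns 1 if the target region ran on an offload device (a GPU) and
 * 0 if it ran on the host, for example because the program was built
 * without -qoffload or no device was available. */
int target_ran_on_device(void)
{
    int on_device = 0;
#ifdef _OPENMP
    /* Map the result back from wherever the region executes. */
    #pragma omp target map(from: on_device)
    on_device = !omp_is_initial_device();
#endif
    return on_device;
}
```

Running the same executable with XLSMPOPTS=target=disable set would be expected to make the function return 0, because target regions then execute on the host.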

You can also call the supported OpenMP runtime functions, for example, to query the target environment or to manage device memory. Table 1 lists some of these functions.

Table 1. Some useful OpenMP runtime functions for offloading computations to the NVIDIA GPUs
To query the target environment    To manage device memory
omp_get_default_device             omp_target_alloc
omp_get_initial_device             omp_target_associate_ptr
omp_get_num_devices                omp_target_disassociate_ptr
omp_get_num_teams                  omp_target_free
omp_get_team_num                   omp_target_is_present
omp_is_initial_device              omp_target_memcpy

For more information about OpenMP runtime functions, see OpenMP runtime functions for parallel processing in the XL C/C++ Compiler Reference.
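The following sketch combines several functions from Table 1. It queries the target environment, allocates a buffer directly in device memory with omp_target_alloc, copies data between host and device with omp_target_memcpy, and releases the buffer with omp_target_free. The function name double_via_device_buffer is hypothetical. The sketch also assumes that the OpenMP 4.5 is_device_ptr clause is accepted on the combined construct; this clause passes a raw device address into a target region without mapping it. The function returns -1 when OpenMP is not enabled or no device is present.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Doubles n values from src into dst by way of a buffer that lives only
 * in device memory. Returns 0 on success, or -1 if OpenMP is not enabled,
 * no offload device is available, or an allocation or copy fails. */
int double_via_device_buffer(const double *src, double *dst, size_t n)
{
#ifdef _OPENMP
    if (omp_get_num_devices() < 1)
        return -1;

    int host = omp_get_initial_device();
    int dev = omp_get_default_device();
    size_t bytes = n * sizeof *dst;

    /* Raw device allocation: d_buf has no corresponding host copy. */
    double *d_buf = omp_target_alloc(bytes, dev);
    if (d_buf == NULL)
        return -1;

    /* Arguments: dst, src, length, dst_offset, src_offset,
     * dst_device_num, src_device_num. The cast is needed because the
     * OpenMP 4.5 prototype declares src as non-const. */
    int rc = omp_target_memcpy(d_buf, (void *)src, bytes, 0, 0, dev, host);

    if (rc == 0) {
        /* d_buf already holds a device address, so is_device_ptr
         * passes it through instead of mapping it. */
        #pragma omp target teams distribute parallel for \
            device(dev) is_device_ptr(d_buf)
        for (size_t i = 0; i < n; ++i)
            d_buf[i] *= 2.0;

        rc = omp_target_memcpy(dst, d_buf, bytes, 0, 0, host, dev);
    }

    omp_target_free(d_buf, dev);
    return rc == 0 ? 0 : -1;
#else
    (void)src;
    (void)dst;
    (void)n;
    return -1;
#endif
}
```

Managing device memory explicitly in this way avoids implicit transfers. The trade-off is that the program becomes responsible for every copy and for freeing the buffer on each exit path.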

Using IBM XL C/C++ for Linux with NVCC

The NVIDIA CUDA C++ compiler (NVCC) from the NVIDIA CUDA Toolkit partitions C/C++ source code into host and device portions. You can use IBM XL C/C++ for Linux as the host compiler for the POWER processor with NVCC 7.5 or 8.0. For more information, see the IBM Redpaper NVIDIA CUDA on IBM POWER8®: Technical overview, software installation, and application development, which you can download from http://www.redbooks.ibm.com/redpapers/pdfs/redp5169.pdf.
