Offloading computations to the NVIDIA GPUs

The combination of the IBM® POWER® processors and the NVIDIA GPUs provides a platform for heterogeneous high-performance computing that can run several technical computing workloads efficiently. The computational capability is built on top of massively parallel and multithreaded cores within the NVIDIA GPUs and the IBM POWER processors. You can offload parallel operations within applications, such as data analysis or high-performance computing workloads, to GPUs.

System prerequisites

To compile and link programs that contain code to be offloaded to the NVIDIA GPUs with IBM XL Fortran for Linux, ensure that the following operating system, hardware, and software requirements are met:
  • Use any IBM Power Systems™ server that has one or more NVIDIA GPUs installed and is supported by your Linux operating system distribution and the NVIDIA CUDA Toolkit.
  • Use a supported little-endian operating system.
  • Install NVIDIA CUDA Toolkit 8.0.
For more information, see System prerequisites to offload computations to the NVIDIA GPUs in the XL Fortran Installation Guide.

Programming with supported OpenMP 4.5 device constructs

IBM XL Fortran for Linux, V15.1.5, partially supports the OpenMP Application Program Interface Version 4.5 specification. You can offload compute-intensive parts of an application and associated data to the NVIDIA GPUs by using the following supported device constructs:
  • TARGET DATA
  • TARGET ENTER DATA
  • TARGET EXIT DATA
  • TARGET
  • TARGET UPDATE
  • DECLARE TARGET
  • TEAMS
  • DISTRIBUTE
  • DISTRIBUTE PARALLEL DO

For example, you can use the TARGET directive to define a target region, which is a block of computation that operates within a distinct data environment and is intended to be offloaded onto a parallel computation device during execution. For more information about the OpenMP directives, see Parallelization directives.
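The following sketch illustrates a target region under simple assumptions. The program name, array names, and problem size are hypothetical and chosen only for illustration. The MAP clauses describe the data environment of the region: a and b are copied to the device on entry, and c is copied back to the host on exit. The program is assumed to be compiled with the -qsmp and -qoffload options.

```fortran
! Hypothetical example: element-wise vector addition in a target region.
program target_vadd
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  ! Initialize the input arrays on the host.
  do i = 1, n
    a(i) = real(i)
    b(i) = 2.0 * real(i)
  end do

  ! a and b are mapped to the device on entry to the region;
  ! c is mapped back to the host on exit.
  !$omp target map(to: a, b) map(from: c)
  !$omp parallel do
  do i = 1, n
    c(i) = a(i) + b(i)
  end do
  !$omp end parallel do
  !$omp end target

  print *, 'c(1) =', c(1), ' c(n) =', c(n)
end program target_vadd
```

If no device is available and execution on the host is permitted, the same region is expected to run on the host and produce the same values.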

You can also use other OpenMP constructs with these OpenMP device constructs to exert finer control on parallelization, such as the combined constructs that are listed in OpenMP combined constructs.
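As a minimal sketch of this finer control, the following example wraps a loop in a TARGET DATA region and distributes the iterations with the TARGET TEAMS DISTRIBUTE PARALLEL DO combined construct from the OpenMP 4.5 specification. It assumes that this combined construct is supported at your compiler level; the program and variable names are hypothetical.

```fortran
! Hypothetical example: y = alpha*x + y distributed across teams and threads.
program target_saxpy
  implicit none
  integer, parameter :: n = 100000
  real, parameter :: alpha = 2.0
  real :: x(n), y(n)
  integer :: i

  x = 1.0
  y = 3.0

  ! TARGET DATA keeps x and y resident on the device for the whole block,
  ! so any enclosed target regions reuse the mapped data.
  !$omp target data map(to: x) map(tofrom: y)

  ! Loop iterations are distributed across teams, then across the
  ! threads of each team.
  !$omp target teams distribute parallel do
  do i = 1, n
    y(i) = alpha * x(i) + y(i)
  end do
  !$omp end target teams distribute parallel do

  !$omp end target data

  ! Every element should now equal alpha*1.0 + 3.0.
  print *, 'max deviation from 5.0:', maxval(abs(y - 5.0))
end program target_saxpy
```

Placing several target regions inside one TARGET DATA region avoids repeating the host-to-device transfers for each region.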

You must specify the -qoffload option to enable support for offloading OpenMP target regions to NVIDIA GPUs. For -qoffload to take effect, you must also specify the -qsmp option, which enables support for OpenMP target regions. For more information, see -qoffload in the XL Fortran Compiler Reference.

To control which device target regions are executed on, set the XLSMPOPTS=target={mandatory | optional | disable} environment variable. For more information, see XLSMPOPTS.

To query the target environment, use the omp_get_default_device, omp_get_initial_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num, and omp_is_initial_device runtime routines. For more information about OpenMP runtime functions, see Routines for OpenMP.
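The following sketch shows one way these routines might be combined to check where a target region runs. The program and variable names are hypothetical, and the program is assumed to be compiled with the -qsmp and -qoffload options. The values that are printed depend on the system and on the XLSMPOPTS setting.

```fortran
! Hypothetical example: query the target environment from host and device.
program query_devices
  use omp_lib
  implicit none
  integer :: nteams
  logical :: on_host

  ! Host-side queries.
  print *, 'number of devices: ', omp_get_num_devices()
  print *, 'default device:    ', omp_get_default_device()
  print *, 'initial device:    ', omp_get_initial_device()

  ! Inside a target region, omp_is_initial_device reports whether
  ! the region is executing on the host.
  on_host = .true.
  !$omp target map(from: on_host)
  on_host = omp_is_initial_device()
  !$omp end target

  ! Inside a teams region, the first team records the size of the league.
  nteams = 0
  !$omp target teams map(from: nteams)
  if (omp_get_team_num() == 0) nteams = omp_get_num_teams()
  !$omp end target teams

  print *, 'target region ran on host: ', on_host
  print *, 'number of teams:           ', nteams
end program query_devices
```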

Programming with supported CUDA Fortran features

IBM XL Fortran for Linux supports the CUDA Fortran programming model to exploit the NVIDIA GPUs. You can use the commonly used subset of CUDA Fortran that is provided by IBM XL Fortran for Linux to offload computations to the NVIDIA GPUs.

You must specify the -qcuda option to enable the compiler support for CUDA Fortran.
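As an illustration only, the following sketch uses a few widely used CUDA Fortran features: a kernel that is declared with attributes(global), device arrays, and the chevron launch syntax. It assumes that these features are within the subset that your level of XL Fortran supports; check the list of supported features before you rely on it. The module, kernel, and variable names are hypothetical, and the program is assumed to be compiled with the -qcuda option.

```fortran
! Hypothetical example: y = a*x + y computed by a CUDA Fortran kernel.
module saxpy_kernels
  implicit none
contains
  ! Each thread updates at most one element of y.
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real, value :: a
    real :: x(n), y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module saxpy_kernels

program cuda_saxpy
  use cudafor
  use saxpy_kernels
  implicit none
  integer, parameter :: n = 4096
  integer, parameter :: tpb = 256        ! threads per block (assumed value)
  real :: x(n), y(n)
  real, device :: x_d(n), y_d(n)         ! arrays that reside in GPU memory

  x = 1.0
  y = 2.0

  ! Assignment between host and device arrays transfers the data.
  x_d = x
  y_d = y

  ! Launch enough blocks to cover all n elements.
  call saxpy<<<(n + tpb - 1) / tpb, tpb>>>(n, 3.0, x_d, y_d)

  ! Copy the result back to the host.
  y = y_d

  ! Every element should now equal 3.0*1.0 + 2.0.
  print *, 'max deviation from 5.0:', maxval(abs(y - 5.0))
end program cuda_saxpy
```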

For more information about CUDA Fortran programming using IBM XL Fortran for Linux, including useful compiler options and a list of supported CUDA Fortran features, see Getting Started with CUDA Fortran programming using XL Fortran.


