Reference and limitations for CUDA Fortran support

IBM® XL Fortran for Linux, V15.1.5 supports a commonly used subset of CUDA Fortran. For more information about the language extensions that CUDA Fortran introduces, see the CUDA Fortran Programming Guide and Reference, which you can download from http://www.pgroup.com/doc/pgicudaforug.pdf.

The following CUDA Fortran features are not supported in IBM XL Fortran for Linux, V15.1.5:

Calling reduction intrinsic functions, such as sum, maxval, and minval, on the host with device actual arguments
Conditional sentinels for CUDA Fortran (!@CUF)
CUF kernel directives
Data transfer using the following CUDA Runtime APIs:
- cudaMemcpyFromSymbol
- cudaMemcpyFromSymbolAsync
- cudaMemcpyToSymbol
- cudaMemcpyToSymbolAsync
- cudaMemset
Note: You can use assignment, cudaMemcpy, or cudaMemcpyAsync instead. XL Fortran allows device global and constant module variables to appear as arguments to cudaMemcpy and cudaMemcpyAsync.
The following CUDA Runtime APIs:
- cudaMalloc3D when the first argument is a rank-3 allocatable array
- cudaMemcpyPeer
- cudaMemcpyPeerAsync
- cudaMemcpy2D
- cudaMemcpy2DAsync
- cudaMemcpy3DAsync
- cudaMemset2D
Debug support
Note: You can get basic line-level debugging by compiling with -g -qfullpath.
Dynamic parallelism
PRINT and WRITE statements in device code
Procedure definitions or interfaces that have the attributes(host, device) prefix
Note: To work around this limitation, make two copies of the procedure: give one copy the attributes(host) prefix and the other copy the attributes(device) prefix. Do not define the two copies in the same compilation unit.
Pointers with the texture attribute
Note: The compiler automatically uses the texture cache for passing dummy arguments when appropriate.
Shuffle intrinsics
The curand module
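The note on cudaMemcpyToSymbol, cudaMemcpyFromSymbol, and cudaMemset in the list above can be illustrated with a minimal sketch. It assumes the cudafor module and a source file compiled with CUDA Fortran support enabled. The module name params_m, the variables coeffs and table, and the subroutines upload, clear_table, and download are hypothetical names chosen for illustration. The sketch replaces symbol-based copies with plain assignment and with cudaMemcpy on device module variables (which the note says XL Fortran accepts), and replaces cudaMemset with assignment of a scalar to a device array.

```fortran
! Hypothetical example module: all names here are illustrative only.
module params_m
  use cudafor
  implicit none
  real, constant :: coeffs(4)    ! module variable in constant memory (read-only in kernels)
  real, device   :: table(256)   ! module variable in device global memory
contains
  subroutine upload(h_coeffs, h_table, istat)
    real, intent(in)     :: h_coeffs(4), h_table(256)
    integer, intent(out) :: istat
    ! Instead of cudaMemcpyToSymbol: host-to-device copy by assignment
    coeffs = h_coeffs
    ! Instead of cudaMemcpyToSymbol: pass the device module variable
    ! directly to cudaMemcpy; the count is in elements, not bytes
    istat = cudaMemcpy(table, h_table, 256, cudaMemcpyHostToDevice)
  end subroutine upload

  subroutine clear_table()
    ! Instead of cudaMemset: assign a scalar to the whole device array
    table = 0.0
  end subroutine clear_table

  subroutine download(h_table)
    real, intent(out) :: h_table(256)
    ! Instead of cudaMemcpyFromSymbol: device-to-host copy by assignment
    h_table = table
  end subroutine download
end module params_m
```

Host code can then call upload, clear_table, and download like ordinary subroutines; kernels that use params_m read coeffs and table directly from device memory.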

The following limitations apply to IBM XL Fortran for Linux, V15.1.5:

You can use CUDA Fortran with IBM XL Fortran for Linux, V15.1.5 only if the CUDA Toolkit 8.0 is installed and the compiler is configured with the location of the toolkit.
- If you install the compiler after you install the toolkit, the compiler detects the location of the toolkit and no action is required.
- If you install the toolkit after you install the compiler, reconfigure the compiler as described in Configuring IBM XL Fortran for Linux.
Note: To install the CUDA Toolkit, use the Package Manager installation. The Runfile installation is currently not supported on Power® processors. For instructions about Package Manager installation, see the NVIDIA CUDA Installation Guide for Linux.
IBM XL Fortran for Linux, V15.1.5 automatically detects the GPU architecture at compiler configuration time. The GPU architecture is encoded into the compiler configuration file. No compiler options are provided to target other GPU architectures.
Programs that use dynamic shared memory might fail to compile because of an issue in CUDA Toolkit 8.0. In that case, the compiler issues the following message:
```
Bitcasts between pointers of different address spaces is not legal. 
Use AddrSpaceCast instead.
```
To work around this issue, compile the affected file with the -Xllvm2ptx -nvvm-compile-options=-opt=0 option.
