-qtgtarch

Pragma equivalent

None.

Purpose

Specifies the real or virtual GPU architectures on which the code may run. This allows the compiler to take full advantage of the capabilities and machine instructions that are specific to a real GPU architecture or common to a virtual GPU architecture.

The compiler automatically detects the GPU architecture when the compiler is configured and records it in the compiler configuration file. You can override this default by using the -qtgtarch option.

Syntax

Default

-qtgtarch=default

Parameters

auto

The architecture of device 0 on the system where the compiler is running.

default

The default architecture, which is determined in the following order:

1. The architecture specified by the cuda_cc_major and cuda_cc_minor properties, if they are set in the configuration file.
2. Otherwise, the architecture of device 0 on the system where the compiler is running.
3. If there is no device 0, sm_35.
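
The resolution order for the default value can be sketched as follows. This is a minimal illustration in Python, not the compiler's actual implementation. The function name and its parameters are hypothetical, and the sketch assumes that the cuda_cc_major and cuda_cc_minor properties map to a name of the form sm_<major><minor>, and that both properties must be set for the configuration file to count as specifying an architecture.

```python
from typing import Optional

# From the text: the architecture used when there is no device 0.
FALLBACK_ARCH = "sm_35"


def resolve_default_arch(cuda_cc_major: Optional[int],
                         cuda_cc_minor: Optional[int],
                         device0_arch: Optional[str]) -> str:
    """Hypothetical model of how -qtgtarch=default picks an architecture.

    cuda_cc_major / cuda_cc_minor: values of the configuration file
        properties, or None if they are not set.
    device0_arch: architecture of device 0 (for example "sm_60"),
        or None if the system has no device 0.
    """
    # 1. Configuration file properties take precedence.
    #    Assumption: both must be set, and they map to sm_<major><minor>.
    if cuda_cc_major is not None and cuda_cc_minor is not None:
        return f"sm_{cuda_cc_major}{cuda_cc_minor}"

    # 2. Otherwise, use the architecture of device 0 if there is one.
    if device0_arch is not None:
        return device0_arch

    # 3. No configuration entry and no device 0: fall back to sm_35.
    return FALLBACK_ARCH
```

For example, under these assumptions resolve_default_arch(6, 0, "sm_70") yields sm_60, because the configuration file entry overrides the detected device.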

real_GPU_architecture

A real GPU architecture, such as sm_35, sm_60, or sm_70, as defined by the CUDA Toolkit.

virtual_GPU_architecture

A virtual GPU architecture, such as compute_35, compute_60, or compute_70, as defined by the CUDA Toolkit. Virtual GPU architectures specify the features that are supported in the high-level PTX code.

Rules

The compiler generates PTX intermediate code for the specified virtual GPU architectures and embeds it in the resulting object file or executable. To also generate and embed compiled code images, specify real GPU architectures. The compiled code images for the real GPU architectures are generated from the PTX code.

Each -qtgtarch option is used to generate PTX code for exactly one virtual GPU architecture and optionally compiled code images for one or more compatible real GPU architectures. If you need to generate PTX code for multiple virtual GPU architectures, specify the -qtgtarch option multiple times, once for each virtual GPU architecture.

The compiler converts between virtual and real GPU architectures when needed, for example, when no virtual architecture is specified, or when multiple virtual GPU architectures are specified.

You can specify the -qtgtarch option multiple times, even for the same virtual GPU architecture. The resulting effect is cumulative.

The detailed rules for specifying a single -qtgtarch option are listed in Table 1.

Table 1. Detailed rules for specifying one -qtgtarch option
Number of virtual GPU architectures specified	Number of real GPU architectures specified	Virtual GPU architecture for which the PTX code is generated	Real GPU architectures for which the compiled code images are generated
0	At least one	The virtual GPU architecture corresponding to the lowest-level real GPU architecture specified	The real GPU architectures specified
1	0	The virtual GPU architecture specified	N/A (see Note)
1	At least one	The virtual GPU architecture specified	The real GPU architectures specified
More than one	0	The lowest-level virtual GPU architecture specified	The real GPU architectures corresponding to all but the lowest virtual GPU architecture specified
More than one	At least one	The lowest-level virtual GPU architecture specified	The real GPU architectures specified and the real GPU architectures corresponding to all but the lowest virtual GPU architecture specified

Note: When no compiled code image is embedded in the resulting object file or executable, a compiled code image is generated from the PTX code by just-in-time compilation at link or execution time, if needed.
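
The rules in Table 1 can be modeled with a short Python sketch. It is an illustration only, not the compiler's implementation. The function and helper names are hypothetical. The sketch assumes that "lowest level" means the smallest numeric suffix and that a virtual architecture compute_NN corresponds to the real architecture sm_NN. It does not check that the real architectures are compatible with the selected virtual architecture.

```python
from typing import List, Tuple


def _level(arch: str) -> int:
    # Assumption: architectures are ordered by numeric suffix,
    # for example "sm_60" -> 60.
    return int(arch.split("_", 1)[1])


def _to_virtual(real: str) -> str:
    # Assumption: sm_NN corresponds to compute_NN.
    return "compute_" + real.split("_", 1)[1]


def _to_real(virtual: str) -> str:
    # Assumption: compute_NN corresponds to sm_NN.
    return "sm_" + virtual.split("_", 1)[1]


def resolve_option(archs: List[str]) -> Tuple[str, List[str]]:
    """Hypothetical model of Table 1 for ONE -qtgtarch option.

    Returns (virtual architecture for the PTX code,
             real architectures for the compiled code images).
    """
    # Distinct virtual architectures, lowest level first.
    virtual = sorted({a for a in archs if a.startswith("compute_")},
                     key=_level)
    real = [a for a in archs if a.startswith("sm_")]

    if not virtual and not real:
        raise ValueError("expected at least one sm_* or compute_* name")

    if not virtual:
        # Row 1: PTX for the virtual architecture that corresponds to
        # the lowest-level real architecture specified.
        return _to_virtual(min(real, key=_level)), list(dict.fromkeys(real))

    # Rows 2 to 5: PTX for the lowest virtual architecture. Every other
    # virtual architecture is converted to its real counterpart and
    # added to the real architectures that were specified explicitly.
    ptx = virtual[0]
    converted = [_to_real(v) for v in virtual[1:]]
    images = list(dict.fromkeys(real + converted))  # de-duplicate, keep order
    return ptx, images
```

Under these assumptions, resolve_option(["compute_35", "compute_60", "sm_70"]) returns compute_35 for the PTX code and sm_70 and sm_60 for the compiled code images, which matches the last row of the table.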

Predefined macros