-qtgtarch
Category
Pragma equivalent
None.
Purpose
Specifies real or virtual GPU architectures where the code may run. This allows the compiler to take maximum advantage of the capabilities and machine instructions that are specific to a GPU architecture, or common to a virtual architecture.
The compiler automatically detects the GPU architecture at compiler configuration time. The GPU architecture is encoded into the compiler configuration file. You can override the default by using the -qtgtarch option.
Syntax
Default
-qtgtarch=default
Parameters
- auto
  - The architecture of device 0 of the system on which the compiler is being executed.
- default
  - The default architecture, which is determined as follows:
    - The architecture specified by the cuda_cc_major and cuda_cc_minor properties that are set in the configuration file.
    - If these properties are not specified, the architecture of device 0 of the system on which the compiler is being executed.
    - If there is no device 0, sm_35.
- real_GPU_architecture
  - A real GPU architecture, such as sm_35, sm_60, or sm_70, as defined by the CUDA Toolkit.
- virtual_GPU_architecture
  - A virtual GPU architecture, such as compute_35, compute_60, or compute_70, as defined by the CUDA Toolkit. Virtual GPU architectures specify the features that are supported in the high-level PTX code.
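The fallback order described for the default parameter can be sketched as follows. This is an illustrative model only, not the compiler's implementation: the function name `default_arch` and its arguments are hypothetical, and it assumes that a configuration with only one of the two properties set counts as "not specified" and that compute capability major.minor maps to sm_<major><minor>.

```python
def default_arch(cuda_cc_major=None, cuda_cc_minor=None, device0_arch=None):
    """Model the documented fallback order for -qtgtarch=default.

    cuda_cc_major / cuda_cc_minor: values from the configuration file, or None.
    device0_arch: architecture of device 0 (for example "sm_37"), or None
                  when the system has no device 0.
    """
    # 1. Properties from the configuration file take precedence.
    #    Assumption: both properties must be present to count as specified.
    if cuda_cc_major is not None and cuda_cc_minor is not None:
        return f"sm_{cuda_cc_major}{cuda_cc_minor}"
    # 2. Otherwise use the architecture of device 0, if there is one.
    if device0_arch is not None:
        return device0_arch
    # 3. Otherwise fall back to sm_35.
    return "sm_35"
```

For example, `default_arch(6, 0, "sm_37")` yields `sm_60` because the configuration file wins over the detected device, while `default_arch()` yields `sm_35`.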
Rules
The PTX intermediate code is generated based on the specified virtual GPU architectures and then embedded in the resulting object file or executable. To generate and embed the compiled code images, specify real GPU architectures. The compiled code images for the real GPU architectures are generated from the PTX code.
Each -qtgtarch option is used to generate PTX code for exactly one virtual GPU architecture and optionally compiled code images for one or more compatible real GPU architectures. If you need to generate PTX code for multiple virtual GPU architectures, specify the -qtgtarch option multiple times, once for each virtual GPU architecture.
The compiler converts between virtual and real GPU architectures when needed, for example, when no virtual architecture is specified, or when multiple virtual GPU architectures are specified.
You can specify the -qtgtarch option multiple times, even for the same virtual GPU architecture. The resulting effect is cumulative.
The following table lists the detailed rules for specifying the -qtgtarch option:
Number of virtual GPU architectures specified | Number of real GPU architectures specified | The virtual GPU architectures for which the PTX code is generated | The real GPU architectures for which the compiled code images are generated |
---|---|---|---|
0 | At least one | The virtual GPU architecture corresponding to the lowest level real GPU architecture specified | The real GPU architectures specified |
1 | 0 | The virtual GPU architecture specified | N/A. Note: When no compiled code image is embedded in the resulting object file or executable, a compiled code image will be generated from the PTX code using just-in-time compilation at link or execution time, if needed. |
1 | At least one | The virtual GPU architecture specified | The real GPU architectures specified |
More than one | 0 | The lowest level virtual GPU architecture specified | The real GPU architectures corresponding to all but the lowest virtual GPU architecture specified |
More than one | At least one | The lowest level virtual GPU architecture specified | The real GPU architectures specified and the real GPU architectures corresponding to all but the lowest virtual GPU architecture specified |
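The rules in the table can be modeled with a short Python sketch. It is a reading aid under stated assumptions, not the compiler's logic: the names `level`, `resolve_option`, and `resolve_options` are hypothetical, architecture names are assumed to end in a plain number (sm_NN or compute_NN), "lowest level" is taken to mean the smallest number, and the real architecture corresponding to compute_NN is assumed to be sm_NN.

```python
def level(arch):
    """Return the numeric level of an architecture name, e.g. 'sm_60' -> 60."""
    return int(arch.split("_", 1)[1])


def resolve_option(archs):
    """Model one -qtgtarch option.

    archs: the architectures given to a single option.
    Returns (ptx_arch, image_archs) following the rules table.
    """
    virtual = sorted({a for a in archs if a.startswith("compute_")}, key=level)
    real = {a for a in archs if a.startswith("sm_")}
    if not virtual and not real:
        raise ValueError("expected at least one sm_NN or compute_NN value")
    if virtual:
        # PTX is generated for the lowest virtual architecture; every other
        # virtual architecture is converted to its real counterpart.
        ptx = virtual[0]
        real |= {f"sm_{level(v)}" for v in virtual[1:]}
    else:
        # No virtual architecture: derive it from the lowest real one.
        ptx = f"compute_{level(min(real, key=level))}"
    return ptx, sorted(real, key=level)


def resolve_options(options):
    """Model several -qtgtarch options; their effect is cumulative."""
    result = {}
    for archs in options:
        ptx, images = resolve_option(archs)
        merged = set(result.get(ptx, [])) | set(images)
        result[ptx] = sorted(merged, key=level)
    return result


print(resolve_option(["compute_35", "sm_37", "sm_60"]))
# prints ('compute_35', ['sm_37', 'sm_60'])
```

An empty image list, as returned by `resolve_option(["compute_60"])`, corresponds to the N/A row: only PTX code is embedded and a code image is produced by just-in-time compilation if needed.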
Predefined macros
None.
Examples
The following table lists examples of specifying the -qtgtarch option:
Command examples | The virtual GPU architectures for which the PTX code is generated | The real GPU architectures for which the compiled code images are generated |
---|---|---|
| | compute_60 | sm_60. Note: The compiled code images are generated from the PTX code. |
Assuming the compiler is running on a machine with a GPU with architecture sm_37: | compute_37 | sm_37. Note: The compiled code image is generated from the PTX code. |
| | compute_35 | sm_37 and sm_60. Note: The compiled code images are generated from the PTX code. |
| | compute_37 | sm_37 and sm_60. Note: The compiled code images are generated from the PTX code. |
Assuming the compiler is running on a machine with a GPU with architecture sm_37: | compute_37 | sm_37 and sm_60. Note: The compiled code images are generated from the PTX code. |
| | compute_35 and compute_60 | sm_35 and sm_60. Note: The sm_35 and sm_60 compiled code images are generated from the PTX code for compute_35 and compute_60, respectively. |
Related information
- -qoffload
- GPU architectures in the CUDA Toolkit documentation, available at: http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation