-gpu

Specifies properties of GPU resources required by the job.

Synopsis

Description

-

Specifies that the job does not set job-level GPU requirements. Use the hyphen with no letter to set the effective GPU requirements, which are defined at the cluster, queue, or application profile level.

If a GPU requirement is specified at the cluster, queue, and application profile level, each option (num, mode, mps, j_exclusive, gmodel, gtile, gmem, and nvlink) of the GPU requirement is merged separately. Application profile level overrides queue level, which overrides the cluster level default GPU requirement.

If there are no GPU requirements defined in the cluster, queue, or application level, the default value is "num=1:mode=shared:mps=no:j_exclusive=no".
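The default string above shows the general shape of a GPU requirement: keyword=value options separated by colons. The following Python sketch splits such a string into a mapping. The helper name parse_gpu_req is hypothetical and is not part of LSF; it only illustrates the syntax and does not validate keywords or values.

```python
def parse_gpu_req(req: str) -> dict[str, str]:
    """Split a colon-separated GPU requirement string into keyword -> value.

    Hypothetical helper for illustration only; LSF does its own parsing.
    """
    options = {}
    for item in req.split(":"):
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed GPU requirement option: {item!r}")
        options[key.strip()] = value.strip()
    return options


# The documented default when no level defines a GPU requirement.
default_req = parse_gpu_req("num=1:mode=shared:mps=no:j_exclusive=no")
print(default_req)
```

Values that carry comma-separated modifiers, such as mps=yes,shared, stay intact because only the colon separates options.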

num=num_gpus[/task | host]

The number of physical GPUs required by the job. By default, the number is per host. You can also specify that the number is per task by specifying /task after the number.

If you specify that the number is per task, if the ngpus_physical resource is configured as PER_TASK in the lsb.resources file, or if the RESOURCE_RESERVE_PER_TASK=Y parameter is set in the lsb.params file, this number is the requested count per task.
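As a rough illustration of the per-host and per-task forms, the following Python sketch parses the num value and estimates how many GPUs a host must provide. The function names are hypothetical, and the multiplication by the number of tasks on the host is an assumption drawn from the description above, not a statement of how LSF computes allocations internally.

```python
def parse_num(value: str) -> tuple[int, str]:
    """Parse 'N', 'N/host', or 'N/task' into (count, scope).

    Hypothetical helper; the scope defaults to 'host' as documented.
    """
    count, _, scope = value.partition("/")
    scope = scope or "host"
    if scope not in ("host", "task"):
        raise ValueError(f"unknown scope: {scope!r}")
    return int(count), scope


def gpus_needed_on_host(value: str, tasks_on_host: int) -> int:
    """Estimate GPUs required on one host (assumption: per-task counts add up)."""
    count, scope = parse_num(value)
    return count * tasks_on_host if scope == "task" else count


print(gpus_needed_on_host("2", 4))       # per host: independent of task count
print(gpus_needed_on_host("2/task", 4))  # per task: scales with tasks on the host
```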

mode=shared | exclusive_process

The GPU mode when the job is running, either shared or exclusive_process. The default mode is shared.

The shared mode corresponds to the NVIDIA or AMD DEFAULT compute mode. The exclusive_process mode corresponds to the NVIDIA EXCLUSIVE_PROCESS compute mode.

Note:

LSF does not support setting exclusive_process on NVIDIA GPU instances or compute instances. Jobs can be dispatched and run as usual, but the driver mode will not be set.
Do not specify exclusive_process when you are using AMD GPUs (that is, when gvendor=amd is specified).

mps=yes[,nocvd][,shared] | per_socket[,shared][,nocvd] | per_gpu[,shared][,nocvd] | no

Enables or disables the NVIDIA Multi-Process Service (MPS) for the GPUs that are allocated to the job. Using MPS effectively causes the EXCLUSIVE_PROCESS mode to behave like the DEFAULT mode for all MPS clients. MPS always allows multiple clients to use the GPU through the MPS server.

Note: To avoid inconsistent behavior, do not enable mps when you are using AMD GPUs (that is, when gvendor=amd is specified). If the result of merging the GPU requirements at the cluster, queue, application, and job levels is gvendor=amd and mps is enabled (for example, if gvendor=amd is specified at the job level without specifying mps=no, but mps=yes is specified at the application, queue, or cluster level), LSF ignores the mps requirement.
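The rule in the note above can be sketched as a small Python function that takes the merged requirement and returns the MPS setting that would take effect. The function name is hypothetical, and representing "LSF ignores the mps requirement" as the value no is an assumption made for illustration.

```python
def effective_mps(merged: dict[str, str]) -> str:
    """Return the MPS setting after applying the AMD rule.

    Assumption: an ignored mps requirement behaves like mps=no.
    gvendor defaults to nvidia, as documented for the gvendor keyword.
    """
    if merged.get("gvendor", "nvidia") == "amd":
        return "no"
    return merged.get("mps", "no")


# gvendor=amd at the job level, mps=yes inherited from a lower level.
print(effective_mps({"gvendor": "amd", "mps": "yes"}))
print(effective_mps({"mps": "yes"}))
```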

MPS is useful for both shared and exclusive process GPUs, and allows more efficient sharing of GPU resources and better GPU utilization. See the NVIDIA documentation for more information and limitations.

When using MPS, use the EXCLUSIVE_PROCESS mode to ensure that only a single MPS server is using the GPU, which provides additional insurance that the MPS server is the single point of arbitration between all CUDA processes for that GPU.

You can also enable MPS daemon sharing by adding the shared keyword after a comma with no space (for example, mps=yes,shared enables MPS daemon sharing on the host). If sharing is enabled, all jobs that are submitted by the same user with the same resource requirements share the same MPS daemon on the host, socket, or GPU.

LSF starts MPS daemons on a per-host, per-socket, or per-GPU basis depending on the value of the mps keyword:

If mps=yes is set, LSF starts one MPS daemon per host per job.
When sharing is enabled (that is, if mps=yes,shared is set), LSF starts one MPS daemon per host for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon on the host.

When the CUDA_VISIBLE_DEVICES environment variable is disabled (that is, if mps=yes,nocvd is set), LSF does not set the CUDA_VISIBLE_DEVICES<number> environment variables for tasks, so LSF MPI does not set CUDA_VISIBLE_DEVICES for the tasks. Otherwise, LSF sets only the CUDA_VISIBLE_DEVICES<number> environment variables for tasks, not CUDA_VISIBLE_DEVICES itself, and LSF MPI converts the CUDA_VISIBLE_DEVICES<number> environment variables into CUDA_VISIBLE_DEVICES and sets that for the tasks.
If mps=per_socket is set, LSF starts one MPS daemon per socket per job. When enabled with share (that is, if mps=per_socket,shared is set), LSF starts one MPS daemon per socket for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon for the socket.
If mps=per_gpu is set, LSF starts one MPS daemon per GPU per job. When enabled with share (that is, if mps=per_gpu,shared is set), LSF starts one MPS daemon per GPU for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon for the GPU.
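The combinations of the mps keyword described above can be summarized with a short Python sketch that splits the value into a daemon scope and its modifiers. The MpsSetting type and parse_mps function are hypothetical names used only for illustration; the sketch rejects modifiers on mps=no, which is an assumption based on the syntax line.

```python
from typing import NamedTuple, Optional


class MpsSetting(NamedTuple):
    scope: Optional[str]  # 'host', 'socket', 'gpu', or None when MPS is off
    shared: bool          # one daemon shared by the same user's matching jobs
    nocvd: bool           # CUDA_VISIBLE_DEVICES handling is disabled


def parse_mps(value: str) -> MpsSetting:
    """Interpret an mps value such as 'yes,shared' or 'per_gpu,shared,nocvd'."""
    head, *flags = value.split(",")
    scopes = {"yes": "host", "per_socket": "socket", "per_gpu": "gpu", "no": None}
    if head not in scopes:
        raise ValueError(f"unknown mps value: {head!r}")
    unknown = set(flags) - {"shared", "nocvd"}
    if unknown or (head == "no" and flags):
        raise ValueError(f"unexpected mps modifiers: {flags!r}")
    return MpsSetting(scopes[head], "shared" in flags, "nocvd" in flags)


print(parse_mps("yes,shared"))
print(parse_mps("per_gpu,shared,nocvd"))
```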

Important: Using EXCLUSIVE_THREAD mode with MPS is not supported and might cause unexpected behavior.

j_exclusive=yes | no

Specifies whether the allocated GPUs can be used by other jobs. When the mode is set to exclusive_process, the j_exclusive=yes option is set automatically.

aff=yes | no

Specifies whether to enforce strict GPU-CPU affinity binding. If set to no, LSF relaxes GPU affinity while maintaining CPU affinity. By default, aff=yes is set to maintain strict GPU-CPU affinity binding.

Note: The aff=yes setting conflicts with block=yes (distribute allocated GPUs as blocks when the number of tasks is greater than the requested number of GPUs). This is because strict CPU-GPU binding allocates GPUs to tasks based on the CPU NUMA ID, which conflicts with the distribution of allocated GPUs as blocks. If aff=yes and block=yes are both specified in the GPU requirements string, the block=yes setting takes precedence and strict CPU-GPU affinity binding is disabled (that is, aff=no is automatically set).
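The precedence rule in the note above can be expressed as a minimal Python sketch. The function name is hypothetical; the defaults (aff=yes, block=no) are the documented ones.

```python
def resolve_aff_block(aff: str = "yes", block: str = "no") -> tuple[str, str]:
    """Apply the documented rule: block=yes wins and turns strict affinity off."""
    if block == "yes" and aff == "yes":
        aff = "no"  # strict CPU-GPU affinity binding is disabled automatically
    return aff, block


print(resolve_aff_block("yes", "yes"))
print(resolve_aff_block())
```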

block=yes | no

Specifies whether to enable block distribution. If set to yes, LSF distributes all the allocated GPUs of a job as blocks when the number of tasks is greater than the requested number of GPUs. By default, block=no is set so that allocated GPUs are not distributed as blocks.

For example, if a GPU job requests to run on a host with 4 GPUs and 40 tasks, block distribution assigns GPU0 for ranks 0-9, GPU1 for ranks 10-19, GPU2 for ranks 20-29, and GPU3 for ranks 30-39.
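The rank-to-GPU assignment in this example can be reproduced with a short Python sketch. The function name is hypothetical, and the block size (the task count divided by the GPU count, rounded up) is an assumption that matches the 40-task, 4-GPU example; how LSF handles counts that do not divide evenly is not stated here.

```python
import math


def block_distribution(num_tasks: int, num_gpus: int) -> dict[int, int]:
    """Map each task rank to a GPU index in contiguous blocks.

    Assumption: block size is ceil(num_tasks / num_gpus).
    """
    if num_gpus <= 0 or num_tasks <= num_gpus:
        raise ValueError("block distribution applies when tasks outnumber GPUs")
    block_size = math.ceil(num_tasks / num_gpus)
    return {rank: rank // block_size for rank in range(num_tasks)}


mapping = block_distribution(40, 4)
for gpu in range(4):
    ranks = [rank for rank, g in mapping.items() if g == gpu]
    print(f"GPU{gpu}: ranks {ranks[0]}-{ranks[-1]}")
```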

Note: The block=yes setting conflicts with aff=yes (strict CPU-GPU affinity binding). This is because strict CPU-GPU binding allocates GPUs to tasks based on the CPU NUMA ID, which conflicts with the distribution of allocated GPUs as blocks. If block=yes and aff=yes are both specified in the GPU requirements string, the block=yes setting takes precedence and strict CPU-GPU affinity binding is disabled (that is, aff=no is automatically set).

gpack=yes | no

For shared mode jobs only. Specifies whether to enable pack scheduling. If set to yes, LSF packs multiple shared mode GPU jobs to allocated GPUs. LSF schedules shared mode GPUs as follows:

LSF sorts the candidate hosts (from largest to smallest) based on the number of shared GPUs that already have running jobs, then by the number of GPUs that are not exclusive.
If the order[] keyword is defined in the resource requirements string, after sorting order[], LSF re-sorts the candidate hosts by the gpack policy (by shared GPUs that already have running jobs first, then by the number of GPUs that are not exclusive). The gpack policy sort priority is higher than the order[] sort.
LSF sorts the candidate GPUs on each host (from largest to smallest) based on the number of running jobs.

After scheduling, the shared mode GPU job packs to the allocated shared GPU that is sorted first, not to a new shared GPU.
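The sort order described above can be illustrated with a Python sketch over made-up data. The host names, field names, and job counts are hypothetical, and the sketch assumes that both host sort keys run from largest to smallest; it does not model order[] or Docker attribute affinity.

```python
# Hypothetical candidate hosts: counts of shared GPUs that already run jobs,
# and counts of GPUs that are not exclusive.
hosts = [
    {"name": "hostA", "busy_shared_gpus": 1, "non_exclusive_gpus": 4},
    {"name": "hostB", "busy_shared_gpus": 3, "non_exclusive_gpus": 2},
    {"name": "hostC", "busy_shared_gpus": 3, "non_exclusive_gpus": 4},
]

# Hosts: most busy shared GPUs first, then most non-exclusive GPUs (assumed descending).
ranked = sorted(hosts, key=lambda h: (-h["busy_shared_gpus"], -h["non_exclusive_gpus"]))
host_order = [h["name"] for h in ranked]

# GPUs on the chosen host: GPU index -> number of running jobs, busiest first.
gpu_jobs = {0: 0, 1: 2, 2: 1}
gpu_order = sorted(gpu_jobs, key=lambda g: -gpu_jobs[g])

print(host_order)
print(gpu_order)
```

In this sketch the job would pack onto the first GPU in gpu_order on the first host in host_order, rather than onto an idle shared GPU.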

If Docker attribute affinity is enabled, the candidate hosts are sorted by Docker attribute affinity before they are sorted by GPUs.

By default, gpack=no is set so that pack scheduling is disabled.

migpack=yes | no

Available as of Fix Pack 16 and requires NVIDIA MIG scheduling to be enabled (that is, LSF_MANAGE_MIG=Y is set in the lsf.conf file). Specifies whether LSF schedules NVIDIA MIG workload in packed mode. If set to yes, LSF prioritizes workload on the busiest GPU with enough empty MIG slots to fit the job's resource requirements, rather than prioritizing the least busy GPU. LSF sorts the candidate GPUs on each host, and there is no cross-host GPU sorting. This option maximizes GPU utilization because it packs multiple smaller MIG instances onto the same GPU.

The default setting is no. If set at the job, application, and queue levels, the job level takes precedence, then the application, then the queue.
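The packed selection can be sketched in Python as choosing, on a single host, the GPU with the most MIG slots already in use that still has room for the job. The function name and the slot counts are hypothetical, and measuring "busiest" by used slots and breaking ties by the lowest GPU ID are assumptions made for illustration.

```python
from typing import Optional


def pick_mig_gpu(gpus: list[tuple[int, int, int]], slots_needed: int) -> Optional[int]:
    """Pick a GPU on one host for a MIG job in packed mode.

    gpus holds (gpu_id, used_slots, total_slots) tuples. Returns the chosen
    GPU ID, or None when no GPU has enough empty slots.
    """
    candidates = [
        (used, gpu_id)
        for gpu_id, used, total in gpus
        if total - used >= slots_needed
    ]
    if not candidates:
        return None
    # Busiest GPU first; ties go to the lowest GPU ID (assumption).
    _, gpu_id = max(candidates, key=lambda c: (c[0], -c[1]))
    return gpu_id


gpus = [(0, 1, 7), (1, 5, 7), (2, 6, 7)]
print(pick_mig_gpu(gpus, 2))  # GPU 2 lacks room, so the next busiest GPU is chosen
```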

gvendor=amd | nvidia

Specifies the GPU vendor type. LSF allocates GPUs with the specified vendor type.

Specify amd to request AMD GPUs, or specify nvidia to request NVIDIA GPUs.

By default, LSF requests NVIDIA GPUs.

gmodel=model_name[-mem_size]

Specifies GPUs with the specific model name and, optionally, its total GPU memory. By default, LSF allocates the GPUs with the same model, if available.

The gmodel keyword supports the following formats:

gmodel=model_name: Requests GPUs with the specified brand and model name (for example, TeslaK80).
gmodel=short_model_name: Requests GPUs with a specific brand name (for example, Tesla, Quadro, NVS) or model type name (for example, K80, P100).
gmodel=model_name-mem_size: Requests GPUs with the specified brand name and total GPU memory size. The GPU memory size consists of the number and its unit, which includes M, G, T, MB, GB, and TB (for example, 12G).
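The formats above can be told apart with a small Python sketch that splits a gmodel value at the hyphen. The function name is hypothetical. The sketch relies on the fact, stated for LSF model names, that hyphens inside model names are replaced with underscores, so a hyphen can only introduce the memory size. Accepting a decimal number before the unit is an assumption.

```python
import re
from typing import Optional

# Units listed for mem_size: M, G, T, MB, GB, and TB.
_MEM_SIZE = re.compile(r"\d+(?:\.\d+)?(?:MB|GB|TB|M|G|T)")


def parse_gmodel(value: str) -> tuple[str, Optional[str]]:
    """Split 'model_name[-mem_size]' into (model_name, mem_size or None)."""
    name, sep, mem = value.partition("-")
    if not name:
        raise ValueError("gmodel requires a model name")
    if not sep:
        return name, None
    if not _MEM_SIZE.fullmatch(mem):
        raise ValueError(f"unrecognized memory size: {mem!r}")
    return name, mem


print(parse_gmodel("TeslaK80"))
print(parse_gmodel("TeslaK80-12G"))
```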

To find the available GPU model names on each host, run the lsload -gpuload, lshosts -gpu, or bhosts -gpu commands. The model name string does not contain space characters. In addition, the slash (/) and hyphen (-) characters are replaced with the underscore character (_). For example, the GPU model name “Tesla C2050 / C2070” is converted to “TeslaC2050_C2070” in LSF.
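The conversion rule can be reproduced with a few lines of Python, which may help when deriving a gmodel value from a vendor-reported name. The function name is hypothetical; the command output remains the authoritative source for the names that LSF actually uses.

```python
def lsf_model_name(raw: str) -> str:
    """Apply the documented rule: drop spaces, then map '/' and '-' to '_'."""
    no_spaces = raw.replace(" ", "")
    return no_spaces.replace("/", "_").replace("-", "_")


print(lsf_model_name("Tesla C2050 / C2070"))
```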

gmem=mem_value

Specifies the GPU memory on each GPU required by the job. The format of mem_value is the same as for other resource values (for example, mem or swap) in the rusage section of the job resource requirements (-R).

gtile=! | tile_num

Specifies the number of GPUs per socket. Specify a number to explicitly define the number of GPUs per socket on the host, or specify an exclamation mark (!) to enable LSF to automatically calculate the number, which evenly divides the GPUs across all sockets on the host. LSF guarantees the gtile requirements even for affinity jobs. This means that LSF might not allocate the GPU's affinity to the allocated CPUs when the gtile requirements cannot be satisfied.

If the gtile keyword is not specified for an affinity job, LSF attempts to allocate enough GPUs on the sockets where the CPUs are allocated. If there are not enough GPUs on the optimal sockets, jobs cannot go to this host.

If the gtile keyword is not specified for a non-affinity job, LSF attempts to allocate enough GPUs on the same socket. If this is not available, LSF might allocate GPUs on separate sockets.

nvlink=yes

Obsolete in LSF, Version 10.1 Fix Pack 11. Use the glink keyword instead. Enables job enforcement for NVLink connections among GPUs. LSF allocates GPUs with NVLink connections in force.

glink=yes

Enables job enforcement for special connections among GPUs. LSF must allocate GPUs with the special connections that are specific to the GPU vendor.

If the job requests AMD GPUs, LSF must allocate GPUs with the xGMI connection. If the job requests NVIDIA GPUs, LSF must allocate GPUs with the NVLink connection.

Do not use glink together with the obsolete nvlink keyword.

By default, LSF can allocate GPUs without special connections when there are not enough GPUs with these connections.

The syntax of the GPU requirement in the -gpu option is the same as the syntax in the LSB_GPU_REQ parameter in the lsf.conf file and the GPU_REQ parameter in the lsb.queues and lsb.applications files.

Note: The bjobs output does not show aff=yes even if you specify aff=yes in the bsub -gpu option.

If the GPU_REQ_MERGE parameter is defined as Y or y in the lsb.params file and a GPU requirement is specified at multiple levels (at least two of the default cluster, queue, application profile, or job level requirements), each option of the GPU requirement is merged separately. Job level overrides application level, which overrides queue level, which overrides the default cluster GPU requirement. For example, if the mode option of the GPU requirement is defined on the -gpu option, and the mps option is defined in the queue, the mode value from the job level and the mps value from the queue are used.

If the GPU_REQ_MERGE parameter is not defined as Y or y in the lsb.params file and a GPU requirement is specified at multiple levels (at least two of the default cluster, queue, application profile, or job level requirements), the entire GPU requirement string is replaced. The entire job level GPU requirement string overrides application level, which overrides queue level, which overrides the default GPU requirement.
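The difference between the two behaviors can be sketched in Python with each level represented as a mapping of options, or None when that level defines no GPU requirement. The function name and the sample values are hypothetical. The sketch shows only the precedence rules and does not fill in defaults for options that no level specifies.

```python
from typing import Optional

GpuReq = Optional[dict[str, str]]


def effective_gpu_req(cluster: GpuReq, queue: GpuReq, application: GpuReq,
                      job: GpuReq, merge: bool) -> dict[str, str]:
    """Combine GPU requirements from lowest to highest priority.

    merge=True models GPU_REQ_MERGE=Y: each option is overridden separately.
    merge=False models the other case: the highest-priority string wins whole.
    """
    defined = [req for req in (cluster, queue, application, job) if req]
    if not defined:
        return {}
    if merge:
        result: dict[str, str] = {}
        for req in defined:      # later (higher-priority) levels override earlier ones
            result.update(req)
        return result
    return dict(defined[-1])     # entire requirement replaced by the highest level


levels = dict(
    cluster={"num": "1", "mode": "shared"},
    queue={"mps": "yes"},
    application=None,
    job={"mode": "exclusive_process"},
)
merged = effective_gpu_req(**levels, merge=True)
replaced = effective_gpu_req(**levels, merge=False)
print(merged)
print(replaced)
```

With merging, the job keeps mps=yes from the queue and num=1 from the cluster; without it, only the job-level string remains.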

The esub parameter LSB_SUB4_GPU_REQ modifies the value of the -gpu option.

LSF selects the GPU that meets the topology requirement first. If the GPU mode of the selected GPU is not the requested mode, LSF changes the GPU to the requested mode. For example, if LSF allocates an exclusive_process GPU to a job that needs a shared GPU, LSF changes the GPU mode to shared before the job starts and then changes the mode back to exclusive_process when the job finishes.

The GPU requirements are converted to rusage resource requirements for the job. For example, num=2 is converted to rusage[ngpus_physical=2]. Use the bjobs, bhist, and bacct commands to see the merged resource requirement.

There might be complex GPU requirements that the bsub -gpu option and GPU_REQ parameter syntax cannot cover, including compound GPU requirements (for different GPU requirements for jobs on different hosts, or for different parts of a parallel job) and alternate GPU requirements (if more than one set of GPU requirements might be acceptable for a job to run). For complex GPU requirements, use the bsub -R command option, or the RES_REQ parameter in the lsb.applications or lsb.queues file to define the resource requirement string.

Important: You can define the mode, j_exclusive, and mps options only with the -gpu option, the LSB_GPU_REQ parameter in the lsf.conf file, or the GPU_REQ parameter in the lsb.queues or lsb.applications files. You cannot use these options with the rusage resource requirement string in the bsub -R command option or the RES_REQ parameter in the lsb.queues or lsb.applications files.

Examples

The following job request does not override the effective GPU requirement with job-level GPU requirements. The effective GPU requirement is merged together from GPU requirements that are specified at the cluster, queue, and application profile level. Application profile level overrides queue level, which overrides the cluster level default GPU requirement.

bsub -gpu - ./app

The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts MPS before running the job.

bsub -gpu "num=2:mode=exclusive_process:mps=yes" ./app

The following job requires 2 DEFAULT mode GPUs and uses them exclusively. The two GPUs cannot be used by other jobs even though the mode is shared.

bsub -gpu "num=2:mode=shared:j_exclusive=yes" ./app

The following job uses 3 DEFAULT mode GPUs and shares them with other jobs.

bsub -gpu "num=3:mode=shared:j_exclusive=no" ./app

The following job uses 4 EXCLUSIVE_PROCESS GPUs that cannot be used by other jobs. The j_exclusive option defaults to yes for this job.

bsub -gpu "num=4:mode=exclusive_process" ./app

The following job requires two tasks. Each task requires 2 EXCLUSIVE_PROCESS GPUs on two hosts. The GPUs are allocated in the same NUMA as the allocated CPU.

bsub -gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] affinity[core(1)]" ./app

The following job uses two NVIDIA MIG GPUs with 3 GPU instances and 2 compute instances.

bsub -gpu "num=2:mig=3/2" ./app
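When jobs are submitted from a script, the requirement string in the examples above can be assembled from a mapping rather than written by hand. The following Python sketch builds the argument list for bsub without running it. The helper name is hypothetical, and the sketch assumes only the documented syntax of colon-separated keyword=value options.

```python
import shlex


def bsub_gpu_command(options: dict[str, object], command: list[str]) -> list[str]:
    """Build an argument list for 'bsub -gpu "<requirement>" <command>'.

    Hypothetical helper: joins options with ':' in insertion order.
    """
    gpu_req = ":".join(f"{key}={value}" for key, value in options.items())
    return ["bsub", "-gpu", gpu_req, *command]


argv = bsub_gpu_command(
    {"num": 2, "mode": "exclusive_process", "mps": "yes"},
    ["./app"],
)
print(shlex.join(argv))  # shell-safe rendering of the command line
```

The list form can be passed to subprocess.run on a host where bsub is available, which avoids shell quoting issues with the requirement string.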
