LSB_GPU_REQ

Specify GPU requirements together in one statement.

Syntax

LSB_GPU_REQ = "[num=num_gpus[/task | host]] [:mode=shared | exclusive_process] [:mps=yes[,shared][,nocvd] | no | per_socket[,shared][,nocvd] | per_gpu[,shared][,nocvd]] [:j_exclusive=yes | no] [:aff=yes | no] [:block=yes | no] [:gpack=yes | no]"

Description

The LSB_GPU_REQ parameter takes the following arguments:
num=num_gpus[/task | host]
The number of physical GPUs required by the job. By default, the number is per host. You can also specify that the number is per task by specifying /task after the number.

If you specify that the number is per task, if the ngpus_physical resource is configured as PER_TASK in the lsb.resources file, or if the RESOURCE_RESERVE_PER_TASK=Y parameter is set in the lsb.params file, this number is the requested count per task.
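
For example, a setting such as the following (the GPU count is illustrative) requests two GPUs for each task of the job instead of two GPUs per host:

# Illustrative value: request two GPUs per task rather than per host
LSB_GPU_REQ="num=2/task"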

mode=shared | exclusive_process
The GPU mode when the job is running, either shared or exclusive_process. The default mode is shared.

The shared mode corresponds to the Nvidia or AMD DEFAULT compute mode. The exclusive_process mode corresponds to the Nvidia EXCLUSIVE_PROCESS compute mode.

Note: Do not specify exclusive_process when you are using AMD GPUs (that is, when gvendor=amd is specified).
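
For example, a setting such as the following (the GPU count is illustrative) requests one GPU in the Nvidia EXCLUSIVE_PROCESS compute mode:

# Illustrative value: one GPU in exclusive_process mode
LSB_GPU_REQ="num=1:mode=exclusive_process"
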
mps=yes[,shared][,nocvd] | no | per_socket[,shared][,nocvd] | per_gpu[,shared][,nocvd]
Enables or disables the Nvidia Multi-Process Service (MPS) for the GPUs that are allocated to the job. Using MPS effectively causes the EXCLUSIVE_PROCESS mode to behave like the DEFAULT mode for all MPS clients. MPS always allows multiple clients to use the GPU through the MPS server.
Note: To avoid inconsistent behavior, do not enable mps when you are using AMD GPUs (that is, when gvendor=amd is specified). If the result of merging the GPU requirements at the cluster, queue, application, and job levels is gvendor=amd and mps is enabled (for example, if gvendor=amd is specified at the job level without specifying mps=no, but mps=yes is specified at the application, queue, or cluster level), LSF ignores the mps requirement.

MPS is useful for both shared and exclusive process GPUs, and allows more efficient sharing of GPU resources and better GPU utilization. See the Nvidia documentation for more information and limitations.

When using MPS, use the EXCLUSIVE_PROCESS mode to ensure that only a single MPS server is using the GPU, which provides additional insurance that the MPS server is the single point of arbitration between all CUDA processes for that GPU.

You can also enable MPS daemon sharing by adding the shared keyword with a comma and no space (for example, mps=yes,shared enables MPS daemon sharing on the host). If sharing is enabled, all jobs that are submitted by the same user with the same resource requirements share the same MPS daemon on the host, socket, or GPU.

LSF starts MPS daemons on a per-host, per-socket, or per-GPU basis depending on the value of the mps keyword:

  • If mps=yes is set, LSF starts one MPS daemon per host per job.

    When sharing is enabled (that is, if mps=yes,shared is set), LSF starts one MPS daemon per host for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon on the host.

    When the CUDA_VISIBLE_DEVICES environment variable is disabled (that is, if mps=yes,nocvd is set), LSF does not set the CUDA_VISIBLE_DEVICES environment variable for tasks, so LSF MPI does not set CUDA_VISIBLE_DEVICES for the tasks. LSF sets only the CUDA_VISIBLE_DEVICES<number> environment variables for tasks, not CUDA_VISIBLE_DEVICES. LSF MPI converts the CUDA_VISIBLE_DEVICES<number> environment variables into CUDA_VISIBLE_DEVICES and sets that for the tasks.

  • If mps=per_socket is set, LSF starts one MPS daemon per socket per job. When sharing is enabled (that is, if mps=per_socket,shared is set), LSF starts one MPS daemon per socket for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon for the socket.
  • If mps=per_gpu is set, LSF starts one MPS daemon per GPU per job. When sharing is enabled (that is, if mps=per_gpu,shared is set), LSF starts one MPS daemon per GPU for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon for the GPU.
Important: Using EXCLUSIVE_THREAD mode with MPS is not supported and might cause unexpected behavior.
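
For example, a setting such as the following sketch (values are illustrative) requests EXCLUSIVE_PROCESS mode GPUs with one shared MPS daemon per host, following the earlier recommendation to combine MPS with the EXCLUSIVE_PROCESS mode:

# Illustrative values: exclusive_process GPUs with a shared per-host MPS daemon
LSB_GPU_REQ="num=2:mode=exclusive_process:mps=yes,shared"
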
j_exclusive=yes | no
Specifies whether the allocated GPUs can be used by other jobs. When the mode is set to exclusive_process, the j_exclusive=yes option is set automatically.
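
For example, a setting such as the following (values are illustrative) keeps the GPUs in shared mode but reserves the allocated GPUs for this job only:

# Illustrative values: shared mode GPUs that other jobs cannot use
LSB_GPU_REQ="num=1:mode=shared:j_exclusive=yes"
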
aff=yes | no
Specifies whether to enforce strict GPU-CPU affinity binding. If set to no, LSF relaxes GPU affinity while maintaining CPU affinity. By default, aff=yes is set to maintain strict GPU-CPU affinity binding.
Note: The aff=yes setting conflicts with block=yes (distribute allocated GPUs as blocks when the number of tasks is greater than the requested number of GPUs). This is because strict CPU-GPU binding allocates GPUs to tasks based on the CPU NUMA ID, which conflicts with the distribution of allocated GPUs as blocks. If aff=yes and block=yes are both specified in the GPU requirements string, the block=yes setting takes precedence and strict CPU-GPU affinity binding is disabled (that is, aff=no is automatically set).
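
For example, a setting such as the following (the GPU count is illustrative) relaxes GPU affinity while maintaining CPU affinity:

# Illustrative value: relax strict GPU-CPU affinity binding
LSB_GPU_REQ="num=2:aff=no"
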
block=yes | no
Specifies whether to enable block distribution, that is, to distribute the allocated GPUs of a job as blocks when the number of tasks is greater than the requested number of GPUs. If set to yes, LSF distributes all the allocated GPUs of a job as blocks when the number of tasks is greater than the requested number of GPUs. By default, block=no is set so that allocated GPUs are not distributed as blocks.

For example, if a GPU job requests to run on a host with 4 GPUs and 40 tasks, block distribution assigns GPU0 for ranks 0-9, GPU1 for ranks 10-19, GPU2 for ranks 20-29, and GPU3 for ranks 30-39.

Note: The block=yes setting conflicts with aff=yes (strict CPU-GPU affinity binding). This is because strict CPU-GPU binding allocates GPUs to tasks based on the CPU NUMA ID, which conflicts with the distribution of allocated GPUs as blocks. If block=yes and aff=yes are both specified in the GPU requirements string, the block=yes setting takes precedence and strict CPU-GPU affinity binding is disabled (that is, aff=no is automatically set).
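
For example, a setting such as the following (the GPU count is illustrative) enables block distribution for the 4-GPU, 40-task scenario described above:

# Illustrative value: distribute 4 allocated GPUs as blocks across the job's tasks
LSB_GPU_REQ="num=4:block=yes"
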
gpack=yes | no
For shared mode jobs only. Specifies whether to enable pack scheduling. If set to yes, LSF packs multiple shared mode GPU jobs to allocated GPUs. LSF schedules shared mode GPUs as follows:
  1. LSF sorts the candidate hosts (from largest to smallest) based on the number of shared GPUs that already have running jobs, then by the number of GPUs that are not exclusive.

    If the order[] keyword is defined in the resource requirements string, LSF sorts the candidate hosts by order[] first, then re-sorts them by the gpack policy (by shared GPUs that already have running jobs first, then by the number of GPUs that are not exclusive). That is, the gpack policy sort takes priority over the order[] sort.

  2. LSF sorts the candidate GPUs on each host (from largest to smallest) based on the number of running jobs.

After scheduling, the shared mode GPU job packs to the allocated shared GPU that is sorted first, not to a new shared GPU.

If Docker attribute affinity is enabled, the candidate hosts are sorted by Docker attribute affinity before they are sorted by GPUs.

By default, gpack=no is set so that pack scheduling is disabled.
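
For example, a setting such as the following (values are illustrative) allows LSF to pack multiple shared mode GPU jobs onto the same allocated GPUs:

# Illustrative values: pack shared mode GPU jobs onto GPUs that already have jobs
LSB_GPU_REQ="num=1:mode=shared:j_exclusive=no:gpack=yes"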

If the GPU_REQ_MERGE parameter is defined as Y or y in the lsb.params file and a GPU requirement is specified at multiple levels (at least two of the default cluster, queue, application profile, or job level requirements), each option of the GPU requirement is merged separately. Job level overrides application level, which overrides queue level, which overrides the default cluster GPU requirement. For example, if the mode option of the GPU requirement is defined in the job-level -gpu option and the mps option is defined in the queue, the job-level mode value and the queue-level mps value are both used.

If the GPU_REQ_MERGE parameter is not defined as Y or y in the lsb.params file and a GPU requirement is specified at multiple levels (at least two of the default cluster, queue, application profile, or job level requirements), the entire GPU requirement string is replaced. The entire job level GPU requirement string overrides application level, which overrides queue level, which overrides the default GPU requirement.
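
For example (an illustrative sketch), suppose the cluster-wide default in the lsf.conf file is:

# Illustrative cluster-wide default
LSB_GPU_REQ="num=2:mode=exclusive_process"

and a job is submitted with bsub -gpu "num=1". With GPU_REQ_MERGE=Y, each option is merged separately, so the effective requirement is num=1:mode=exclusive_process. Without GPU_REQ_MERGE=Y, the job-level string replaces the entire default string, so the effective requirement is num=1.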

The esub parameter LSB_SUB4_GPU_REQ modifies the value of the -gpu option.

LSF selects the GPU that meets the topology requirement first. If the GPU mode of the selected GPU is not the requested mode, LSF changes the GPU to the requested mode. For example, if LSF allocates an exclusive_process GPU to a job that needs a shared GPU, LSF changes the GPU mode to shared before the job starts and then changes the mode back to exclusive_process when the job finishes.

The GPU requirements are converted to rusage resource requirements for the job. For example, num=2 is converted to rusage[ngpus_physical=2]. Use the bjobs, bhist, and bacct commands to see the merged resource requirement.

There might be complex GPU requirements that the bsub -gpu option and GPU_REQ parameter syntax cannot cover, including compound GPU requirements (for different GPU requirements for jobs on different hosts, or for different parts of a parallel job) and alternate GPU requirements (if more than one set of GPU requirements might be acceptable for a job to run). For complex GPU requirements, use the bsub -R command option, or the RES_REQ parameter in the lsb.applications or lsb.queues file to define the resource requirement string.
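
For example, a compound resource requirement such as the following sketch (task counts, resource values, and the job command are illustrative, and only count-type GPU requirements can be expressed this way) requests one physical GPU for each of the first two tasks of a parallel job and only memory for the remaining four tasks:

# Illustrative compound requirement: GPUs for only part of a parallel job
bsub -n 6 -R "2*{rusage[ngpus_physical=1]} + 4*{rusage[mem=1000]}" ./myjob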

Important: You can define the mode, j_exclusive, and mps options only with the -gpu option, the LSB_GPU_REQ parameter in the lsf.conf file, or the GPU_REQ parameter in the lsb.queues or lsb.applications files. You cannot use these options with the rusage resource requirement string in the bsub -R command option or the RES_REQ parameter in the lsb.queues or lsb.applications files.

Default

Not defined

See also

  • GPU_REQ
  • bsub -gpu