LSB_GPU_NEW_SYNTAX
Enables the bsub -gpu option to submit jobs that require GPU resources. Also enables the specification of GPU resources with the GPU_REQ parameter in the lsb.queues and lsb.applications files, or with the default GPU requirement configured in the LSB_GPU_REQ parameter in the lsf.conf file.
Syntax
LSB_GPU_NEW_SYNTAX = Y | y | N | n | extend
Description
When this parameter is enabled (that is, set to Y or extend), you can use the bsub -gpu option, the GPU_REQ parameter (in the lsb.queues and lsb.applications files), or the LSB_GPU_REQ parameter (in the lsf.conf file) to specify your GPU requirements. Use these options and parameters to specify GPU requirements instead of using the bsub -R option or the RES_REQ parameter in the lsb.queues file.
In addition, if LSF_GPU_RESOURCE_IGNORE is set to Y, LSB_GPU_NEW_SYNTAX is set to Y or extend, and LSF_GPU_AUTOCONFIG is set to Y, all built-in GPU resources (gpu_<num>n) are completely removed from the management host LIM. LSF uses a different method for the management host LIM and server host LIMs to collect GPU information. This improves LSF response time because there are fewer LSF resource metrics to configure and collect. Do not set LSF_GPU_RESOURCE_IGNORE=Y if you are using LSF RTM unless you are running LSF RTM, Version 10.2 Fix Pack 11, or later.
When LSB_GPU_NEW_SYNTAX=Y is set, you can specify the j_exclusive, mode, mps, and num GPU requirements with the bsub -gpu option. You can also specify GPU resource requirements with the GPU_REQ parameter in a queue (lsb.queues file) or application profile (lsb.applications file), or specify a default GPU requirement with the LSB_GPU_REQ parameter (lsf.conf file). The resource requirements of your job submission cannot use the legacy GPU resources (ngpus_shared, ngpus_excl_t, ngpus_excl_p). In addition, if the PREEMPTABLE_RESOURCES parameter in the lsb.params file includes the ngpus_physical resource, GPU preemption is enabled with the following restrictions. GPU jobs can be preempted only if they have mode=exclusive_process or j_exclusive=yes specified in the GPU resource requirements and are configured for automatic job migration and rerun (that is, the MIG parameter is defined and the RERUNNABLE parameter is set to yes in the lsb.queues or lsb.applications file). GPU jobs can be preempted only by other GPU jobs.
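The preemption restrictions for LSB_GPU_NEW_SYNTAX=Y can be read as a single eligibility check. The following Python sketch is only an illustration of that reading, not LSF code: the GpuJob class, its field names, and the can_be_preempted function are hypothetical, and the sketch assumes that PREEMPTABLE_RESOURCES already includes ngpus_physical.

```python
from dataclasses import dataclass


@dataclass
class GpuJob:
    """Hypothetical, simplified view of a running GPU job."""
    mode: str = "shared"        # value of mode= in the GPU requirement
    j_exclusive: bool = False   # True when j_exclusive=yes
    mig_defined: bool = False   # True when MIG is defined for the queue or application profile
    rerunnable: bool = False    # True when RERUNNABLE is set to yes


def can_be_preempted(job: GpuJob, preemptor_is_gpu_job: bool) -> bool:
    """Apply the restrictions described for LSB_GPU_NEW_SYNTAX=Y.

    Assumes PREEMPTABLE_RESOURCES in lsb.params includes ngpus_physical.
    """
    exclusive = job.mode == "exclusive_process" or job.j_exclusive
    migratable = job.mig_defined and job.rerunnable
    return preemptor_is_gpu_job and exclusive and migratable
```

For example, a job with mode=exclusive_process, MIG defined, and RERUNNABLE set to yes passes the check when the preempting job is a GPU job, while a shared-mode job without j_exclusive=yes does not.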
When LSB_GPU_NEW_SYNTAX=extend is set, you can specify the gmem, gmodel, gtile, glink, and gvendor GPU requirements with the bsub -gpu option, in addition to the GPU requirements that are available when LSB_GPU_NEW_SYNTAX=Y is set (that is, the j_exclusive, mode, mps, and num GPU requirements). You can also specify GPU resource requirements with the GPU_REQ parameter in a queue (lsb.queues file) or application profile (lsb.applications file), or specify a default GPU requirement with the LSB_GPU_REQ parameter (lsf.conf file). The resource requirements of your job submission cannot use the legacy GPU resources (ngpus_shared, ngpus_excl_t, ngpus_excl_p). In addition, if the PREEMPTABLE_RESOURCES parameter in the lsb.params file includes the ngpus_physical resource, GPU preemption is enabled with only one restriction: higher priority GPU jobs cannot preempt GPU jobs with mode=shared in the GPU resource requirements if there are multiple jobs running on the GPU. (As of Fix Pack 14, this restriction has been removed, so higher priority GPU jobs with j_exclusive=yes or mode=exclusive_process settings can preempt shared-mode GPU jobs even if multiple jobs are running on the GPU.) Configure the MIG, RERUNNABLE, or REQUEUE parameters so that GPU resources are properly released after the job is preempted.
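The difference between the Y and extend settings comes down to which keywords a GPU requirement string can contain. The following Python sketch builds a colon-separated requirement string of the kind passed to bsub -gpu and checks the keywords against the lists given in this section. The helper names (allowed_keywords, build_gpu_option) are hypothetical, and how LSF itself reports an unsupported keyword is not described here, so the ValueError is only a stand-in.

```python
# Keywords named in this section for each LSB_GPU_NEW_SYNTAX setting.
BASE_KEYWORDS = ("num", "mode", "mps", "j_exclusive")
EXTENDED_KEYWORDS = BASE_KEYWORDS + ("gmem", "gmodel", "gtile", "glink", "gvendor")


def allowed_keywords(setting: str) -> tuple:
    """Return the GPU requirement keywords available for a given setting."""
    value = setting.lower()
    if value == "y":
        return BASE_KEYWORDS
    if value == "extend":
        return EXTENDED_KEYWORDS
    return ()  # N or unset: the -gpu syntax is not enabled


def build_gpu_option(setting: str, **reqs) -> str:
    """Build a 'key=value:key=value' string, rejecting unsupported keywords."""
    allowed = allowed_keywords(setting)
    unsupported = [key for key in reqs if key not in allowed]
    if unsupported:
        raise ValueError(f"not available with LSB_GPU_NEW_SYNTAX={setting}: {unsupported}")
    return ":".join(f"{key}={value}" for key, value in reqs.items())
```

With these assumptions, build_gpu_option("extend", num=2, gmem="8G") produces a string that could be submitted as bsub -gpu "num=2:gmem=8G", while the same call with the Y setting raises an error because gmem belongs to the extended keyword set. The value 8G is only an illustrative amount of GPU memory.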
When LSB_GPU_NEW_SYNTAX=N is set or the parameter is not specified, you must use the GPU resources ngpus_shared, ngpus_excl_t, and ngpus_excl_p as GPU resource requirements.
Job-level GPU requirements override the application profile level, which overrides queue level, which overrides cluster level configuration.
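The override order can be pictured as layering the four configuration levels from lowest to highest priority. The following Python sketch is a simplified model, not LSF behavior taken from this section: it assumes that requirements are combined keyword by keyword, so a higher level replaces only the keywords it sets. The function name effective_gpu_req and the dictionary representation are hypothetical.

```python
def effective_gpu_req(cluster=None, queue=None, app=None, job=None) -> dict:
    """Combine GPU requirements from the lowest to the highest priority level.

    Each argument is a dict of GPU requirement keywords, or None if that
    level sets nothing. Per-keyword merging is an assumption of this sketch.
    """
    merged = {}
    for level in (cluster, queue, app, job):  # later levels override earlier ones
        if level:
            merged.update(level)
    return merged
```

For example, with a cluster default of num=1 and mode=shared (LSB_GPU_REQ), a queue that sets mode=exclusive_process (GPU_REQ), and a job submitted with num=4, the sketch yields num=4 and mode=exclusive_process.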
In some situations, you might want to use the bsub -R option to specify GPU requirements instead of the bsub -gpu option. These situations include the need to use compound or alternative resource requirements to specify your GPU requirements. For example, a GPU job might be able to run on either K40 or K80 GPUs. You cannot use the bsub -gpu syntax to specify this type of selection and must instead use the bsub -R syntax with alternative resource requirements.
Because the j_exclusive, mode, and mps GPU requirements are not supported in the bsub -R syntax (including the RES_REQ parameter), you must use the bsub -gpu syntax (including the GPU_REQ or LSB_GPU_REQ parameters) to specify them. For these three GPU requirements, you can use the bsub -R syntax and the bsub -gpu syntax in the same job submission.
If you use the bsub -R syntax with the bsub -gpu syntax for any other GPU requirement at the job, application profile, or queue level (including gmem, gmodel, gtile, or nvlink), LSF rejects the job submission.
Setting the LSB_GPU_NEW_SYNTAX parameter to Y or extend also enables GPU resource preemption if you configure the PREEMPTABLE_RESOURCES parameter in the lsb.params file to include the ngpus_physical resource. To be preempted by other GPU jobs, GPU jobs must either use exclusive_process mode or have j_exclusive=yes set. Non-GPU jobs cannot preempt GPU jobs.
Default
extend
See also
- GPU_REQ
- LSB_GPU_REQ
- bsub -gpu