LSB_GPU_NEW_SYNTAX

Enables the bsub -gpu option to submit jobs that require GPU resources. Also enables you to specify GPU resources with the GPU_REQ parameter in the lsb.queues and lsb.applications files, or a default GPU requirement with the LSB_GPU_REQ parameter in the lsf.conf file.

Syntax

LSB_GPU_NEW_SYNTAX = Y | y | N | n | extend

Description

When this parameter is enabled (that is, set to Y or extend), you can use the bsub -gpu option, the GPU_REQ parameter (in the lsb.queues and lsb.applications files), or the LSB_GPU_REQ parameter (in the lsf.conf file) to specify your GPU requirements. Use these options and parameters to specify GPU requirements instead of the bsub -R option or the RES_REQ parameter in the lsb.queues file.

In addition, if LSF_GPU_RESOURCE_IGNORE is set to Y, LSB_GPU_NEW_SYNTAX is set to Y or extend, and LSB_GPU_AUTOCONFIG is set to Y, all built-in GPU resources (gpu_<num>n) are completely removed from the management host LIM. LSF then uses a different method for the management host LIM and server host LIMs to collect GPU information. This improves LSF response time because there are fewer LSF resource metrics to configure and collect. If you use LSF RTM, do not set LSF_GPU_RESOURCE_IGNORE=Y unless you are running LSF RTM Version 10.2 Fix Pack 11 or later.

When the LSB_GPU_NEW_SYNTAX=Y parameter is set, you can specify the j_exclusive, mode, mps, and num GPU requirements with the bsub -gpu option. You can also specify GPU resource requirements with the GPU_REQ parameter in a queue (lsb.queues file) or application profile (lsb.applications file), or specify a default GPU requirement with the LSB_GPU_REQ parameter (lsf.conf file). Your job submission cannot use the legacy GPU resources (ngpus_shared, ngpus_excl_t, ngpus_excl_p) in its resource requirements. In addition, if the PREEMPTABLE_RESOURCES parameter in the lsb.params file includes the ngpus_physical resource, GPU preemption is enabled with the following restrictions:

  • A GPU job can be preempted only if it specifies mode=exclusive_process or j_exclusive=yes in its GPU resource requirements and is configured for automatic job migration and rerun (that is, the MIG parameter is defined and the RERUNNABLE parameter is set to yes in the lsb.queues or lsb.applications file).
  • A GPU job can be preempted only by another GPU job.

When the LSB_GPU_NEW_SYNTAX=extend parameter is set, you can specify the gmem, gmodel, gtile, glink, and gvendor GPU requirements with the bsub -gpu option, in addition to the requirements available when LSB_GPU_NEW_SYNTAX=Y is set (j_exclusive, mode, mps, and num). You can also specify GPU resource requirements with the GPU_REQ parameter in a queue (lsb.queues file) or application profile (lsb.applications file), or specify a default GPU requirement with the LSB_GPU_REQ parameter (lsf.conf file). Your job submission cannot use the legacy GPU resources (ngpus_shared, ngpus_excl_t, ngpus_excl_p) in its resource requirements. In addition, if the PREEMPTABLE_RESOURCES parameter in the lsb.params file includes the ngpus_physical resource, GPU preemption is enabled with only one restriction: higher priority GPU jobs cannot preempt GPU jobs that specify mode=shared in their GPU resource requirements if multiple jobs are running on the GPU. As of Fix Pack 14, this restriction is relaxed: higher priority GPU jobs with j_exclusive=yes or mode=exclusive_process can preempt shared-mode GPU jobs even when multiple jobs are running on the GPU. Configure the MIG, RERUNNABLE, or REQUEUE parameters so that GPU resources are properly released after a job is preempted.

When the LSB_GPU_NEW_SYNTAX=N parameter is set, you must use the GPU resources ngpus_shared, ngpus_excl_t, and ngpus_excl_p as GPU resource requirements.
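The keyword rules for the three settings can be summarized in a small Python sketch. It is a hypothetical model, not LSF code: the function names (normalize_setting, parse_gpu_option, check_submission) are illustrative, an unset value is treated as the documented default (extend), the -gpu string is assumed to be a simple colon-separated list of keyword=value pairs, and legacy resource names in a -R string are detected with a plain substring search.

```python
# Hypothetical model of the LSB_GPU_NEW_SYNTAX keyword rules (illustrative sketch, not LSF code).

# bsub -gpu keywords available with LSB_GPU_NEW_SYNTAX=Y.
BASIC_KEYWORDS = {"j_exclusive", "mode", "mps", "num"}
# Keywords available with LSB_GPU_NEW_SYNTAX=extend (a superset of the Y keywords).
EXTENDED_KEYWORDS = BASIC_KEYWORDS | {"gmem", "gmodel", "gtile", "glink", "gvendor"}
# Legacy resources that new-syntax jobs cannot use as resource requirements.
LEGACY_RESOURCES = ("ngpus_shared", "ngpus_excl_t", "ngpus_excl_p")


def normalize_setting(value):
    """Map a raw lsf.conf value to "Y", "N", or "extend"; unset means the default, extend."""
    if value is None or value == "extend":
        return "extend"
    if value in ("Y", "y"):
        return "Y"
    if value in ("N", "n"):
        return "N"
    raise ValueError(f"unsupported LSB_GPU_NEW_SYNTAX value: {value!r}")


def parse_gpu_option(gpu_string):
    """Split a -gpu string such as "num=2:mode=exclusive_process" into a dict."""
    requirements = {}
    for item in gpu_string.split(":"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected keyword=value, got {item!r}")
        requirements[key.strip()] = value.strip()
    return requirements


def check_submission(raw_setting, gpu_string=None, res_req=""):
    """Return a list of problems for a submission under the given setting (empty if OK)."""
    setting = normalize_setting(raw_setting)
    problems = []
    if gpu_string is not None:
        if setting == "N":
            problems.append("-gpu requires LSB_GPU_NEW_SYNTAX=Y or extend")
        else:
            allowed = EXTENDED_KEYWORDS if setting == "extend" else BASIC_KEYWORDS
            for keyword in sorted(set(parse_gpu_option(gpu_string)) - allowed):
                problems.append(f"keyword {keyword!r} not available with {setting}")
    # Naive substring search for legacy resource names in the -R string.
    # With N, legacy resources are expected, so they are not flagged.
    legacy_used = [name for name in LEGACY_RESOURCES if name in res_req]
    if setting != "N" and legacy_used:
        problems.append("legacy GPU resources not allowed: " + ", ".join(legacy_used))
    return problems


print(check_submission("y", "num=2:mode=exclusive_process"))  # []
print(check_submission("Y", "num=1:gmem=8G"))  # ["keyword 'gmem' not available with Y"]
```

The sketch keeps the Y keywords as a subset of the extend keywords, which mirrors how the text above describes extend as adding to Y rather than replacing it.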

Job-level GPU requirements override application profile-level requirements, which override queue-level requirements, which override the cluster-level default (LSB_GPU_REQ).
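The precedence order can be illustrated with a minimal Python sketch. It is hypothetical (effective_gpu_requirement and PRECEDENCE are made-up names) and models whole-requirement override only: the highest-precedence level that defines a requirement wins outright, and any per-option merging that LSF might perform is not modeled.

```python
# Hypothetical sketch of GPU requirement precedence (not LSF code).
# Whole-requirement override only; per-option merging is not modeled.

# Levels ordered from highest to lowest precedence.
PRECEDENCE = ("job", "application", "queue", "cluster")


def effective_gpu_requirement(levels):
    """Return (level, requirement) for the highest-precedence level that sets one.

    levels maps a level name to its GPU requirement string; missing or empty
    entries mean that level does not define a requirement.
    """
    for level in PRECEDENCE:
        requirement = levels.get(level)
        if requirement:
            return level, requirement
    return None, None


# A queue-level GPU_REQ wins over the cluster-level LSB_GPU_REQ default.
print(effective_gpu_requirement({"queue": "num=1", "cluster": "num=1:mode=shared"}))  # ('queue', 'num=1')
```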

In some situations, you might want to use the bsub -R option to specify GPU requirements instead of the bsub -gpu option. These include cases that need compound or alternative resource requirements. For example, a GPU job might be able to run on either K40 or K80 GPUs. You cannot express this selection with bsub -gpu syntax; instead, use bsub -R syntax with alternative resource requirements.

Because the bsub -R syntax (including the RES_REQ parameter) does not support the j_exclusive, mode, and mps GPU requirements, you must specify them with the bsub -gpu syntax (including the GPU_REQ or LSB_GPU_REQ parameters). For these three GPU requirements, you can use the bsub -R syntax and the bsub -gpu syntax in the same job submission.

If you use the bsub -R syntax with the bsub -gpu syntax for any other GPU requirement at the job, application profile, or queue level (including gmem, gmodel, gtile, or nvlink), LSF rejects the job submission.
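One reading of the combination rule above is that, when -R (or RES_REQ) already carries GPU requirements, -gpu (or GPU_REQ / LSB_GPU_REQ) may add only j_exclusive, mode, and mps. The Python sketch below encodes that reading; it is hypothetical, not LSF code, and mixed_submission_accepted and res_req_has_gpu are illustrative names.

```python
# Hypothetical check of the rule for combining bsub -R and bsub -gpu (not LSF code).
# Encodes one reading: if -R carries GPU requirements, -gpu may add only the
# three requirements that -R cannot express.

GPU_ONLY_KEYWORDS = frozenset({"j_exclusive", "mode", "mps"})


def mixed_submission_accepted(gpu_keywords, res_req_has_gpu):
    """Return True if the modeled rule accepts the submission.

    gpu_keywords: keywords used in the -gpu string, for example {"num", "mode"}.
    res_req_has_gpu: True if the -R string also specifies GPU requirements.
    """
    if not res_req_has_gpu:
        return True  # -R carries no GPU requirements, so nothing is mixed
    return set(gpu_keywords) <= GPU_ONLY_KEYWORDS


print(mixed_submission_accepted({"mode", "j_exclusive"}, res_req_has_gpu=True))  # True
print(mixed_submission_accepted({"mode", "gmem"}, res_req_has_gpu=True))  # False
```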

Setting the LSB_GPU_NEW_SYNTAX parameter to Y or extend also enables GPU resource preemption, provided that the PREEMPTABLE_RESOURCES parameter in the lsb.params file includes the ngpus_physical resource. With LSB_GPU_NEW_SYNTAX=Y, GPU jobs must either use exclusive_process mode or have j_exclusive=yes set to be preempted by other GPU jobs. Non-GPU jobs cannot preempt GPU jobs.
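The preemption rules for Y and extend can be compared side by side in a Python sketch. It is a hypothetical model, not LSF code: GpuJob, can_preempt, and the fix_pack_14 flag are illustrative names, the setting is assumed to be already normalized to "Y", "N", or "extend", the preemptor is assumed to already have higher priority, and the victim is assumed to be a GPU job. Releasing GPU resources through MIG, RERUNNABLE, or REQUEUE after preemption is not modeled.

```python
# Hypothetical model of the GPU preemption rules in this section (not LSF code).
# Assumes the priority check already succeeded and the victim is a GPU job.
from dataclasses import dataclass


@dataclass
class GpuJob:
    uses_gpu: bool = True
    mode: str = "shared"        # GPU mode: "shared" or "exclusive_process"
    j_exclusive: bool = False   # j_exclusive=yes in the GPU requirement
    mig_defined: bool = False   # MIG parameter defined for the job
    rerunnable: bool = False    # RERUNNABLE=yes for the job
    jobs_on_gpu: int = 1        # number of jobs currently running on the job's GPU


def can_preempt(setting, preemptable_resources, preemptor, victim, fix_pack_14=False):
    """Return True if the modeled rules let preemptor preempt the GPU job victim.

    setting is the normalized LSB_GPU_NEW_SYNTAX value: "Y", "N", or "extend".
    """
    if setting not in ("Y", "extend") or "ngpus_physical" not in preemptable_resources:
        return False  # GPU resource preemption is not enabled
    if not preemptor.uses_gpu:
        return False  # non-GPU jobs cannot preempt GPU jobs
    if setting == "Y":
        # Victim must be exclusive and configured for automatic migration and rerun.
        exclusive = victim.mode == "exclusive_process" or victim.j_exclusive
        return exclusive and victim.mig_defined and victim.rerunnable
    # setting == "extend": the only restriction concerns shared-mode GPUs running several jobs.
    if victim.mode == "shared" and victim.jobs_on_gpu > 1:
        if not fix_pack_14:
            return False
        # Fix Pack 14 and later: exclusive preemptors may preempt shared-mode jobs.
        return preemptor.j_exclusive or preemptor.mode == "exclusive_process"
    return True


busy_shared = GpuJob(mode="shared", jobs_on_gpu=2)
exclusive_job = GpuJob(mode="exclusive_process")
print(can_preempt("extend", ["ngpus_physical"], exclusive_job, busy_shared))  # False
print(can_preempt("extend", ["ngpus_physical"], exclusive_job, busy_shared, fix_pack_14=True))  # True
```

Keeping the Y and extend branches separate makes it easy to see that extend drops the exclusivity, MIG, and RERUNNABLE conditions on the victim and keeps only the shared-mode restriction.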

Note: On RHEL systems, the LSB_GPU_NEW_SYNTAX parameter requires RHEL Version 7 or later.

Default

extend

See also

  • GPU_REQ
  • LSB_GPU_REQ
  • bsub -gpu