GPU enhancements

The following enhancements affect LSF GPU support.

Enable MPS daemon sharing for GPU jobs

LSF now allows multiple GPU jobs to share one NVIDIA Multi-Process Service (MPS) daemon if the jobs are submitted by the same user with the same resource requirements.

To enable MPS daemon sharing, add the ",share" keyword to the existing mps value in the GPU resource requirements string (that is, in the bsub -gpu command option, the LSB_GPU_REQ parameter in the lsf.conf file, or the GPU_REQ parameter in the lsb.queues and lsb.applications files). Specify mps=yes,share, mps=per_socket,share, or mps=per_gpu,share to have LSF share the MPS daemon on the host, socket, or GPU, respectively, among jobs that are submitted by the same user with the same resource requirements.

LSB_GPU_NEW_SYNTAX=extend must be defined in the lsf.conf file to enable MPS daemons and MPS daemon sharing.
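The sharing rule can be illustrated with a minimal Python sketch. It is not LSF code: the function names and the (job ID, user, GPU requirement) tuples are hypothetical, and comparing requirement strings for exact equality is an assumption, because the text above does not say how LSF decides that two jobs have the same resource requirements.

```python
from collections import defaultdict

# Sharing scope implied by each mps value, as described in the text above.
SCOPES = {"yes": "host", "per_socket": "socket", "per_gpu": "gpu"}


def mps_share_scope(mps_value):
    """Return "host", "socket", or "gpu" if the value requests sharing, else None."""
    mode, *flags = [item.strip() for item in mps_value.split(",")]
    if "share" not in flags:
        return None
    return SCOPES.get(mode)


def group_sharing_jobs(jobs):
    """Group job IDs that could share one MPS daemon.

    jobs is an iterable of (job_id, user, gpu_req) tuples (hypothetical shape).
    Assumption: "same resource requirements" means identical strings.
    """
    groups = defaultdict(list)
    for job_id, user, gpu_req in jobs:
        groups[(user, gpu_req)].append(job_id)
    return dict(groups)


jobs = [
    (101, "alice", "num=1:mps=per_gpu,share"),
    (102, "alice", "num=1:mps=per_gpu,share"),
    (103, "bob", "num=1:mps=per_gpu,share"),
]
groups = group_sharing_jobs(jobs)
```

In this sketch, jobs 101 and 102 fall into the same group because the user and the requirement string match, while job 103 forms its own group because it belongs to a different user.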

Merge individual options in GPU requirements

In previous versions of LSF, when you specified GPU resource requirements at multiple levels (that is, at the job level with the bsub -gpu command option, the application or queue level with the GPU_REQ parameter in the lsb.applications or lsb.queues files, or the cluster level with the LSB_GPU_REQ parameter in the lsf.conf file), the entire GPU requirement string at the higher level overrode the GPU requirement strings at the lower levels of precedence. Even if an individual option was not specified in the higher-level GPU requirement string, the default value for that option at the higher level took precedence.

LSF now allows you to merge all individual options in GPU requirement strings separately. Any specified options override the same options that are specified at the lower levels of precedence. If an individual option is not specified, but is explicitly specified at a lower level, then the highest level for which the option is specified takes precedence. To enable this feature, specify the new parameter GPU_REQ_MERGE=Y in the lsb.params file. In addition, LSB_GPU_NEW_SYNTAX=extend must be defined in the lsf.conf file to enable the GPU_REQ_MERGE parameter.
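The per-option merge can be sketched in a few lines of Python. This is an illustration of the rule described above, not LSF's implementation: the function names are hypothetical, and the sketch assumes that options are separated by colons and written as key=value pairs, as in the GPU_REQ example elsewhere in this section.

```python
def parse_gpu_req(req):
    """Split a "key=value:key=value" string into a dict of options."""
    options = {}
    for part in req.split(":"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        options[key.strip()] = value.strip()
    return options


def merge_gpu_reqs(levels):
    """Merge requirement strings ordered from lowest to highest precedence.

    For each option, the highest level that specifies it wins.
    """
    merged = {}
    for req in levels:
        merged.update(parse_gpu_req(req))
    return ":".join(f"{key}={value}" for key, value in merged.items())


cluster_level = "num=1:mode=shared:mps=no"   # LSB_GPU_REQ in lsf.conf
queue_level = "mode=exclusive_process"       # GPU_REQ in lsb.queues
job_level = "num=2"                          # bsub -gpu
merged = merge_gpu_reqs([cluster_level, queue_level, job_level])
print(merged)  # num=2:mode=exclusive_process:mps=no
```

Here the job supplies only num, the queue supplies only mode, and mps is inherited from the cluster level because no higher level specifies it.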

Specify GPU allocation method

You can now specify the GPU resource reservation method in the bsub -gpu option, in the GPU_REQ parameter in the lsb.applications or lsb.queues file, or in the LSB_GPU_REQ parameter in the lsf.conf file. Previously, you could only specify the resource reservation method at the global level by specifying the METHOD parameter in the ReservationUsage section of the lsb.resources file. The GPU resource reservation value and method at the job level override the value and method at the application level, which override the value and method at the queue level, which override the value and method at the cluster level.

Specify the GPU resource reservation method by using the /task or /host keyword after the GPU numeric value in the GPU requirements string. Specify the GPU resource reservation methods as follows:

  • value/task

    Specifies per-task GPU reservation. This is the equivalent of specifying PER_TASK for the METHOD parameter in the ReservationUsage section of the lsb.resources file.

  • value/host

    Specifies per-host GPU reservation. This is the equivalent of specifying PER_HOST for the METHOD parameter in the ReservationUsage section of the lsb.resources file.

For example, GPU_REQ="num=2/task:mode=shared:j_exclusive=yes" reserves two GPUs for each task of the job.
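The difference between the two keywords can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the totals assume that per-task reservation multiplies the value by the number of tasks and per-host reservation multiplies it by the number of hosts.

```python
# Keyword after the slash mapped to the equivalent METHOD value in lsb.resources.
METHODS = {"task": "PER_TASK", "host": "PER_HOST"}


def parse_num(value):
    """Parse the num value, for example "2/task", into (count, method).

    method is None when no /task or /host keyword is given.
    """
    count, slash, keyword = value.partition("/")
    if not slash:
        return int(count), None
    if keyword not in METHODS:
        raise ValueError(f"unknown reservation keyword: {keyword!r}")
    return int(count), METHODS[keyword]


def gpus_reserved(value, tasks, hosts):
    """Total GPUs reserved for a job (assumed arithmetic, see lead-in)."""
    count, method = parse_num(value)
    if method == "PER_TASK":
        return count * tasks
    if method == "PER_HOST":
        return count * hosts
    raise ValueError("no reservation method given in this string")


# A job with 4 tasks spread over 2 hosts:
per_task_total = gpus_reserved("2/task", tasks=4, hosts=2)
per_host_total = gpus_reserved("2/host", tasks=4, hosts=2)
```

Under these assumptions, num=2/task reserves 8 GPUs for the job and num=2/host reserves 4.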

Support for NVIDIA DGX systems

LSF now supports NVIDIA DGX-1 and DGX-2 systems. LSF automatically detects and configures NVIDIA GPU support.