LSB_START_MPS

Specifies whether LSF starts CUDA MPS (Multi-Process Service) for GPU jobs that require only GPUs in EXCLUSIVE_PROCESS or DEFAULT mode.

Syntax

LSB_START_MPS=Y|y|N|n

Description

CUDA MPS allows multiple CUDA processes to share a single GPU context on GPUs in EXCLUSIVE_PROCESS or DEFAULT mode. If the job requires GPUs in EXCLUSIVE_THREAD mode, LSF does not start CUDA MPS for the job.

Remember: When you change the value of the LSB_START_MPS parameter, you must restart the sbatchd daemon:
bctrld restart sbd all

When LSF starts MPS for a job, LSF sets CUDA_MPS_PIPE_DIRECTORY instead of CUDA_VISIBLE_DEVICES. The GPU job communicates with MPS through a named pipe in the directory that CUDA_MPS_PIPE_DIRECTORY defines. This pipe directory is created under the directory that is specified by LSF_TMPDIR. When the job finishes, LSF removes the pipe.
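As an illustration, a job could inspect its own environment to see which of the two variables is present. The following is a minimal sketch, not part of LSF: the function names mps_pipe_directory and describe_gpu_env are hypothetical, and the sketch assumes only that LSF exports the variables as described above.

```python
import os
from typing import Mapping, Optional


def mps_pipe_directory(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value of CUDA_MPS_PIPE_DIRECTORY, or None if it is unset or empty."""
    env = os.environ if env is None else env
    value = env.get("CUDA_MPS_PIPE_DIRECTORY", "")
    return value or None


def describe_gpu_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Summarize whether the job environment points at an MPS pipe directory."""
    env = os.environ if env is None else env
    pipe_dir = mps_pipe_directory(env)
    if pipe_dir is not None:
        return f"MPS pipe directory: {pipe_dir}"
    # Without MPS, the job is expected to see CUDA_VISIBLE_DEVICES instead.
    devices = env.get("CUDA_VISIBLE_DEVICES")
    if devices is not None:
        return f"no MPS; CUDA_VISIBLE_DEVICES={devices}"
    return "no MPS; CUDA_VISIBLE_DEVICES is not set"


if __name__ == "__main__":
    # Prints a one-line summary for the current job environment.
    print(describe_gpu_env())
```

Passing a plain dictionary instead of the real environment makes the check easy to try outside of a job.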

If the cgroup feature is enabled, LSF also creates a cgroup for MPS under the job-level cgroup.

The MPS server supports up to 16 client CUDA contexts concurrently. This limit applies per user per job, which means that MPS can support at most 16 CUDA processes at one time, even if LSF allocates multiple GPUs to the job. MPS cannot exit normally if GPU jobs are killed. The LSF cgroup feature can help resolve this situation.
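A job that launches its own CUDA worker processes might cap the worker count at this limit. The following is a minimal sketch under that assumption: the constant MPS_MAX_CLIENTS and the function capped_worker_count are hypothetical names, and the value 16 is taken from the limit stated above rather than queried from MPS.

```python
# Limit stated in this section: 16 client CUDA contexts per user per job.
MPS_MAX_CLIENTS = 16


def capped_worker_count(requested: int, limit: int = MPS_MAX_CLIENTS) -> int:
    """Return the number of CUDA worker processes to start, capped at the MPS limit."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    # The cap applies to the whole job, regardless of how many GPUs are allocated.
    return min(requested, limit)
```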

The MPS function is supported by CUDA Version 5.5 or later.

The LSB_START_JOB_MPS environment variable at the job level overrides the LSB_START_MPS parameter.

In the GPU requirement syntax, the default value of the MPS option is mps=no. This option overrides the LSB_START_MPS=y parameter in the lsf.conf file, the LSB_START_JOB_MPS=y environment variable, and the bsub -env LSB_START_JOB_MPS=y command option.
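The precedence described above can be sketched as a small lookup. This is one reading of the rules, not LSF's implementation: the function resolve_start_mps is hypothetical, it treats the mps option as present only when the job specifies a GPU requirement (passed as None otherwise), and it accepts only the simple values yes and no for that option.

```python
from typing import Mapping, Optional


def _is_yes(value: Optional[str]) -> bool:
    """Interpret Y, y, yes, and similar spellings as true; anything else as false."""
    return value is not None and value.strip().lower() in ("y", "yes")


def resolve_start_mps(
    gpu_req_mps: Optional[str],
    job_env: Mapping[str, str],
    lsf_conf: Mapping[str, str],
) -> bool:
    """Decide whether MPS would be started, applying the documented precedence.

    gpu_req_mps is the mps value from the GPU requirement ("yes" or "no"),
    or None when the job gives no GPU requirement (an assumption of this sketch).
    """
    # 1. The mps option in the GPU requirement overrides everything else.
    if gpu_req_mps is not None:
        return _is_yes(gpu_req_mps)
    # 2. The job-level environment variable overrides the lsf.conf parameter.
    if "LSB_START_JOB_MPS" in job_env:
        return _is_yes(job_env["LSB_START_JOB_MPS"])
    # 3. Fall back to LSB_START_MPS in lsf.conf, whose default is N.
    return _is_yes(lsf_conf.get("LSB_START_MPS", "N"))
```

For example, under this reading a job with mps=no in its GPU requirement does not get MPS even when both LSB_START_MPS=y and LSB_START_JOB_MPS=y are set.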

Default

N - LSF does not start CUDA MPS for GPU jobs.