GPU access enforcement

LSF can enforce GPU access on systems that support the Linux cgroup devices subsystem. To enable GPU enforcement through Linux cgroups, set LSB_RESOURCE_ENFORCE="gpu" in the lsf.conf file. When a job has GPU resource requirements, LSF creates a device cgroup to contain the job processes so that they cannot escape from the allocated GPUs. Each GPU job's device cgroup includes only the GPUs that LSF allocates to that job. LSF creates device cgroups only for GPU jobs.
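A minimal lsf.conf fragment that turns on GPU enforcement might look like the following. The second comment, about combining values when other resources are also enforced, is an assumption; check the LSB_RESOURCE_ENFORCE reference for your LSF version before relying on it.

```shell
# lsf.conf: enforce GPU allocations with Linux device cgroups
LSB_RESOURCE_ENFORCE="gpu"
# Assumption: if the site also enforces other resources, list them together
# in one space-separated value, for example LSB_RESOURCE_ENFORCE="memory gpu"
```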

GPU enforcement through the Linux cgroup devices subsystem is supported on Red Hat Enterprise Linux 6.2 and later, and on SUSE Linux Enterprise Server 11 SP2 and later.

Note:
  • GPU enforcement is not supported on AMD GPUs.
  • When GPU enforcement is enabled, the GPUs contained in a job's device cgroup are renumbered with new GPU IDs, starting from 0. CUDA 7.0 and later fully support device cgroups.
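The renumbering in the note can be illustrated with a small shell function. show_gpu_renumbering is a hypothetical name, and the sketch assumes that the new IDs follow the order in which the host GPUs were allocated; it only prints the mapping and does not query LSF or the GPU driver.

```shell
# Hypothetical helper: show how host GPU indices (comma-separated, in
# allocation order) map to the 0-based IDs a job sees after renumbering.
# Assumes the new IDs follow allocation order.
show_gpu_renumbering() {
  job_id=0
  for host_id in $(echo "$1" | tr ',' ' '); do
    echo "host GPU $host_id -> job GPU $job_id"
    job_id=$((job_id + 1))
  done
}
```

For a job allocated host GPUs 2 and 3, show_gpu_renumbering "2,3" prints that they appear to the job as GPUs 0 and 1.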

Jobs can specify how job processes are bound to these GPUs. LSF sets the CUDA_VISIBLE_DEVICES environment variable to tell user applications which GPUs are allocated. Because this is only an environment variable, an application can escape from its allocated GPUs by changing CUDA_VISIBLE_DEVICES to use other GPUs directly; the device cgroup is what prevents this.
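CUDA_VISIBLE_DEVICES is a hint to the CUDA runtime, not a boundary. The sketch below (visible_gpu_count is a hypothetical helper that does not query hardware) shows how the runtime interprets the variable: unset exposes every GPU, an empty value exposes none, and a comma-separated list exposes only those entries. It simplifies one detail: the real runtime also stops at the first invalid entry, which this sketch does not model. A process that rewrites the variable changes what it sees, which is why enforcement has to come from the device cgroup.

```shell
# Hypothetical helper: report how many GPUs the CUDA runtime would expose,
# based only on CUDA_VISIBLE_DEVICES. It does not query hardware.
visible_gpu_count() {
  if [ -z "${CUDA_VISIBLE_DEVICES+set}" ]; then
    echo "all"   # unset: CUDA exposes every GPU on the host
  elif [ -z "$CUDA_VISIBLE_DEVICES" ]; then
    echo 0       # set but empty: CUDA exposes no GPUs
  else
    # Count the comma-separated entries, one per line.
    echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
  fi
}
```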

For example, the following command submits a job that requests one GPU in exclusive thread mode:

bsub -R "rusage[ngpus_excl_t=1]" ./myapp

LSF creates a device cgroup that contains one exclusive thread GPU and attaches the process ID of the application ./myapp to it. The device cgroup serves as a strict container for the job processes, so ./myapp cannot use other GPUs.
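As a conceptual sketch only, not LSF's implementation, the shell function below prints the cgroup v1 devices.deny rules that would hide every unallocated GPU from a job. It assumes the NVIDIA driver's usual character-device major number 195, that host GPU index N corresponds to /dev/nvidiaN (minor N), and that the job's cgroup otherwise allows all devices; gpu_deny_rules is a hypothetical name.

```shell
# Hypothetical helper: given the allocated GPU indices and the number of
# GPUs on the host, print the cgroup v1 devices.deny rules that would hide
# every other GPU from the job. It only prints; it writes nothing.
# Assumes /dev/nvidiaN uses major 195, minor N.
gpu_deny_rules() {
  allocated="$1"   # space-separated host GPU indices, e.g. "2 3"
  total="$2"       # number of GPUs on the host, e.g. 4
  i=0
  while [ "$i" -lt "$total" ]; do
    case " $allocated " in
      *" $i "*) ;;                            # allocated: leave accessible
      *) echo "devices.deny c 195:$i rwm" ;;  # /dev/nvidia$i: block it
    esac
    i=$((i + 1))
  done
}
```

On a four-GPU host where the job is allocated GPU 1, gpu_deny_rules "1" 4 prints deny rules for minors 0, 2, and 3. Denying only the other GPUs, rather than denying everything and re-allowing selected devices, keeps devices such as /dev/null and /dev/nvidiactl reachable.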