NVIDIA GPU access enforcement
LSF can enforce NVIDIA GPU access on systems that support the Linux cgroup devices subsystem. To enable GPU access enforcement through Linux cgroups, set the LSB_RESOURCE_ENFORCE="gpu" parameter in the lsf.conf file. If a job has GPU resource requirements, LSF creates a device cgroup to contain the job processes so that they cannot use GPUs outside the allocation. Each GPU job's device cgroup includes only the GPUs that LSF allocated to that job. LSF creates device cgroups only for GPU jobs. GPU enforcement is available for NVIDIA GPUs; it is not supported on AMD GPUs.
GPU enforcement through the Linux cgroup devices subsystem is supported on Red Hat Enterprise Linux 6.2 and later, and on SUSE Linux Enterprise Server 11 SP2 and later.
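As a quick way to confirm that the setting is in place, the following is a minimal sketch that scans the text of an lsf.conf file for the LSB_RESOURCE_ENFORCE parameter. The function names and the sample text are hypothetical, and the sketch assumes that the value is a quoted, whitespace-separated list of resource names and that the last occurrence of the parameter wins; neither assumption is stated in this topic.

```python
def enforced_resources(conf_text):
    """Return the set of resource names listed in LSB_RESOURCE_ENFORCE.

    Assumption (not from the LSF documentation): the value is a quoted,
    whitespace-separated list, and the last occurrence in the file wins.
    """
    resources = set()
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == "LSB_RESOURCE_ENFORCE":
            # Drop the surrounding quotes, then split on whitespace.
            resources = set(value.strip().strip("\"'").split())
    return resources


def gpu_enforcement_enabled(conf_text):
    """True if "gpu" is among the enforced resources."""
    return "gpu" in enforced_resources(conf_text)


# Hypothetical lsf.conf excerpt used only for illustration.
sample = '# GPU enforcement\nLSB_RESOURCE_ENFORCE="gpu"\n'
print(gpu_enforcement_enabled(sample))  # prints True
```

In practice, the text would come from the lsf.conf file of the cluster; the sketch takes a string so that it does not depend on where that file is installed.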
Note: When GPU enforcement is enabled, the GPUs that are contained in one device cgroup are
assigned new GPU IDs, beginning with 0. CUDA Version 7.0 or later fully supports
cgroups.
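The renumbering can be pictured with a small sketch. The function name is hypothetical, and the sketch assumes that the new IDs follow the ascending order of the physical GPU IDs, which this note does not state; it is meant only to show that a job sees IDs starting at 0 regardless of which physical GPUs it received.

```python
def cgroup_gpu_ids(allocated_physical_ids):
    """Map the GPU ID seen inside the device cgroup to the physical GPU ID.

    Assumption (illustrative only): IDs inside the cgroup are handed out
    in ascending order of the physical IDs, starting at 0.
    """
    return dict(enumerate(sorted(allocated_physical_ids)))


# A job that was allocated physical GPUs 5 and 2 sees them as GPUs 0 and 1.
print(cgroup_gpu_ids([5, 2]))  # prints {0: 2, 1: 5}
```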
Jobs can specify how job processes are to be bound to these computing elements. When LSF
allocates GPUs for a job, it sets the CUDA_VISIBLE_DEVICES environment variable
to inform the job which GPUs are allocated. However, a job can ignore or override this variable
and use GPUs that are not allocated to it. To avoid this problem, use cgroups and the
LSB_RESOURCE_ENFORCE="gpu" setting to restrict the GPUs that a job's processes
can access:
- If cgroups are disabled or not available, or LSB_RESOURCE_ENFORCE="gpu" is not set, LSF only sets the CUDA_VISIBLE_DEVICES environment variable in the job environment. A malicious job could overwrite the environment variable and use GPUs that LSF did not assign to it.
- If LSB_RESOURCE_ENFORCE="gpu" is set and the host system uses cgroups v1, LSF uses cgroups to limit the job processes' access to only the assigned GPUs. The CUDA_VISIBLE_DEVICES environment variable is also set.
- If LSB_RESOURCE_ENFORCE="gpu" is set and the host system uses cgroups v2, LSF uses $LSF_SERVERDIR/disable_device to restrict the job processes' access.
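The three cases above can be sketched as a small decision helper. This is an illustration only, not how LSF itself decides: the function names and the returned strings are hypothetical, and the cgroup version check relies on the common Linux convention that a cgroups v2 unified hierarchy exposes a cgroup.controllers file at its mount point, while a v1 layout has a devices directory there.

```python
import os


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 for a cgroups v2 unified hierarchy, 1 for a v1 layout with
    a devices controller directory, or None if neither is found."""
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    # v1 and hybrid layouts mount each controller in its own directory.
    if os.path.isdir(os.path.join(root, "devices")):
        return 1
    return None


def gpu_enforcement_mode(enforce_gpu, version):
    """Describe which of the three cases applies (strings are illustrative)."""
    if not enforce_gpu or version is None:
        return "CUDA_VISIBLE_DEVICES only (advisory, not enforced)"
    if version == 1:
        return "cgroups v1 devices subsystem, plus CUDA_VISIBLE_DEVICES"
    return "$LSF_SERVERDIR/disable_device (cgroups v2)"


# enforce_gpu would come from the LSB_RESOURCE_ENFORCE setting in lsf.conf.
print(gpu_enforcement_mode(True, cgroup_version()))
```

The first case is the only one in which a job that rewrites CUDA_VISIBLE_DEVICES can reach GPUs outside its allocation, which is why the helper labels it as advisory.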