Nvidia Multi-Instance GPU (MIG) features

Use the LSF_MANAGE_MIG parameter in the lsf.conf file to enable dynamic MIG scheduling.

Nvidia Multi-Instance GPU (MIG) features allow a single supported GPU to be securely partitioned into up to seven independent GPU instances, providing multiple users with independent GPU resources.

When dynamic MIG scheduling is enabled, LSF dynamically creates GPU instances (GI) and compute instances (CI) on each host and controls the MIG configuration of each host. If you enable dynamic MIG scheduling, do not manually create or destroy MIG devices outside of LSF. To enable dynamic MIG scheduling, set the LSF_MANAGE_MIG parameter to Y in the lsf.conf file.

Starting in Fix Pack 14, LSF also uses cgroups to enforce MIG device isolation. To enable this enforcement, set LSB_RESOURCE_ENFORCE="gpu" in the lsf.conf file.
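
As a minimal sketch, the lsf.conf entries for dynamic MIG scheduling with cgroup-based isolation might look like the following. Only the two parameters and values named in this topic are shown. Any other GPU-related settings that your cluster needs are assumed to be configured already.

```shell
# lsf.conf (fragment)

# Let LSF create and destroy GPU instances and compute instances itself.
LSF_MANAGE_MIG=Y

# Fix Pack 14 and later: enforce MIG device isolation through cgroups.
LSB_RESOURCE_ENFORCE="gpu"
```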

If LSF_MANAGE_MIG is set to N or is undefined, LSF uses static MIG scheduling. LSF allocates the GI and CI based on the configuration of each MIG host, and dispatches jobs to the MIG hosts. LSF does not create or destroy the GI and CI on the MIG hosts. If you use static MIG scheduling and want to change MIG devices, you must wait for the running MIG job to finish, then destroy the existing MIG device, create a new MIG device, and restart the LSF daemons.
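
The manual steps for static MIG scheduling could be sketched with the nvidia-smi command as follows. This is an illustration, not an LSF-provided procedure: the GPU index and the profile ID are hypothetical values, the commands typically require root privileges, and the sketch assumes that MIG mode is already enabled on the GPU and that no MIG jobs are running on the host.

```shell
# Hypothetical values for illustration only.
GPU_INDEX=0     # GPU to repartition
PROFILE_ID=9    # GPU instance profile ID; valid IDs depend on the GPU model

# List the existing GPU instances before you change anything.
nvidia-smi mig -lgi -i "$GPU_INDEX"

# Destroy the compute instances first, then the GPU instances that contain them.
nvidia-smi mig -dci -i "$GPU_INDEX"
nvidia-smi mig -dgi -i "$GPU_INDEX"

# List the GPU instance profiles that this GPU supports.
nvidia-smi mig -lgip -i "$GPU_INDEX"

# Create a GPU instance with the chosen profile and its default compute instance.
nvidia-smi mig -cgi "$PROFILE_ID" -C -i "$GPU_INDEX"

# Finally, restart the LSF daemons on the host so that LSF detects the new MIG devices.
```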

If you change the value of this parameter, you must wait until all MIG jobs that are running on the cluster are done, then restart the LSF daemons for your changes to take effect.