Decreasing GPU power consumption when a GPU is not in use
A GPU consumes significant power even when it is idle. LSF provides configuration parameters to reduce the power that a GPU consumes when it has not been in use for a specified time. By default, LSF does not power off a GPU even when it is idle.
Set the LSB_GPU_POWEROFF_DURATION parameter in the lsf.conf file to specify the minimum number of seconds that a GPU must be idle before LSF can power it off. When the LSB_GPU_POWEROFF_DURATION parameter is set, LSF first tries to allocate GPUs that are not running in MIN power limit mode. If not enough GPUs are in MAX power limit mode, LSF allocates GPUs that are in MIN power limit mode and switches those GPUs to run in MAX power limit mode.
If LSB_GPU_POWEROFF_DURATION=0 is set, LSF powers off GPUs immediately after the job finishes.
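As a minimal sketch of the setting described above, the following lsf.conf fragment enables GPU power-off after an idle period. The value 300 is an illustrative assumption, not a recommended setting; choose a duration that suits your workload.

```
# lsf.conf
# Allow LSF to power off a GPU once it has been idle for at least 300 seconds.
# The value 300 is an illustrative assumption.
LSB_GPU_POWEROFF_DURATION=300

# Alternative: power off GPUs immediately after the job finishes.
# LSB_GPU_POWEROFF_DURATION=0
```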
LSF applies the following criteria when it allocates GPUs:
- All GPUs are in the same PCI.
- Check whether the MAX power limit mode GPUs meet the job requirements. If they do, LSF does not allocate the MIN power limit mode GPUs first. If they do not meet the requirements, LSF allocates all the GPUs to the job, including those in both MAX power limit and MIN power limit modes.
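The allocation preference described above can be illustrated with a short Python sketch. This is not LSF's implementation. The `Gpu` class, the `pci_group` label, and the `pick_gpus` function are hypothetical names introduced only for this example, and the sketch assumes that a job simply asks for a number of GPUs that share a PCI.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gpu:
    gpu_id: int
    pci_group: str   # hypothetical label for GPUs that share a PCI
    power_mode: str  # "MAX" or "MIN" power limit mode


def pick_gpus(gpus: List[Gpu], needed: int) -> Optional[List[Gpu]]:
    """Pick `needed` GPUs from a single PCI group, preferring MAX-mode GPUs."""
    groups: Dict[str, List[Gpu]] = {}
    for gpu in gpus:
        groups.setdefault(gpu.pci_group, []).append(gpu)

    best = None  # (number of MIN-mode GPUs used, chosen GPUs)
    for members in groups.values():
        if len(members) < needed:
            continue  # this PCI group cannot satisfy the request
        # MAX-mode GPUs sort first; MIN-mode GPUs are used only to fill gaps.
        ordered = sorted(members, key=lambda g: (g.power_mode != "MAX", g.gpu_id))
        chosen = ordered[:needed]
        min_used = sum(g.power_mode == "MIN" for g in chosen)
        if best is None or min_used < best[0]:
            best = (min_used, chosen)

    if best is None:
        return None
    for gpu in best[1]:
        gpu.power_mode = "MAX"  # MIN-mode GPUs switch to MAX when allocated
    return best[1]
```

In this sketch, a request that the MAX power limit mode GPUs can satisfy leaves the MIN power limit mode GPUs untouched. A larger request pulls in MIN power limit mode GPUs and switches them to MAX power limit mode.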
If the sbatchd daemon is restarted, the GPU idle time is recalculated.
NVIDIA K80 hardware supports switching power limits. The NVML library must be version 6.340 or newer.