Configuring GPU run time


  1. Set a value for the GPU_RUN_TIME_FACTOR parameter for the queue in lsb.queues or for the cluster in lsb.params.
  2. To enable historical GPU run time of finished jobs, specify ENABLE_GPU_HIST_RUN_TIME=Y for the queue in lsb.queues or for the cluster in lsb.params.

    Enabling historical GPU time ensures that the user's priority does not increase significantly after a GPU job finishes.
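
As a rough illustration of the two steps above, a queue-level configuration in lsb.queues might look like the following fragment. The queue name, the share assignment, and the factor value 0.7 are illustrative assumptions, not recommended settings.

```
# lsb.queues (queue name and values are hypothetical examples)
Begin Queue
QUEUE_NAME               = gpu_fairshare
FAIRSHARE                = USER_SHARES[[default, 1]]
GPU_RUN_TIME_FACTOR      = 0.7
ENABLE_GPU_HIST_RUN_TIME = Y
End Queue
```

To apply the same settings to the whole cluster instead, the two parameters would go in the Parameters section of lsb.params:

```
# lsb.params (values are hypothetical examples)
Begin Parameters
GPU_RUN_TIME_FACTOR      = 0.7
ENABLE_GPU_HIST_RUN_TIME = Y
End Parameters
```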


If you set the GPU run time factor and enable historical GPU run time, the dynamic priority is calculated according to the following formula:

dynamic priority = number_shares / (
      cpu_time * CPU_TIME_FACTOR
    + (historical_run_time + run_time) * RUN_TIME_FACTOR
    + (committed_run_time - run_time) * COMMITTED_RUN_TIME_FACTOR
    + (1 + job_slots) * RUN_JOB_FACTOR
    + fairshare_adjustment(struct* shareAdjustPair) * FAIRSHARE_ADJUSTMENT_FACTOR
    + ((historical_gpu_run_time + gpu_run_time) * ngpus_physical) * GPU_RUN_TIME_FACTOR
)

For historical_gpu_run_time, if ENABLE_GPU_HIST_RUN_TIME is defined in the lsb.params file, the value is the run time (measured in hours) of finished GPU jobs, decayed over time according to HIST_HOURS in the lsb.params file (5 hours by default).

Note that:
  • For jobs that ask for exclusive use of a GPU, gpu_run_time is the same as the job's run time and ngpus_physical is the value of the requested ngpus_physical in the job's effective RES_REQ string.
  • For jobs that ask for an exclusive host (with the bsub -x option), the gpu_run_time is the same as the job's run time and ngpus_physical is the number of GPUs on the execution host.
  • For jobs that ask for an exclusive compute unit (bsub -R "cu[excl]" option), the gpu_run_time is the same as the job's run time and ngpus_physical is the number of GPUs on all the execution hosts in the compute unit.
  • Jobs that ask for shared mode GPUs do not contribute to dynamic user priority calculations and are not charged for fairshare.
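
The rules in this list can be summarized as a small lookup. The following Python sketch is only an illustration of the list above; the function name, the mode labels, and the parameter names are hypothetical and are not part of LSF.

```python
def charged_gpus(mode, requested_ngpus=0, gpus_on_host=0, gpus_in_compute_unit=0):
    """Return the ngpus_physical value charged to a job (illustrative only).

    The mode labels are hypothetical names for the four cases in the list:
      "exclusive_gpu"  - job asks for exclusive use of a GPU
      "exclusive_host" - job asks for an exclusive host (bsub -x)
      "exclusive_cu"   - job asks for an exclusive compute unit (cu[excl])
      "shared_gpu"     - job asks for shared mode GPUs
    """
    if mode == "exclusive_gpu":
        # ngpus_physical requested in the job's effective RES_REQ string
        return requested_ngpus
    if mode == "exclusive_host":
        # every GPU on the execution host
        return gpus_on_host
    if mode == "exclusive_cu":
        # every GPU on all execution hosts in the compute unit
        return gpus_in_compute_unit
    if mode == "shared_gpu":
        # shared mode jobs are not charged for fairshare
        return 0
    raise ValueError(f"unknown mode: {mode!r}")


print(charged_gpus("exclusive_gpu", requested_ngpus=2))
print(charged_gpus("exclusive_host", gpus_on_host=4))
print(charged_gpus("shared_gpu", requested_ngpus=2))
```
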
The gpu_run_time value is the run time requested at GPU job submission with the -gpu option of bsub, the queue or application profile configuration with the GPU_REQ parameter, or the cluster configuration with the LSB_GPU_REQ parameter.
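
To see how the GPU term affects the result, the formula can be sketched in Python. This is a minimal illustration, not LSF's implementation: the function and class names are hypothetical, the factor values are arbitrary examples rather than LSF defaults, and fairshare_adjustment is treated as an already computed number. It places the GPU term inside the denominator, on the assumption that GPU usage is meant to lower priority like the other usage terms.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    """Weighting parameters; the values are arbitrary examples, not LSF defaults."""
    cpu_time_factor: float = 0.7
    run_time_factor: float = 0.7
    committed_run_time_factor: float = 0.0
    run_job_factor: float = 3.0
    fairshare_adjustment_factor: float = 0.0
    gpu_run_time_factor: float = 1.0


def dynamic_priority(number_shares, cpu_time, historical_run_time, run_time,
                     committed_run_time, job_slots, fairshare_adjustment,
                     historical_gpu_run_time, gpu_run_time, ngpus_physical, f):
    """Evaluate the dynamic priority formula with the GPU term in the denominator."""
    denominator = (
        cpu_time * f.cpu_time_factor
        + (historical_run_time + run_time) * f.run_time_factor
        + (committed_run_time - run_time) * f.committed_run_time_factor
        + (1 + job_slots) * f.run_job_factor
        + fairshare_adjustment * f.fairshare_adjustment_factor
        + (historical_gpu_run_time + gpu_run_time) * ngpus_physical
        * f.gpu_run_time_factor
    )
    return number_shares / denominator


# Same shares and CPU usage; the second user also has GPU run time on 2 GPUs.
common = dict(number_shares=10, cpu_time=0.0, historical_run_time=0.0,
              run_time=1.0, committed_run_time=0.0, job_slots=1,
              fairshare_adjustment=0.0, f=Factors())
no_gpu = dynamic_priority(historical_gpu_run_time=0.0, gpu_run_time=0.0,
                          ngpus_physical=0, **common)
with_gpu = dynamic_priority(historical_gpu_run_time=2.0, gpu_run_time=1.0,
                            ngpus_physical=2, **common)
print(no_gpu, with_gpu)
```

In this sketch the user with GPU run time ends up with the lower priority, and because historical_gpu_run_time stays in the sum after a job finishes, the priority does not jump back up immediately.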