
Jobs have a long run time in LSF when CPU frequency governors are enabled, but run faster when executed directly on the machine

Some CPU frequency governors set a lower processor clock speed for LSF jobs than for applications executed directly on the machine. This occurs because some governors do not count processes that call nice() when deciding whether to raise the clock speed. However, by default LSF calls nice() for all jobs. As a result, LSF jobs can take longer to run than the same work run directly on the machine. For example, a CPU-intensive application may run for six days as a job in LSF, but for only three days directly on the host.
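To confirm whether a given process, such as a job's process on an execution host, is running with a nice value, one option is to query its scheduling priority. The following is a minimal Python sketch, assuming a Unix host; the helper names nice_value and is_niced are illustrative and are not part of LSF.

```python
import os


def nice_value(pid: int) -> int:
    """Return the nice value of the process with the given PID.

    Uses getpriority(2) through os.getpriority; a PID of 0 means the
    calling process.
    """
    return os.getpriority(os.PRIO_PROCESS, pid)


def is_niced(pid: int) -> bool:
    """Return True if the process runs with a positive nice value."""
    return nice_value(pid) > 0


if __name__ == "__main__":
    me = os.getpid()
    print(f"PID {me}: nice={nice_value(me)}, niced={is_niced(me)}")
```

Passing the PID of a running job process instead of os.getpid() shows whether that process would be affected by a governor that ignores niced load.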

The following CPU frequency governors may not consider nice() processes:

  • Ondemand and Conservative: These governors can optionally exclude nice() processes when considering clock speed adjustments. The sysfs parameter ignore_nice_load accepts a value of '0' or '1'. When set to '0' (the default), all processes count towards the CPU utilization value. When set to '1', processes that run with a nice value are ignored and do not count in the overall usage calculation.

  • Userspace: Using cpuspeed as an example, you can disable the feature that prevents nice() processes from increasing CPU speed. Use cpuspeed -n, which selects “do not treat niced programs as idle time”. Check the documentation for your Userspace governor for details.
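To see how a host is currently configured, the ignore_nice_load files can be read from sysfs. The sketch below is a minimal Python example under stated assumptions: it searches recursively under /sys/devices/system/cpu because the exact subdirectory that holds ignore_nice_load can differ between kernel versions and governors, and the function name read_ignore_nice_load is illustrative only.

```python
from pathlib import Path

# Assumption: sysfs CPU root; the subdirectory that actually contains
# ignore_nice_load varies by kernel version and active governor.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def read_ignore_nice_load(root: Path = SYSFS_CPU_ROOT) -> dict:
    """Map each ignore_nice_load file found under root to its current value."""
    settings = {}
    if not root.is_dir():
        return settings
    for path in sorted(root.rglob("ignore_nice_load")):
        try:
            settings[str(path)] = path.read_text().strip()
        except OSError:
            # The file can disappear if the governor changes during the scan.
            continue
    return settings


if __name__ == "__main__":
    for path, value in read_ignore_nice_load().items():
        state = "ignored" if value == "1" else "counted"
        print(f"{path}: {value} (niced processes are {state})")
```

An empty result suggests that neither Ondemand nor Conservative is active on the host, or that the parameter lives somewhere this search does not reach.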

By default, the LSF queues template includes NICE values for each queue. LSF applies the NICE value on execution hosts by calling nice() for every job. If the CPU frequency governor is configured to ignore nice() processes, LSF jobs do not trigger an increase in CPU frequency. In this case, LSF jobs run for a longer time and do not use the full CPU capacity of the host.
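To find out which queues set a NICE value, the queue configuration can be scanned. The following Python sketch assumes the queue definitions use the Begin Queue / End Queue section layout of the lsb.queues file; the function name queues_with_nice and the sample queue names are hypothetical.

```python
def queues_with_nice(lsb_queues_text: str) -> dict:
    """Return {queue_name: nice_value} for each queue section that sets NICE."""
    result = {}
    in_queue, name, nice = False, None, None
    for raw in lsb_queues_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        words = [w.lower() for w in line.split()]
        if words == ["begin", "queue"]:
            in_queue, name, nice = True, None, None
        elif words == ["end", "queue"]:
            if in_queue and name is not None and nice is not None:
                result[name] = nice
            in_queue = False
        elif in_queue and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "QUEUE_NAME":
                name = value
            elif key == "NICE":
                nice = int(value)
    return result


if __name__ == "__main__":
    # Hypothetical sample configuration for illustration only.
    sample = """
    Begin Queue
    QUEUE_NAME = normal
    NICE       = 20
    End Queue
    """
    print(queues_with_nice(sample))
```

Queues reported by this scan are the ones whose jobs run niced and may therefore be held at a lower clock speed.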

 

Optimizing LSF job processing

For high job throughput:

  1. Do not nice() jobs in the queue.

Remove NICE from the queue configuration. LSF jobs then run as ordinary processes on the execution hosts, so they drive CPU frequency scaling like any other process, even if the governor is configured to ignore nice() jobs.

  2. Let CPU frequency governors consider nice() jobs when scaling processor clock speed.

The ignore_nice_load parameter is located in /sys/devices/system/cpu/cpuX/cpufreq/ for the Ondemand and Conservative frequency governors. Set ignore_nice_load to 0 so that nice() jobs are scaled the same way as non-nice() jobs.

For third-party Userspace governors (for example, cpuspeed), use cpuspeed -n so that nice() jobs are scaled the same way as non-nice() jobs.
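Setting ignore_nice_load can be scripted across all locations where the parameter appears. The Python sketch below is one possible approach, not a documented LSF procedure: it assumes the files sit somewhere under /sys/devices/system/cpu, and the function name set_ignore_nice_load is illustrative. Writing to sysfs typically requires root privileges, and values written this way do not persist across a reboot.

```python
from pathlib import Path

# Assumption: sysfs CPU root; the exact location of ignore_nice_load
# below it depends on the kernel version and the active governor.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def set_ignore_nice_load(value: int, root: Path = SYSFS_CPU_ROOT) -> list:
    """Write value (0 or 1) to every ignore_nice_load file under root.

    Returns the list of files that were updated.
    """
    if value not in (0, 1):
        raise ValueError("ignore_nice_load accepts only 0 or 1")
    changed = []
    if not root.is_dir():
        return changed
    for path in sorted(root.rglob("ignore_nice_load")):
        path.write_text(f"{value}\n")
        changed.append(str(path))
    return changed
```

Calling set_ignore_nice_load(0) makes the governor count niced jobs towards CPU utilization; set_ignore_nice_load(1) restores the power-saving behavior described in the next section.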

 

For power consumption:

  • Set CPU frequency governors to ignore nice() jobs, and let LSF jobs run niced by configuring the NICE value in the relevant queues.
  • To balance performance against power savings, also configure automatic CPU frequency selection. See Administering Platform LSF for more information.

References for CPU frequency governors