LSB_JOB_CPULIMIT

Determines whether the CPU limit is a per-process limit enforced by the OS or whether it is a per-job limit enforced by LSF.

Syntax

LSB_JOB_CPULIMIT=y | n

Description

  • The per-process limit is enforced by the OS when the CPU time of one process of the job exceeds the CPU limit.
  • The per-job limit is enforced by LSF when the total CPU time of all processes of the job exceeds the CPU limit.
This parameter applies to CPU limits set when a job is submitted with bsub -c, and to CPU limits set for queues by the CPULIMIT parameter in the lsb.queues file.
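The difference between the two checks can be illustrated with a minimal Python sketch. The function names and the sample CPU times are hypothetical; the code only models the comparison described above and is not part of LSF.

```python
from typing import Sequence


def per_process_limit_exceeded(cpu_seconds: Sequence[float], limit: float) -> bool:
    """Return True if any single process has used more CPU time than the limit."""
    return any(t > limit for t in cpu_seconds)


def per_job_limit_exceeded(cpu_seconds: Sequence[float], limit: float) -> bool:
    """Return True if the combined CPU time of all processes is over the limit."""
    return sum(cpu_seconds) > limit


# Hypothetical job with three processes and a 60-second CPU limit.
usage = [25.0, 30.0, 20.0]
print(per_process_limit_exceeded(usage, 60.0))  # no single process is over 60 s
print(per_job_limit_exceeded(usage, 60.0))      # the 75 s total is over 60 s
```

In this example the job would pass a per-process check but fail a per-job check, which is why the choice of enforcement matters for jobs that run many processes.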
LSF-enforced per-job limit
When the sum of the CPU time of all processes of a job exceeds the CPU limit, LSF sends a SIGXCPU signal (if the operating system supports this signal) to all processes belonging to the job, followed by SIGINT, SIGTERM, and SIGKILL. The interval between signals is 10 seconds by default. To change the interval between SIGXCPU, SIGINT, SIGTERM, and SIGKILL, set the JOB_TERMINATE_INTERVAL parameter in the lsb.params file.
Restriction:

SIGXCPU is not supported by Windows.
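As a rough illustration of the signal sequence, the following Python sketch computes when each signal would be sent. The function name is hypothetical, and the sketch assumes that the first signal is sent as soon as the limit is found to be exceeded; it does not reflect how LSF is implemented internally.

```python
from typing import List, Tuple

# Signal order as described above; SIGXCPU is skipped where it is unsupported.
ESCALATION = ["SIGXCPU", "SIGINT", "SIGTERM", "SIGKILL"]


def termination_schedule(interval: int = 10,
                         sigxcpu_supported: bool = True) -> List[Tuple[int, str]]:
    """Return (seconds after the limit is exceeded, signal name) pairs.

    `interval` stands in for JOB_TERMINATE_INTERVAL (default 10 seconds).
    """
    signals = [s for s in ESCALATION if sigxcpu_supported or s != "SIGXCPU"]
    return [(i * interval, name) for i, name in enumerate(signals)]


for offset, name in termination_schedule():
    print(f"t+{offset:>2}s  {name}")
```

Under these assumptions, a job that wants to save its state has roughly three intervals between the first signal and SIGKILL.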

OS-enforced per-process limit
When one process in the job exceeds the CPU limit, the limit is enforced by the operating system. For more details, refer to your operating system documentation for setrlimit().
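To show what an OS-enforced per-process CPU limit looks like in general, the following Python sketch applies RLIMIT_CPU to a child process through the standard resource module, which wraps setrlimit(). It runs on UNIX-like systems only. The helper name run_with_cpu_limit is hypothetical, and the sketch illustrates the operating system mechanism rather than the way LSF itself applies the limit.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv in a child process whose CPU time is capped by the OS."""

    def apply_limit():
        # Runs in the child just before exec: set the soft and hard RLIMIT_CPU.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(argv, preexec_fn=apply_limit)


# The child spins in a busy loop until the kernel stops it after about 1 CPU second.
result = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
print(result.returncode)  # a negative value means the child was ended by a signal
```

The limit counts CPU time for that one process only, so a job that spreads its work over several processes can use more CPU time in total than the limit suggests.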
The setting of the LSB_JOB_CPULIMIT parameter has the following effect on how the limit is enforced:
LSB_JOB_CPULIMIT    LSF per-job limit    OS per-process limit
y                   Enabled              Disabled
n                   Disabled             Enabled
Not defined         Enabled              Enabled
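The same mapping can be written as a small lookup, which may be convenient in site scripts that audit cluster settings. This is a sketch only: the names ENFORCEMENT and enforcement_for are hypothetical, and None is used here to represent a parameter that is not defined.

```python
from typing import Dict, Optional, Tuple

# (LSF per-job limit enabled, OS per-process limit enabled) for each setting.
ENFORCEMENT: Dict[Optional[str], Tuple[bool, bool]] = {
    "y": (True, False),
    "n": (False, True),
    None: (True, True),  # LSB_JOB_CPULIMIT is not defined
}


def enforcement_for(setting: Optional[str]) -> Tuple[bool, bool]:
    """Look up which limits are enabled for a given LSB_JOB_CPULIMIT value."""
    return ENFORCEMENT[setting]


per_job, per_process = enforcement_for(None)
print(per_job, per_process)  # both limits are enabled when the parameter is not defined
```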

Default

Not defined

Notes

For changes to the LSB_JOB_CPULIMIT parameter to take effect, run the bctrld restart sbd all command to restart every sbatchd daemon in the cluster.

Changing the default Terminate job control action

You can define a different terminate action in the lsb.queues file with the parameter JOB_CONTROLS if you do not want the job to be killed. For more details on job controls, see Administering IBM® Spectrum LSF.

Limitations

If the parameter is changed while a job is running, LSF cannot reset the type of limit enforcement for that job.
  • If the parameter is changed from per-process limit enforced by the OS to per-job limit enforced by LSF (LSB_JOB_CPULIMIT=n changed to LSB_JOB_CPULIMIT=y), both the per-process limit and the per-job limit affect the running job. Signals may be sent to the job either when an individual process exceeds the CPU limit or when the sum of the CPU time of all processes of the job exceeds the limit. A running job may be killed by the OS or by LSF.
  • If the parameter is changed from per-job limit enforced by LSF to per-process limit enforced by the OS (LSB_JOB_CPULIMIT=y changed to LSB_JOB_CPULIMIT=n), the running job is allowed to run without a CPU limit because the per-process limit was disabled when the job started.
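The two cases above are consistent with a simple model, sketched here in Python. The model assumes that the OS per-process limit stays as it was when the job started, while LSF applies the per-job check according to the new setting. This is an assumption made for illustration, the function name is hypothetical, and only the two documented transitions are covered.

```python
from typing import Tuple

# (LSF per-job limit enabled, OS per-process limit enabled) for each setting.
ENFORCEMENT = {"y": (True, False), "n": (False, True)}


def limits_on_running_job(old: str, new: str) -> Tuple[bool, bool]:
    """Model which limits act on a job started under `old` after a change to `new`."""
    per_job_now, _ = ENFORCEMENT[new]            # LSF follows the new setting
    _, per_process_at_start = ENFORCEMENT[old]   # OS limit is fixed at job start
    return per_job_now, per_process_at_start


print(limits_on_running_job("n", "y"))  # both limits affect the running job
print(limits_on_running_job("y", "n"))  # neither limit affects the running job
```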

See also

lsb.queues, bsub, JOB_TERMINATE_INTERVAL in lsb.params, LSB_MOD_ALL_JOBS