LSB_JOB_CPULIMIT
Determines whether the CPU limit is a per-process limit enforced by the OS or whether it is a per-job limit enforced by LSF.
Syntax
LSB_JOB_CPULIMIT=y | n
Description
- The per-process limit is enforced by the OS when the CPU time of any single process of the job exceeds the CPU limit.
- The per-job limit is enforced by LSF when the total CPU time of all processes of the job exceeds the CPU limit.
This parameter applies to CPU limits set when a job is submitted with bsub -c, and to CPU limits set for queues by the CPULIMIT parameter in the lsb.queues file.
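The two models differ only in what they compare against the limit. The following Python sketch is a hypothetical illustration, not LSF or kernel code: the function names are made up, CPU times are assumed to be in seconds, and "exceeds" is taken to mean strictly greater than the limit.

```python
from typing import Sequence


def per_process_limit_exceeded(process_cpu_seconds: Sequence[float], limit: float) -> bool:
    """OS-style check: true if any single process has used more CPU time than the limit."""
    return any(cpu > limit for cpu in process_cpu_seconds)


def per_job_limit_exceeded(process_cpu_seconds: Sequence[float], limit: float) -> bool:
    """LSF-style check: true if all processes of the job together used more CPU time than the limit."""
    return sum(process_cpu_seconds) > limit


# A job with three processes, each under a 600-second limit, but 900 seconds in total.
job = [300.0, 300.0, 300.0]
print(per_process_limit_exceeded(job, 600.0))  # False
print(per_job_limit_exceeded(job, 600.0))      # True
```

The example job stays under the per-process limit but exceeds the per-job limit, so only LSF enforcement (LSB_JOB_CPULIMIT=y or not defined) would stop it.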
- LSF-enforced per-job limit
  When the sum of the CPU time of all processes of a job exceeds the CPU limit, LSF sends a SIGXCPU signal (if the operating system supports this signal) to all processes that belong to the job, followed by SIGINT, SIGTERM, and SIGKILL. The interval between signals is 10 seconds by default. You can configure the interval between SIGXCPU, SIGINT, SIGTERM, and SIGKILL with the JOB_TERMINATE_INTERVAL parameter in the lsb.params file.
  Restriction: SIGXCPU is not supported by Windows.
- OS-enforced per-process limit
  When any one process in the job exceeds the CPU limit, the operating system enforces the limit. For more details, refer to your operating system documentation for setrlimit().
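To see the OS-enforced mechanism on its own, the following sketch uses Python's standard resource module (Unix only) to call setrlimit() with RLIMIT_CPU in a child process before it starts. It is not how LSF applies the limit; the 1-second soft limit and 2-second hard limit are arbitrary values chosen for the demonstration. When the child's CPU time reaches the soft limit, the kernel sends it SIGXCPU, whose default action terminates the process; on Linux, reaching the hard limit results in SIGKILL.

```python
import resource
import signal
import subprocess
import sys


def apply_cpu_limit() -> None:
    """Runs in the child after fork() and before exec()."""
    # Soft limit 1 CPU second, hard limit 2 CPU seconds (arbitrary demo values).
    resource.setrlimit(resource.RLIMIT_CPU, (1, 2))
    # Avoid leaving a core file behind when SIGXCPU terminates the child.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


# The child spins forever, so only the CPU time limit can stop it.
child = subprocess.run(
    [sys.executable, "-c", "while True: pass"],
    preexec_fn=apply_cpu_limit,
)

# subprocess reports death by signal N as a return code of -N.
killed_by = signal.Signals(-child.returncode)
print(killed_by.name)
```

Here each process is limited separately, so a job made of several processes can consume far more total CPU time than the limit before any one of them is stopped.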
The setting of the LSB_JOB_CPULIMIT parameter has the following effect on how the limit is enforced:
LSB_JOB_CPULIMIT | LSF per-job limit | OS per-process limit |
---|---|---|
y | Enabled | Disabled |
n | Disabled | Enabled |
Not defined | Enabled | Enabled |
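The table can also be read as a lookup from the parameter value to the two enforcement switches. This is a hypothetical Python helper for illustration; representing "not defined" as None and rejecting any value other than the lowercase y or n shown in the Syntax line are assumptions of the sketch, not documented LSF behavior.

```python
from typing import Optional, Tuple


def cpu_limit_enforcement(lsb_job_cpulimit: Optional[str]) -> Tuple[bool, bool]:
    """Return (lsf_per_job_limit, os_per_process_limit) for an LSB_JOB_CPULIMIT value.

    None stands for "parameter not defined", which is the default.
    """
    if lsb_job_cpulimit is None:
        return True, True   # not defined: both limits are enabled
    if lsb_job_cpulimit == "y":
        return True, False  # LSF per-job limit only
    if lsb_job_cpulimit == "n":
        return False, True  # OS per-process limit only
    raise ValueError(f"expected 'y' or 'n', got {lsb_job_cpulimit!r}")


for value in ("y", "n", None):
    print(value, cpu_limit_enforcement(value))
```

Note that the default (not defined) is the only setting under which both limits are active at the same time.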
Default
Not defined
Notes
To make changes to the LSB_JOB_CPULIMIT parameter take effect, run the command bctrld restart sbd all to restart sbatchd on all hosts in the cluster.
Changing the default Terminate job control action
You can define a different terminate action in the lsb.queues file with the parameter JOB_CONTROLS if you do not want the job to be killed. For more details on job controls, see Administering IBM® Spectrum LSF.
Limitations
If a job is running and the parameter is changed, LSF is not able to reset the type of limit enforcement for running jobs.
- If the parameter is changed from per-process limit enforced by the OS to per-job limit enforced by LSF (LSB_JOB_CPULIMIT=n changed to LSB_JOB_CPULIMIT=y), both the per-process limit and the per-job limit affect the running job. This means that signals may be sent to the job either when an individual process exceeds the CPU limit or when the sum of the CPU time of all processes of the job exceeds the limit. A running job may be killed by either the OS or LSF.
- If the parameter is changed from per-job limit enforced by LSF to per-process limit enforced by the OS (LSB_JOB_CPULIMIT=y changed to LSB_JOB_CPULIMIT=n), the job is allowed to run without a CPU limit because the per-process limit was disabled when the job started.
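One way to read these two limitations is that a running job keeps the OS per-process limit that was set when it started, while LSF applies its per-job check according to the current setting. The hypothetical Python helper below encodes that reading for the explicit y and n values only; it is an interpretation of the text above, not documented LSF logic.

```python
from typing import Tuple

# (LSF per-job limit, OS per-process limit) for each explicit setting, per the table above.
ENFORCEMENT = {"y": (True, False), "n": (False, True)}


def limits_for_running_job(setting_at_start: str, current_setting: str) -> Tuple[bool, bool]:
    """Return (per_job, per_process) enforcement for a job that started under one
    LSB_JOB_CPULIMIT value and is still running after the value changed."""
    per_process = ENFORCEMENT[setting_at_start][1]  # fixed by the OS when the job started
    per_job = ENFORCEMENT[current_setting][0]       # checked by LSF under the new setting
    return per_job, per_process


print(limits_for_running_job("n", "y"))  # (True, True): may be killed by the OS or by LSF
print(limits_for_running_job("y", "n"))  # (False, False): no CPU limit at all
```

Under this reading, only jobs that start after the change get the enforcement described in the table.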
See also
lsb.queues, bsub, JOB_TERMINATE_INTERVAL in lsb.params, LSB_MOD_ALL_JOBS