How job limits work
The JOBS parameter limits the maximum number of running or suspended jobs available to resource consumers. Limits are enforced depending on the number of jobs in RUN, SSUSP, and USUSP state.
Stop and resume jobs
Jobs stopped with bstop, go into USUSP status. LSF includes USUSP jobs in the count of running jobs, so the usage of JOBS limit will not change when you suspend a job.
Resuming a stopped job (bresume) changes job status to SSUSP. The job can enter RUN state, if the JOBS limit has not been exceeded. Lowering the JOBS limit before resuming the job can exceed the JOBS limit, and prevent SSUSP jobs from entering RUN state.
For example, JOBS=5, and 5 jobs are running in the cluster (JOBS has reached 5/5). Normally. the stopped job (in USUSP state) can later be resumed and begin running, returning to RUN state. If you reconfigure the JOBS limit to 4 before resuming the job, the JOBS usage becomes 5/4, and the job cannot run because the JOBS limit has been exceeded.
Preemption
The JOBS limit does not block preemption based on job slots. For example, if JOBS=2, and a host is already running 2 jobs in a preemptable queue, a new preemptive job can preempt a job on that host as long as the preemptive slots can be satisfied even though the JOBS limit has been reached.
Reservation and backfill
Reservation and backfill are still made at the job slot level, but despite a slot reservation being satisfied, the job may ultimately not run because the JOBS limit has been reached.
Other jobs
- brun forces a pending job to run immediately on specified hosts. A job forced to run with brun is counted as a running job, which may violate JOBS limits. After the forced job starts, the JOBS limits may be exceeded.
- Re-queued jobs (brequeue) are assigned PEND status or PSUSP. Usage of JOBS limit is decreased by the number of re-queued jobs.
- Checkpointed jobs restarted with brestart start a new job based on the checkpoint of an existing job. Whether the new job can run depends on the limit policy (including the JOBS limit) that applies to the job. For example, if you checkpoint a job running on a host that has reached its JOBS limit, then restart it, the restarted job cannot run because the JOBS limit has been reached.
- For job arrays, you can define a maximum number of jobs that can run in the array at any given time. The JOBS limit, like other resource allocation limits, works in combination with the array limits. For example, if JOBS=3 and the array limit is 4, at most 3 job elements can run in the array.
- For chunk jobs, only the running job among the jobs that are dispatched together in a chunk is counted against the JOBS limit. Jobs in WAIT state do not affect the JOBS limit usage.
Example limit configurations
Each set of limits is defined in a Limit section enclosed by Begin Limit and End Limit.
Example 1
user1 is limited to 2 job slots on hostA, and user2’s jobs on queue normal are limited to 20 MB of memory:
Begin Limit
NAME HOSTS SLOTS MEM SWP TMP USERS QUEUES
Limit1 hostA 2 - - - user1 -
- - - 20 - - user2 normal
End Limit
Example 2
Set a job slot limit of 2 for user user1 submitting jobs to queue normal on host hosta for all projects, but only one job slot for all queues and hosts for project test:
Begin Limit
HOSTS SLOTS PROJECTS USERS QUEUES
hosta 2 - user1 normal
- 1 test user1 -
End Limit
Example 3
All users in user group ugroup1 except user1 using queue1 and queue2 and running jobs on hosts in host group hgroup1 are limited to 2 job slots per processor on each host:
Begin Limit
NAME = limit1
# Resources:
SLOTS_PER_PROCESSOR = 2
#Consumers:
QUEUES = queue1 queue2
USERS = ugroup1 ~user1
PER_HOST = hgroup1
End Limit
Example 4
user1 and user2 can use all queues and all hosts in the cluster with a limit of 20 MB of available memory:
Begin Limit
NAME = 20_MB_mem
# Resources:
MEM = 20
# Consumers:
USERS = user1 user2
End Limit
Example 5
All users in user group ugroup1 can use queue1 and queue2 and run jobs on any host in host group hgroup1 sharing 10 job slots:
Begin Limit
NAME = 10_slot
# Resources:
SLOTS = 10
#Consumers:
QUEUES = queue1 queue2
USERS = ugroup1
HOSTS = hgroup1
End Limit
Example 6
All users in user group ugroup1 except user1 can use all queues but queue1 and run jobs with a limit of 10% of available memory on each host in host group hgroup1:
Begin Limit
NAME = 10_percent_mem
# Resources:
MEM = 10%
QUEUES = all ~queue1
USERS = ugroup1 ~user1
PER_HOST = hgroup1
End Limit
Example 7
Limit users in the develop group to 1 job on each host, and 50% of the memory on the host.
Begin Limit
NAME = develop_group_limit
# Resources:
SLOTS = 1
MEM = 50%
#Consumers:
USERS = develop
PER_HOST = all
End Limit
Example 8
Limit all hosts to 1 job slot per processor:
Begin Limit
NAME = default_limit
SLOTS_PER_PROCESSOR = 1
PER_HOST = all
End Limit
Example 9
The short queue can have at most 200 running and suspended jobs:
Begin Limit
NAME = shortq_limit
QUEUES = short
JOBS = 200
End Limit