How resource allocation limits work

By default, resource consumers like users, hosts, queues, or projects are not limited in the resources available to them for running jobs.

Resource allocation limits configured in lsb.resources specify the following restrictions:

  • The maximum amount of a resource requested by a job that can be allocated during job scheduling for different classes of jobs to start
  • Which resource consumers the limits apply to

If all of the resource has been consumed, no more jobs can be started until some of the resource is released.

For example, by limiting the maximum amount of memory for each of your hosts, you can make sure that your system operates at optimal performance. By defining a memory limit for some users submitting jobs to a particular queue and a specified set of hosts, you can prevent these users from using up all the memory in the system at one time.
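
For example, a Limit section in the lsb.resources file similar to the following sketch caps the total memory that two users can use in one queue on a specific set of hosts. The user, queue, and host names and the 512 MB value are illustrative:

Begin Limit
NAME   = dept_mem_limit
USERS  = user1 user2
QUEUES = normal
HOSTS  = hostA hostB
MEM    = 512
End Limit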

Jobs must specify resource requirements

For limits to apply, the job must specify resource requirements (the bsub -R rusage string or the RES_REQ parameter in the lsb.queues file). For example, a memory allocation limit of 4 MB is configured in lsb.resources:

Begin Limit
NAME = mem_limit1
MEM = 4
End Limit

A job is submitted with an rusage resource requirement that exceeds this limit:

bsub -R "rusage[mem=5]" uname

and remains pending:

bjobs -p 600
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME        SUBMIT_TIME
 600    user1  PEND  normal   suplin02                uname       Aug 12 14:05
Resource (mem) limit defined cluster-wide has been reached;

A job is submitted with a resource requirement within the configured limit:

bsub -R"rusage[mem=3]" sleep 100

is allowed to run:

bjobs
JOBID   USER    STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME     SUBMIT_TIME
600     user1   PEND  normal   hostA                   uname        Aug 12 14:05
604     user1   RUN   normal   hostA       hostA       sleep 100    Aug 12 14:09

Resource usage limits and resource allocation limits

Resource allocation limits are not the same as resource usage limits, which are enforced during job run time. For example, you set CPU limits, memory limits, and other limits that take effect after a job starts running.
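
For contrast, a usage limit is set at job submission time or on a queue and takes effect only while the job runs. A minimal sketch (the value is illustrative, and its unit depends on the LSF_UNIT_FOR_LIMITS setting):

bsub -M 4096 sleep 100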

Resource reservation limits and resource allocation limits

Resource allocation limits are not the same as queue-based resource reservation limits, which are enforced during job submission. The parameter RESRSV_LIMIT in the lsb.queues file specifies allowed ranges of resource values, and jobs submitted with resource requests outside of this range are rejected.

How LSF enforces limits

Resource allocation limits are enforced so that they apply to the following (a sample Limit configuration follows this list):

  • All jobs in the cluster
  • Several kinds of resources:
    • Job slots by host
    • Job slots per processor
    • Running and suspended jobs
    • Memory (MB or percentage)
    • Swap space (MB or percentage)
    • Tmp space (MB or percentage)
    • Other shared resources
  • Several kinds of resource consumers:
    • Users and user groups (all users or per-user)
    • Hosts and host groups (all hosts or per-host)
    • Queues (all queues or per-queue)
    • Projects (all projects or per-project)
  • Combinations of consumers:
    • For jobs running on different hosts in the same queue
    • For jobs running from different queues on the same host
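
For example, the sample Limit configuration below combines consumers from this list: it restricts jobs from two queues to 2 job slots on each host. The queue names and the slot count are illustrative:

Begin Limit
NAME     = queue_host_slots
QUEUES   = priority night
PER_HOST = all
SLOTS    = 2
End Limit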

How LSF counts resources

Resources on a host are not available if they are taken by jobs that have been started, but have not yet finished. This means running and suspended jobs count against the limits for queues, users, hosts, projects, and processors that they are associated with.

Job slot limits

Job slot limits can correspond to the maximum number of jobs that can run at any point in time. For example, a queue cannot start jobs if it has no job slots available, and jobs cannot run on hosts that have no available job slots.

Limits such as QJOB_LIMIT (lsb.queues), HJOB_LIMIT (lsb.queues), UJOB_LIMIT (lsb.queues), MXJ (lsb.hosts), JL/U (lsb.hosts), MAX_JOBS (lsb.users), and MAX_PEND_SLOTS (lsb.users and lsb.params) limit the number of job slots. When the workload is sequential, job slots are usually equivalent to jobs. For parallel or distributed applications, these are true job slot limits and not job limits.
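
For illustration, a sketch of these queue-level slot limits in the lsb.queues file (the values are illustrative):

Begin Queue
QUEUE_NAME = normal
QJOB_LIMIT = 20    # total job slots the queue can use
HJOB_LIMIT = 2     # job slots the queue can use on any single host
UJOB_LIMIT = 5     # job slots any one user can use in this queue
End Queue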

Job limits

Job limits, specified by JOBS in a Limit section in lsb.resources, correspond to the maximum number of running and suspended jobs at any point in time. MAX_PEND_JOBS (lsb.users and lsb.params) limits the number of pending jobs. If both job limits and job slot limits are configured, the most restrictive limit is applied.
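
For example, with a sketch like the following (values illustrative), a single 8-way parallel job reaches the SLOTS limit while counting as only one job, whereas four sequential jobs reach the JOBS limit first:

Begin Limit
NAME     = per_user_jobs
PER_USER = all
SLOTS    = 8
JOBS     = 4
End Limit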

Resource reservation and backfill

When processor or memory reservation occurs, the reserved resources count against the limits for users, queues, hosts, projects, and processors. When backfilling of parallel jobs occurs, the backfill jobs do not count against any limits.
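
For context, backfill is typically enabled per queue in the lsb.queues file, and backfill candidates need a run limit. A hedged sketch (the queue name and run limit are illustrative):

Begin Queue
QUEUE_NAME = short
BACKFILL   = Y     # short jobs may backfill onto reserved slots
End Queue

bsub -q short -W 10 sleep 100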

IBM® Spectrum LSF multicluster capability

Limits apply only to the cluster where the lsb.resources file is configured. If the cluster leases hosts from another cluster, limits are enforced on those hosts as if they were local hosts.

Switched jobs can exceed resource allocation limits

If a switched job (the bswitch command) has not been dispatched, then the job behaves as if it were submitted to the new queue in the first place, and the JOBS limit is enforced in the target queue.

If a switched job has been dispatched, then resource allocation limits like SWP, TMP, and JOBS can be exceeded in the target queue. For example, given the following JOBS limit configuration:
Begin Limit
USERS     QUEUES      SLOTS   TMP    JOBS 
-         normal        -      20     2
-         short         -      20     2
End Limit
Submit 3 jobs to the normal queue and 3 jobs to the short queue (each of the following bsub commands is run three times):
bsub -q normal -R"rusage[tmp=20]" sleep 1000
bsub -q short -R"rusage[tmp=20]" sleep 1000
bjobs shows 1 job in RUN state in each queue:
bjobs
JOBID   USER    STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME     SUBMIT_TIME
16      user1   RUN   normal   hosta       hosta       sleep 1000   Aug 30 16:26
17      user1   PEND  normal   hosta                   sleep 1000   Aug 30 16:26
18      user1   PEND  normal   hosta                   sleep 1000   Aug 30 16:26
19      user1   RUN   short    hosta       hosta       sleep 1000   Aug 30 16:26
20      user1   PEND  short    hosta                   sleep 1000   Aug 30 16:26
21      user1   PEND  short    hosta                   sleep 1000   Aug 30 16:26
blimits shows the TMP limit reached:
blimits
INTERNAL RESOURCE LIMITS:
NAME      USERS    QUEUES     SLOTS      TMP     JOBS
NONAME000   -      normal       -      20/20      1/2
NONAME001   -      short        -      20/20      1/2
Switch the running job in the normal queue to the short queue:
bswitch short 16
The bjobs command shows 2 jobs running in the short queue, and the second job running in the normal queue:
bjobs
JOBID   USER    STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME     SUBMIT_TIME
17      user1   RUN   normal   hosta       hosta       sleep 1000   Aug 30 16:26
18      user1   PEND  normal   hosta                   sleep 1000   Aug 30 16:26
19      user1   RUN   short    hosta       hosta       sleep 1000   Aug 30 16:26
16      user1   RUN   short    hosta       hosta       sleep 1000   Aug 30 16:26
20      user1   PEND  short    hosta                   sleep 1000   Aug 30 16:26
21      user1   PEND  short    hosta                   sleep 1000   Aug 30 16:26
The blimits command shows the TMP limit exceeded and the JOBS limit reached in the short queue:
blimits
INTERNAL RESOURCE LIMITS:
NAME    USERS    QUEUES     SLOTS      TMP     JOBS
NONAME000   -    normal       -      20/20      1/2
NONAME001   -    short        -      40/20      2/2
Switch the running job in the normal queue to the short queue:
bswitch short 17
The bjobs command shows 3 jobs running in the short queue and the third job running in the normal queue:
bjobs
JOBID   USER    STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME     SUBMIT_TIME
18      user1   RUN   normal   hosta       hosta       sleep 1000   Aug 30 16:26
19      user1   RUN   short    hosta       hosta       sleep 1000   Aug 30 16:26
16      user1   RUN   short    hosta       hosta       sleep 1000   Aug 30 16:26
17      user1   RUN   short    hosta       hosta       sleep 1000   Aug 30 16:26
20      user1   PEND  short    hosta                   sleep 1000   Aug 30 16:26
21      user1   PEND  short    hosta                   sleep 1000   Aug 30 16:26
The blimits command shows both TMP and JOBS limits exceeded in the short queue:
blimits
INTERNAL RESOURCE LIMITS:
NAME    USERS    QUEUES     SLOTS      TMP     JOBS
NONAME000   -    normal       -      20/20      1/2
NONAME001   -    short        -      60/20      3/2

Limits for resource consumers

Resource allocation limits are applied according to the kind of resource consumer (host groups, compute units, users, and user groups).

Host groups and compute units

If a limit is specified for a host group or compute unit, the total amount of a resource used by all hosts in that group or unit is counted. If a host is a member of more than one group, each job running on that host is counted against the limit for all groups to which the host belongs. Per-user limits are enforced individually on each user, or on each user in the user group listed. If a user group contains a subgroup, the limit also applies to each member of the subgroup recursively.
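
For example, a sketch of a limit on a host group, where the MEM value is counted across all hosts in the group together (the group name and value are illustrative):

Begin Limit
NAME  = hgroup_mem
HOSTS = hgroup1
MEM   = 2048
End Limit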

Limits for users and user groups

Jobs are normally queued on a first-come, first-served (FCFS) basis. It is possible for some users to abuse the system by submitting a large number of jobs; jobs from other users must wait until these jobs complete. Limiting resources by user prevents users from monopolizing all the resources.

Users can submit an unlimited number of jobs, but if they have reached their limit for any resource, the rest of their jobs stay pending, until some of their running jobs finish or resources become available.

If a limit is specified for a user group, the total amount of a resource used by all users in that group is counted. If a user is a member of more than one group, each of that user’s jobs is counted against the limit for all groups to which that user belongs.

Use the keyword all to configure limits that apply to each user or user group in a cluster. This is useful if you have a large cluster but only want to exclude a few users from the limit definition.
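
One way to express this, assuming the not operator (~) syntax for excluding consumers from an all specification (user names and the value are illustrative):

Begin Limit
NAME     = everyone_but_admins
PER_USER = all ~admin1 ~admin2
SLOTS    = 10
End Limit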

You can use ENFORCE_ONE_UG_LIMITS=Y combined with bsub -G to have better control over limits when user groups have overlapping members. When set to Y, only the specified user group’s limits (or those of any parent user group) are enforced. If set to N, the most restrictive job limits of any overlapping user/user group are enforced.
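
A sketch of how the two fit together; the parameter goes in the lsb.params file, and the job names the user group that its limits should be charged to (the group name is illustrative):

# lsb.params
ENFORCE_ONE_UG_LIMITS = Y

# submission associated with a specific user group
bsub -G groupA sleep 100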

Per-user limits on users and groups

Per-user limits that use the keyword all apply to each user in a cluster. If user groups are configured, the limit applies to each member of the user group, not to the group as a whole.
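
The difference shows up in the Limit section keywords. In the following sketch (group name and values illustrative), the first limit is shared by the whole group, while the second applies separately to each member:

Begin Limit
NAME  = group_total
USERS = ugroup1      # 10 slots shared by all members of ugroup1
SLOTS = 10
End Limit

Begin Limit
NAME     = per_member
PER_USER = ugroup1   # 10 slots for each member of ugroup1
SLOTS    = 10
End Limit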

Resizable jobs

When a resize allocation request is scheduled for a resizable job, all resource allocation limits (job and slot) are enforced.

Once the new allocation is satisfied, it consumes limits such as SLOTS, MEM, SWP, and TMP for queues, users, projects, hosts, or cluster-wide. However, the new allocation does not consume job limits such as job group limits, job array limits, and the non-host-level JOBS limit.

Releasing part of an allocation from a resizable job frees general limits that belong to the allocation, but not the actual job limits.