A job with an extendable run limit is allowed to continue running unless the resources
that are occupied by the job are needed by another job in a queue with the same or higher
priority.
Before you begin
Ensure that ALLOCATION_PLANNER=Y is defined in the
lsb.params file to enable the allocation planner.
About this task
You can configure the LSF
allocation planner to extend the run limits of a job by changing its soft run limit. A soft run
limit can be extended, while a hard run limit cannot be extended. The allocation planner looks at
job plans to determine if there are any other jobs that require the current job's resources.
Procedure
-
Edit lsb.queues.
-
Specify the EXTENDABLE_RUNLIMIT parameter for the queue and specify the
base limit and other keywords for the run limit.
EXTENDABLE_RUNLIMIT=BASE[minutes] INCREMENT[minutes] GRACE[minutes] REQUEUE[Y | N]
This parameter uses the following keywords:
- BASE[minutes]
- The initial soft run limit that is imposed on jobs in the queue. Whenever the job reaches the
soft run limit, the allocation planner looks at the plans for the other jobs to determine whether
another job in the queue needs the resources that are held by the job. If the resources are not
required, LSF extends the soft run limit for the current job. Otherwise, LSF sets a hard run limit.
Specify an integer value for the initial soft run limit.
- INCREMENT[minutes]
- If LSF decides to extend the soft run limit for the job, this keyword specifies the amount of
time by which LSF extends the soft run limit.
Specify an integer value for the soft run limit extension time. The
default value is the value of the BASE[] keyword.
- GRACE[minutes]
- If LSF decides not to extend the soft run limit for the job, LSF sets a hard run limit this
number of minutes after the time the decision is made.
The default value is 0 (the job is terminated or requeued immediately).
- REQUEUE[Y | N]
- Specifies the action that LSF takes when a job reaches its hard run limit. If set to N, LSF
terminates the job. If set to Y, LSF requeues the job.
The default value is N (LSF terminates the job once the job reaches its hard run limit).
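For example, the following queue sets a 60-minute base limit, 30-minute extensions, and a
10-minute grace period. REQUEUE is omitted, so it takes its default value of N: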
Begin Queue
QUEUE_NAME = queue_extendable
PRIORITY = 10
EXTENDABLE_RUNLIMIT = BASE[60] INCREMENT[30] GRACE[10]
End Queue
-
Reconfigure the cluster:
-
Run lsadmin reconfig.
-
Run badmin reconfig.
-
Run bqueues -l to display the extendable run limit settings.
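The way the keywords interact can be illustrated with a small model. The following Python sketch is not part of LSF and does not call any LSF API. It parses an EXTENDABLE_RUNLIMIT value, applies the documented defaults, and walks through the soft and hard run limit decisions. The names ExtendableRunLimit, parse_extendable_runlimit, and simulate are hypothetical, and the planner's decision is reduced to a single assumed threshold (needed_from, the minute of run time from which another job's plan needs the resources), which is a deliberate simplification of what the allocation planner actually evaluates.

```python
import re
from dataclasses import dataclass


@dataclass
class ExtendableRunLimit:
    base: int              # BASE[minutes]: initial soft run limit
    increment: int         # INCREMENT[minutes]: defaults to the BASE value
    grace: int = 0         # GRACE[minutes]: defaults to 0
    requeue: bool = False  # REQUEUE[Y | N]: defaults to N


def parse_extendable_runlimit(value: str) -> ExtendableRunLimit:
    """Parse a value such as 'BASE[60] INCREMENT[30] GRACE[10]'."""
    # Collect every KEYWORD[argument] pair that appears in the value.
    fields = {k.upper(): v.strip()
              for k, v in re.findall(r"(\w+)\[([^\]]*)\]", value)}
    base = int(fields["BASE"])
    limit = ExtendableRunLimit(
        base=base,
        increment=int(fields.get("INCREMENT", base)),
        grace=int(fields.get("GRACE", 0)),
        requeue=fields.get("REQUEUE", "N").upper() == "Y",
    )
    if limit.increment <= 0:
        # Guard for this sketch only: a non-positive step would never advance.
        raise ValueError("INCREMENT must be positive in this sketch")
    return limit


def simulate(limit: ExtendableRunLimit, needed_from: int, job_runtime: int):
    """Return (minute, event) pairs for one job.

    needed_from is an assumed stand-in for the planner's decision: the
    resources count as needed by another job from that minute onward.
    job_runtime is how long the job would run if it were never stopped.
    """
    events = []
    soft = limit.base
    while soft < job_runtime:
        if soft >= needed_from:
            # No extension: a hard run limit is set GRACE minutes from now.
            hard = soft + limit.grace
            events.append((soft, f"hard run limit set at minute {hard}"))
            if hard < job_runtime:
                action = "job requeued" if limit.requeue else "job terminated"
                events.append((hard, action))
            else:
                events.append((job_runtime, "job finished"))
            return events
        # Resources are not needed: extend the soft run limit by INCREMENT.
        events.append(
            (soft, f"soft run limit extended to minute {soft + limit.increment}"))
        soft += limit.increment
    events.append((job_runtime, "job finished"))
    return events


limit = parse_extendable_runlimit("BASE[60] INCREMENT[30] GRACE[10]")
for minute, event in simulate(limit, needed_from=100, job_runtime=200):
    print(f"minute {minute:3d}: {event}")
```

Under these assumptions, the soft run limit is extended at minutes 60 and 90. At minute 120 the resources are needed, so a hard run limit is set for minute 130, and the job is terminated there because REQUEUE defaults to N.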