Configuring extendable run limits

A job with an extendable run limit can continue running unless another job in a queue with the same or higher priority needs the resources that the job occupies.

Before you begin

Ensure that ALLOCATION_PLANNER=Y is defined in the lsb.params file to enable the allocation planner.

About this task

You can configure the LSF allocation planner to extend the run limit of a job by changing its soft run limit. A soft run limit can be extended; a hard run limit cannot. The allocation planner looks at the plans of other jobs to determine whether any of them require the resources that the current job holds.
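The check that the allocation planner makes can be pictured with the following minimal Python sketch. It is a hypothetical model, not LSF code: the PlannedJob type, its fields, and the resources_needed() function are illustrative names. The sketch assumes that a job's resources can be represented as a set of host names and that only jobs in queues with the same or higher priority count, as described at the start of this topic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedJob:
    """Hypothetical view of another job's allocation plan (not an LSF API)."""
    job_id: int
    queue_priority: int
    planned_hosts: frozenset[str]


def resources_needed(held_hosts: frozenset[str], queue_priority: int,
                     plans: list[PlannedJob]) -> bool:
    """Return True if any planned job in a queue with the same or higher
    priority expects to use at least one host that the running job holds."""
    return any(
        plan.queue_priority >= queue_priority and bool(held_hosts & plan.planned_hosts)
        for plan in plans
    )


if __name__ == "__main__":
    held = frozenset({"hostA", "hostB"})
    plans = [
        PlannedJob(job_id=101, queue_priority=5, planned_hosts=frozenset({"hostA"})),
        PlannedJob(job_id=102, queue_priority=10, planned_hosts=frozenset({"hostC"})),
    ]
    # Job 101 wants hostA, but its queue has a lower priority; job 102 has the
    # same priority but wants a different host, so the soft limit can be extended.
    print(resources_needed(held, 10, plans))  # prints False
```

In this model, a True result corresponds to LSF setting a hard run limit, and a False result corresponds to LSF extending the soft run limit.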

Procedure

  1. Edit lsb.queues.
  2. Specify the EXTENDABLE_RUNLIMIT parameter for the queue, including the base run limit and any other keywords that you need.

    EXTENDABLE_RUNLIMIT=BASE[minutes] INCREMENT[minutes] GRACE[minutes] REQUEUE[Y | N]

    This parameter uses the following keywords:

    BASE[minutes]
    The initial soft run limit that is imposed on jobs in the queue. Whenever a job reaches its soft run limit, the allocation planner looks at the plans for the other jobs in the queue to determine whether any of them need the resources that the job holds. If the resources are not needed, LSF extends the soft run limit for the current job. Otherwise, LSF sets a hard run limit.

    Specify an integer value for the initial soft run limit.

    INCREMENT[minutes]
    If LSF decides to extend the soft run limit for the job, this keyword specifies the amount of time that LSF extends the soft run limit.

    Specify an integer value for the soft run limit extension time. The default value is the value of the BASE[] keyword.

    GRACE[minutes]
    If LSF decides not to extend the soft run limit for the job, LSF sets a hard run limit that takes effect this number of minutes after the decision is made.

    The default value is 0 (the job is terminated or requeued immediately).

    REQUEUE[Y | N]
    Specifies the action that LSF takes when a job reaches its hard run limit. If set to N, LSF terminates the job. If set to Y, LSF requeues the job.

    The default value is N (LSF terminates the job when the job reaches its hard run limit).

    For example:

    Begin Queue 
    QUEUE_NAME = queue_extendable 
    PRIORITY = 10 
    EXTENDABLE_RUNLIMIT = BASE[60] INCREMENT[30] GRACE[10] 
    End Queue
    
  3. Reconfigure the cluster:
    1. Run lsadmin reconfig.
    2. Run badmin reconfig.
  4. Run bqueues -l to display the extendable run limit settings.
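To make the interplay of the keywords concrete, the following minimal Python sketch parses an EXTENDABLE_RUNLIMIT value, applies the defaults listed in step 2 (INCREMENT defaults to BASE, GRACE to 0, REQUEUE to N), and computes when the hard run limit takes effect. It is a hypothetical model, not part of LSF: the function names are illustrative, it treats BASE[] as always present, and it assumes that each extend-or-not decision is made exactly when the job reaches its current soft run limit. The needed_at_checkpoint argument stands in for the allocation planner's answer at each checkpoint.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches keyword[value] pairs such as BASE[60] or REQUEUE[Y].
_KEYWORD = re.compile(r"(BASE|INCREMENT|GRACE|REQUEUE)\[\s*([^\]]*?)\s*\]")


@dataclass(frozen=True)
class ExtendableRunLimit:
    base: int        # initial soft run limit, in minutes
    increment: int   # soft run limit extension, in minutes
    grace: int       # minutes between the "do not extend" decision and the hard limit
    requeue: bool    # True: requeue at the hard limit; False: terminate


def parse_extendable_runlimit(value: str) -> ExtendableRunLimit:
    """Parse a value like 'BASE[60] INCREMENT[30] GRACE[10]' and apply defaults."""
    fields = dict(_KEYWORD.findall(value))
    if "BASE" not in fields:
        raise ValueError("BASE[] is required in this sketch")
    base = int(fields["BASE"])
    increment = int(fields.get("INCREMENT", base))  # default: same as BASE
    grace = int(fields.get("GRACE", 0))             # default: 0 minutes
    requeue = fields.get("REQUEUE", "N").upper()    # default: N
    if requeue not in ("Y", "N"):
        raise ValueError(f"REQUEUE must be Y or N, got {requeue!r}")
    return ExtendableRunLimit(base, increment, grace, requeue == "Y")


def hard_limit_event(limit: ExtendableRunLimit,
                     needed_at_checkpoint: list[bool]) -> tuple[int, str] | None:
    """Walk the soft-limit checkpoints in order.

    Each entry says whether another job needs the resources when the job
    reaches its current soft run limit. Returns the run time in minutes at
    which the hard run limit takes effect and the action taken, or None if
    every listed checkpoint extended the soft limit.
    """
    soft_limit = limit.base
    for needed in needed_at_checkpoint:
        if needed:
            action = "requeue" if limit.requeue else "terminate"
            return soft_limit + limit.grace, action
        soft_limit += limit.increment  # resources not needed: extend the soft limit
    return None


if __name__ == "__main__":
    queue_limit = parse_extendable_runlimit("BASE[60] INCREMENT[30] GRACE[10]")
    # Resources not needed at minute 60 or 90, needed at minute 120.
    print(hard_limit_event(queue_limit, [False, False, True]))  # prints (130, 'terminate')
```

Under these assumptions, a job in the example queue_extendable queue whose resources are not needed at 60 or 90 minutes gets a hard run limit at 130 minutes (the 120-minute checkpoint plus the 10-minute grace period) and is terminated, because REQUEUE defaults to N.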