Configuring resource preemption
Procedure
- Configure preemptive scheduling (PREEMPTION in lsb.queues).
- Configure the preemption resources (PREEMPTABLE_RESOURCES in lsb.params).
Job slots are the default preemption resource. To define additional resources to use with preemptive scheduling, set PREEMPTABLE_RESOURCES in lsb.params, and specify the names of the custom resources as a space-separated list.
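As a minimal sketch of the first two steps, the fragments below show where each parameter goes. The resource names licA and licB and the queue names high and low are hypothetical placeholders, and the resources are assumed to be already defined as custom resources in the cluster. Adapt the names and priorities to your site.

```
# lsb.params: list the custom resources that preemption may reclaim
# (licA and licB are hypothetical resource names)
Begin Parameters
PREEMPTABLE_RESOURCES = licA licB
End Parameters

# lsb.queues: a hypothetical high-priority queue that can preempt
# jobs in a hypothetical lower-priority queue named "low"
Begin Queue
QUEUE_NAME = high
PRIORITY   = 70
PREEMPTION = PREEMPTIVE[low]
End Queue
```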
- Customize the preemption action.
Preemptive scheduling uses the SUSPEND and RESUME job control actions to suspend and resume preempted jobs. For resource preemption, it is critical that the preempted job releases the resource when it is suspended. You must therefore modify the LSF default job controls to make resource preemption work.
Suspend using a custom job control.
To modify the default suspend action, set JOB_CONTROLS in lsb.queues and replace the SUSPEND job control with a script or a signal that your application can catch. Do this for all queues where there could be preemptable jobs using the preemption resources.
For example, if your application vendor tells you to use the SIGTSTP signal, set JOB_CONTROLS in lsb.queues and use SIGTSTP as the SUSPEND job control:
JOB_CONTROLS = SUSPEND [SIGTSTP]
Kill jobs with brequeue.
To kill and requeue preempted jobs instead of suspending them, set JOB_CONTROLS in lsb.queues and use brequeue as the SUSPEND job control:
JOB_CONTROLS = SUSPEND [brequeue $LSB_JOBID]
Do this for all queues where there could be preemptable jobs using the preemption resources. This kills a preempted job, and then requeues it so that it has a chance to run and finish successfully.
Kill jobs with TERMINATE_WHEN.
To kill preempted jobs instead of suspending them, set TERMINATE_WHEN in lsb.queues to PREEMPT. Do this for all queues where there could be preemptable jobs using the preemption resources.
If you do this, the preempted job is killed and does not run again unless you resubmit it.
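The three approaches above are alternatives for the same queue. The following sketch of a preemptable queue definition shows one active and two commented out. The queue names low and high and the priority value are hypothetical, and the JOB_CONTROLS values are copied from the examples above.

```
# lsb.queues: hypothetical low-priority queue whose jobs can be preempted
# by a hypothetical queue named "high"
Begin Queue
QUEUE_NAME = low
PRIORITY   = 20
PREEMPTION = PREEMPTABLE[high]

# Choose ONE of the following approaches for this queue.

# 1. Suspend with a signal that the application can catch:
JOB_CONTROLS = SUSPEND [SIGTSTP]

# 2. Kill and requeue the preempted job:
# JOB_CONTROLS = SUSPEND [brequeue $LSB_JOBID]

# 3. Kill the preempted job without requeuing it:
# TERMINATE_WHEN = PREEMPT
End Queue
```

Whichever approach you pick, repeat it in every queue that can hold preemptable jobs using the preemption resources.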
- Optional. Configure the preemption wait time.
To specify how long LSF waits for the ELIM to report that the resources are available, set PREEMPTION_WAIT_TIME in lsb.params and specify the number of seconds to wait. The value cannot be less than the default of 300 seconds.
For example, to make LSF wait for 8 minutes, specify:
PREEMPTION_WAIT_TIME=480