Configuration to modify job migration

You can configure LSF to requeue a migrating job rather than restart or rerun the job.


Configuration file: lsf.conf

Parameter and syntax: LSB_MIG2PEND=1

Behavior:

  • LSF requeues a migrating job rather than restarting or rerunning the job

  • LSF requeues the job as pending in order of the original submission time and priority

  • In a multicluster environment, LSF ignores this parameter

Parameter and syntax: LSB_REQUEUE_TO_BOTTOM=1

Behavior:

  • When LSB_MIG2PEND=1, LSF requeues a migrating job to the bottom of the queue, regardless of the original submission time and priority

  • If the queue defines APS scheduling, migrated jobs keep their APS information and compete with other pending jobs based on the APS value
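
As a minimal sketch of the two parameters used together, the following lsf.conf fragment requeues migrating jobs as pending and places them at the bottom of the queue. It assumes both lines are added to an existing lsf.conf; the comments are illustrative and not part of the required syntax.

```shell
# lsf.conf (fragment)
# Requeue migrating jobs as pending instead of restarting or rerunning them
LSB_MIG2PEND=1
# With LSB_MIG2PEND=1, place requeued jobs at the bottom of the queue
LSB_REQUEUE_TO_BOTTOM=1
```

Setting LSB_REQUEUE_TO_BOTTOM=1 without LSB_MIG2PEND=1 does not, by itself, produce the migration behavior described above, because the second parameter only modifies how the first one requeues jobs.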


Checkpointing resizable jobs

After a checkpointable resizable job restarts (brestart), LSF restores the original job allocation request. LSF also restores the job-level autoresizable attribute and the notification command if they were specified at job submission.

Example

The following example shows a queue configured for periodic checkpointing in lsb.queues:
Begin Queue
...
QUEUE_NAME=checkpoint
CHKPNT=mydir 240
DESCRIPTION=Automatically checkpoints jobs every 4 hours to mydir
...
End Queue
Note: The bqueues command displays the checkpoint period in seconds; the lsb.queues CHKPNT parameter defines the checkpoint period in minutes.
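
To make the unit difference concrete, the following sketch converts a CHKPNT period in minutes to the equivalent number of seconds. The helper name chkpnt_minutes_to_seconds is hypothetical and is not an LSF command; it only performs the arithmetic.

```shell
# Hypothetical helper (not part of LSF): convert a checkpoint period
# from minutes (lsb.queues CHKPNT) to seconds (the unit bqueues displays).
chkpnt_minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# The example queue above uses a 240-minute period.
chkpnt_minutes_to_seconds 240   # prints 14400
```

For the example queue, a period of 240 minutes therefore corresponds to 14400 seconds.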

If the command bchkpnt -k 123 is used to checkpoint and kill job 123, you can restart the job using the brestart command as shown in the following example:

brestart -q priority mydir 123
Job <456> is submitted to queue <priority>

LSF assigns a new job ID of 456, submits the job to the queue named priority, and restarts the job.

Once job 456 is running, you can change the checkpoint period, here to 360 minutes (6 hours), using the bchkpnt command:

bchkpnt -p 360 456
Job <456> is being checkpointed