Configuration to modify job migration
You can configure LSF to requeue a migrating job rather than restart or rerun the job.
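Configuration file | Parameter and syntax | Behavior |
---|---|---|
lsf.conf | LSB_MIG2PEND=1 | LSF requeues a migrating job rather than restarting or rerunning it. The job is requeued as pending in order of its original submission time and priority. |
lsf.conf | LSB_REQUEUE_TO_BOTTOM=1 | When LSB_MIG2PEND=1, LSF requeues a migrating job to the bottom of the queue, regardless of its original submission time and priority. |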
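As a minimal sketch, the two parameters might appear together in lsf.conf as shown below. The comments are illustrative and are not part of the required syntax; where lsf.conf lives and how the change is activated depend on your installation.

```shell
# lsf.conf fragment (illustrative)

# Requeue a migrating job instead of restarting or rerunning it
LSB_MIG2PEND=1

# Used together with LSB_MIG2PEND=1: place the requeued job
# at the bottom of the queue
LSB_REQUEUE_TO_BOTTOM=1
```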
Checkpointing resizable jobs
After a checkpointable resizable job restarts (brestart), LSF restores the original job allocation request. LSF also restores the job-level autoresizable attribute and the notification command if they were specified at job submission.
Example
Begin Queue
...
QUEUE_NAME=checkpoint
CHKPNT=mydir 240
DESCRIPTION=Automatically checkpoints jobs every 4 hours to mydir
...
End Queue
If the command bchkpnt -k 123 is used to checkpoint and kill job 123, you can restart the job using the brestart command as shown in the following example:
brestart -q priority mydir 123
Job <456> is submitted to queue <priority>
LSF assigns a new job ID of 456, submits the job to the queue named "priority," and restarts the job.
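Because the restarted job receives a new job ID, a script that goes on to manage the job needs to capture that ID from the brestart output. The following is a minimal sketch that assumes the output line has exactly the form shown above; parse_job_id is a hypothetical helper name, not an LSF command.

```shell
# Hypothetical helper: read brestart-style output on stdin and print
# the job ID from a line of the form
#   Job <456> is submitted to queue <priority>
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted to queue <.*>.*$/\1/p'
}

# Demonstration using the sample output line (no LSF installation needed)
new_id=$(echo 'Job <456> is submitted to queue <priority>' | parse_job_id)
echo "$new_id"   # prints 456
```

On a live system, the echo in the demonstration would be replaced by the actual brestart invocation, and the captured ID could then be passed to later commands such as bchkpnt.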
Once job 456 is running, you can change the checkpoint period using the bchkpnt command with the -p option, which takes the new period in minutes:
Job <456> is being checkpointed