Configuring job-level automatic re-queuing
Procedure
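Use bsub -Q to submit a job that is automatically re-queued if it exits with one of the specified exit values. Use spaces to separate multiple exit codes. The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list.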
Job-level re-queue exit values override application-level and queue-level configuration of the parameter REQUEUE_EXIT_VALUES, if defined.
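The override rule can be pictured as a lookup that takes the first defined value. The sketch below is a hypothetical model, not LSF code: the function name and arguments are illustrative, and the relative order of application-level over queue-level is an assumption to confirm against the REQUEUE_EXIT_VALUES reference for your LSF version.

```python
from typing import Optional


def effective_requeue_values(
    job: Optional[str],
    application: Optional[str],
    queue: Optional[str],
) -> Optional[str]:
    """Return the re-queue exit-value specification that applies to a job.

    Hypothetical model: each argument is a REQUEUE_EXIT_VALUES-style string,
    or None when that level does not define one. Job level (bsub -Q) wins;
    application level is assumed to take precedence over queue level.
    """
    for spec in (job, application, queue):
        if spec is not None:
            return spec
    return None  # no level defines re-queue exit values
```

For example, effective_requeue_values("all ~1", "3", "99") returns "all ~1", because the job-level value is defined.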
A job submitted with job-level re-queue exit values still shares its application profile and queue with other jobs.
For example:
bsub -Q "all ~1 ~2 EXCLUDE(9)" myjob
Jobs that exit with any exit code except 1 and 2 are re-queued. Jobs that exit with code 9 are re-queued exclusively, so that the failed job is not rerun on the same host (exclusive job re-queue).
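To make the list syntax concrete, the following sketch parses a -Q style string and classifies an exit code. It is an illustrative model, not LSF's parser: the function names are hypothetical, all is modelled literally as every code from 0 to 255, and tilde exclusions are assumed to apply after all inclusions regardless of token order.

```python
import re
from typing import FrozenSet, Tuple

EXIT_CODE_RANGE = range(0, 256)  # assumption: exit codes 0-255, as stated above
_EXCLUDE = re.compile(r"EXCLUDE\((\d+)\)")


def parse_exit_values(spec: str) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Parse a -Q style string into (requeue_codes, exclusive_codes).

    Sketch semantics (assumed):
      all         -> every code in EXIT_CODE_RANGE
      N           -> code N
      ~N          -> remove code N (applied after all inclusions)
      EXCLUDE(N)  -> re-queue code N exclusively (not on the same host)
    """
    include, exclude, exclusive = set(), set(), set()
    for token in spec.split():  # codes are separated by spaces
        match = _EXCLUDE.fullmatch(token)
        if token == "all":
            include.update(EXIT_CODE_RANGE)
        elif token.startswith("~"):
            exclude.add(int(token[1:]))
        elif match:
            exclusive.add(int(match.group(1)))
        else:
            include.add(int(token))
    # Keep the two sets disjoint so each code has exactly one outcome.
    requeue = include - exclude - exclusive
    return frozenset(requeue), frozenset(exclusive)


def requeue_action(spec: str, exit_code: int) -> str:
    """Classify what happens to a job that exits with exit_code."""
    requeue, exclusive = parse_exit_values(spec)
    if exit_code in exclusive:
        return "requeue-exclusive"
    if exit_code in requeue:
        return "requeue"
    return "no-requeue"
```

With the example specification above, requeue_action("all ~1 ~2 EXCLUDE(9)", 1) returns "no-requeue", exit code 3 returns "requeue", and exit code 9 returns "requeue-exclusive".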
Enabling exclusive job re-queuing
Procedure
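Define an exit code as EXCLUDE(exit_code) to enable exclusive job re-queue, so that a job that exits with that code is not rerun on the same host. Exclusive job re-queue does not work for parallel jobs.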
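One way to picture exclusive re-queue is a per-job record of hosts to avoid. The sketch below is a hypothetical model: JobRecord, record_exit, and pick_host are illustrative names, and treating a job with more than one slot as "parallel" is an assumption. Because the text says exclusive re-queue does not work for parallel jobs, the sketch simply skips host exclusion for them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class JobRecord:
    """Hypothetical per-job state for an exclusive re-queue sketch."""
    job_id: int
    num_slots: int = 1  # assumption: more than one slot means a parallel job
    excluded_hosts: Set[str] = field(default_factory=set)


def record_exit(job: JobRecord, host: str, exit_code: int,
                exclusive_codes: Set[int]) -> None:
    """Remember the host if the job exited there with an EXCLUDE(...) code."""
    if exit_code not in exclusive_codes:
        return
    if job.num_slots > 1:
        # Exclusive re-queue does not apply to parallel jobs; ignore it here.
        return
    job.excluded_hosts.add(host)


def pick_host(job: JobRecord, candidates: List[str]) -> Optional[str]:
    """Return the first candidate host the job has not been excluded from."""
    for host in candidates:
        if host not in job.excluded_hosts:
            return host
    return None  # no eligible host; the job stays pending
```

For example, after record_exit(job, "hostA", 9, {9}) for a single-slot job, pick_host(job, ["hostA", "hostB"]) returns "hostB".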
Modifying re-queue exit values
Procedure
Use bmod -Q to modify job-level re-queue exit values. bmod -Q does not affect running jobs. For rerunnable jobs and re-queued jobs, bmod -Q takes effect on the next run.
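The "next run" behavior can be modelled by keeping two values per job: the specification stored for the job and the one captured when the current run was dispatched. This is a hypothetical sketch; the Job class and its methods are illustrative and do not reflect how LSF stores job state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical job state: stored spec vs. spec in force for the current run."""
    stored_spec: Optional[str]
    running: bool = False
    active_spec: Optional[str] = None

    def dispatch(self) -> None:
        # Each run (first run, rerun, or re-queue) captures the latest stored spec.
        self.active_spec = self.stored_spec
        self.running = True

    def finish(self) -> None:
        self.running = False

    def modify_spec(self, new_spec: Optional[str]) -> None:
        # Models bmod -Q: only the stored value changes; a running job keeps
        # the spec it was dispatched with until its next run.
        self.stored_spec = new_spec
```

For example, a job dispatched with "all ~1" keeps that value as active_spec after modify_spec("2 3") and only picks up "2 3" when it is dispatched again.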
- Multicluster job forwarding model: For jobs forwarded to a remote cluster, the arguments of bsub -Q take effect on the remote cluster.
- Multicluster lease model: The arguments of bsub -Q apply to jobs running on remote leased hosts as if they were running on local hosts.