Configuring an interruptible backfill queue
Procedure
Specify the minimum number of seconds for the job to be considered for backfilling. This minimal time slice depends on the specific job properties; it must be longer than at least one useful iteration of the job. Multiple queues may be created if a site has jobs of distinctively different classes.
Begin Queue
QUEUE_NAME = background
# REQUEUE_EXIT_VALUES (set to whatever needed)
DESCRIPTION = Interruptible Backfill queue
BACKFILL = Y
INTERRUPTIBLE_BACKFILL = 1
RUNLIMIT = 10
PRIORITY = 1
End Queue
Interruptible backfill is disabled if BACKFILL and RUNLIMIT are not configured in the queue.
The value of INTERRUPTIBLE_BACKFILL is the minimal time slice in seconds for a job to be considered for backfill. The value depends on the specific job properties; it must be longer than at least one useful iteration of the job. Multiple queues may be created for different classes of jobs.
BACKFILL and RUNLIMIT must be configured in the queue.
RUNLIMIT corresponds to a maximum time slice for backfill, and should be configured so that the wait period for the new jobs submitted to the queue is acceptable to users. 10 minutes of runtime is a common value.
- If jobs are checkpoint-able, use their checkpoint exit value.
- If jobs periodically save data on their own, use the SIGTERM exit value.