LoadLeveler preemption and class draining

2007-04-25T20:02:23Z
When the command "llctl drain startd someclass" is run, we expect LoadLeveler (LL) to stop scheduling new jobs in class someclass and to let jobs already running in someclass terminate normally.

However, if a job in someclass is already preempted when the class receives the drain command, that job does not resume until the class itself is resumed. This makes it difficult to take part of a cluster down for maintenance without losing jobs.
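To make the problem concrete, here is a toy model of the behavior described above. It is purely illustrative: the classes and methods (Startd, drain, resume, try_resume_preempted and so on) are hypothetical and are not LoadLeveler's API. The model only encodes what we observe: a drained class starts nothing new, running jobs may finish, and a preempted job stays preempted as long as its class is drained.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    PREEMPTED = "preempted"
    COMPLETED = "completed"


@dataclass
class Job:
    name: str
    job_class: str
    state: JobState


@dataclass
class Startd:
    """Hypothetical toy model of one node; NOT the real LoadLeveler startd."""
    jobs: list[Job] = field(default_factory=list)
    drained: set[str] = field(default_factory=set)

    def drain(self, job_class: str) -> None:
        # Mark the class as drained: no new work is started in it.
        self.drained.add(job_class)

    def resume(self, job_class: str) -> None:
        # Re-open the class (this also re-enables scheduling of new jobs).
        self.drained.discard(job_class)

    def try_resume_preempted(self) -> None:
        # Observed behavior: a preempted job resumes only if its class is not drained.
        for job in self.jobs:
            if job.state is JobState.PREEMPTED and job.job_class not in self.drained:
                job.state = JobState.RUNNING

    def finish_running(self) -> None:
        # Running jobs are allowed to terminate normally, drained or not.
        for job in self.jobs:
            if job.state is JobState.RUNNING:
                job.state = JobState.COMPLETED

    def safe_to_shut_down(self) -> bool:
        # The node can go down for maintenance only when every job has completed.
        return all(job.state is JobState.COMPLETED for job in self.jobs)


if __name__ == "__main__":
    node = Startd(jobs=[
        Job("a", "someclass", JobState.RUNNING),
        Job("b", "someclass", JobState.PREEMPTED),
    ])
    node.drain("someclass")
    node.try_resume_preempted()
    node.finish_running()
    print(node.safe_to_shut_down())  # False: job "b" stays preempted while someclass is drained

    node.resume("someclass")         # the only way to let "b" finish...
    node.try_resume_preempted()
    node.finish_running()
    print(node.safe_to_shut_down())  # True, but the class is open to new jobs again
```

In this model the only way to get the preempted job to completion is to resume the class, which also reopens it to new work; that is the catch-22 we are trying to avoid.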

Have other sites encountered this problem? Is there a clean way to deal with it? Checkpointing is not an option: it is too unreliable and may not even work at all on a preempted job.