LSF features that require modification to work with EGO-enabled SLA scheduling (Obsolete)
LSF with EGO-enabled SLA scheduling is no longer supported and is obsolete.
The following LSF features require modification to work properly with EGO-enabled SLA scheduling (that is, when ENABLE_DEFAULT_EGO_SLA=Y is defined in the lsb.params file).
Parallel jobs
LSF dynamically gets job sizes (number of tasks) from EGO based on either the velocity or the total number of pending and running jobs in a service class, whichever is larger. Therefore, if the number of pending and running jobs in a service class is small, LSF requests only the velocity as configured in the service class. However, if the velocity is smaller than the number of tasks that are required by a parallel job (as requested by using the bsub -n option), the job pends indefinitely.
To prevent the parallel job from pending indefinitely, set the velocity goal to a higher value than the job size that the parallel job requires, so that parallel jobs in the service class can be scheduled. For more information about setting velocity goals, see Service classes for SLA scheduling.
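The rule above can be sketched as a small shell model. This is only an illustration of the described behavior, not an LSF command or API: the function names slots_requested and parallel_job_can_start are hypothetical, and treating a velocity equal to the job size as sufficient is an assumption (the text recommends a strictly higher value).

```shell
#!/bin/sh
# Simplified model of the behavior described above; not part of LSF.

# slots_requested VELOCITY JOBS
#   Prints the larger of the velocity goal and the total number of
#   pending and running jobs in the service class.
slots_requested() {
    velocity=$1
    jobs=$2
    if [ "$velocity" -ge "$jobs" ]; then
        echo "$velocity"
    else
        echo "$jobs"
    fi
}

# parallel_job_can_start VELOCITY JOBS TASKS
#   Succeeds when the modeled request covers the task count of the
#   parallel job (the value given to bsub -n).
#   Assumption: an equal value counts as sufficient.
parallel_job_can_start() {
    [ "$(slots_requested "$1" "$2")" -ge "$3" ]
}

# Velocity 4, one job in the service class, job needs 8 tasks.
if parallel_job_can_start 4 1 8; then echo "can start"; else echo "pends"; fi

# Velocity 10, same job.
if parallel_job_can_start 10 1 8; then echo "can start"; else echo "pends"; fi
```

In this model the first case pends because the request is capped at the velocity of 4, while raising the velocity to 10 covers the 8 tasks.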
Job size reservation for a parallel job
To configure job size (number of tasks) reservation in a queue, define SLOT_RESERVE=MAX_RESERVE_TIME[integer] in lsb.queues. LSF then reserves the job size for a large parallel job so that the job is not starved by other jobs that require a smaller job size.
For example, if the service class for parallel jobs is LSF_Parallel, and the queue with job size reservation configured for parallel jobs is Parallel_Reserve, the following commands submit two job arrays and one four-task parallel job to the service class, with only the parallel job going to the reservation queue:
bsub -sla LSF_Parallel -J "array1[1-10]" myjob
bsub -sla LSF_Parallel -q Parallel_Reserve -n 4 myjob
bsub -sla LSF_Parallel -J "array2[1-10]" myjob
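The Parallel_Reserve queue used in the example above could be defined along the following lines in lsb.queues. This is a minimal sketch: the reservation value of 5 and the description text are illustrative assumptions, and a real queue definition normally carries additional site-specific parameters.

```
# lsb.queues (fragment); values are illustrative assumptions
Begin Queue
QUEUE_NAME   = Parallel_Reserve
SLOT_RESERVE = MAX_RESERVE_TIME[5]
DESCRIPTION  = Reserves job size for large parallel jobs
End Queue
```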
Resource requirements
A job-level resource requirement (specified by using bsub -R) is not passed from LSF to EGO when you request job sizes. Resource requirements are passed from LSF only at the LSF service class or EGO consumer level.
Ensure all jobs that are submitted to the LSF service class can run on the job slots or hosts that are allocated by EGO according to the resource requirement in the service class or the corresponding EGO consumer.
Resource preemption
For resource preemption between jobs, use EGO resource reclaim between consumers according to the resource sharing plans. When EGO reclaims a slot according to the resource sharing plan, the job that is running on the slot can be killed or requeued in LSF so that other high-priority workload can use the job slot.
LSF parallel job consumers
Do not configure a consumer of large LSF parallel jobs to borrow slots from other EGO consumers, because a job that is running on a job slot is killed if the job slot is reclaimed by EGO.
Configure the LSF parallel job consumer to own job slots, then lend the slots to other consumers that have small impact if their workload is preempted.