IBM Platform LSF 9.1: Placing jobs based on available job slots of hosts

LSF 9.1 introduced a new built-in, host-based resource, slots, to support placing jobs based on the job slots available on each host. This tip describes how to configure and use the slots resource with existing LSF resource requirements to implement a packing or spreading policy:

  • Packing means always placing jobs on the hosts with the least available slots first. Packing jobs can make room for bigger parallel jobs.
  • Spreading tries to spread jobs out and places jobs on the hosts with the most available slots first. Spreading jobs maximizes the performance of individual jobs.
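
The difference between the two policies comes down to sort direction. The following Python sketch is only an illustration of that idea, not LSF code; the host names and slot counts are hypothetical.

```python
# Hypothetical hosts and their available job slots (illustration only).
available_slots = {"hostA": 2, "hostB": 8, "hostC": 5}

def pack_order(slots):
    """Packing: hosts with the fewest available slots come first."""
    return sorted(slots, key=lambda host: slots[host])

def spread_order(slots):
    """Spreading: hosts with the most available slots come first."""
    return sorted(slots, key=lambda host: slots[host], reverse=True)

print(pack_order(available_slots))    # ['hostA', 'hostC', 'hostB']
print(spread_order(available_slots))  # ['hostB', 'hostC', 'hostA']
```

With packing, hostB keeps all 8 slots free for a larger parallel job; with spreading, the next job lands on hostB, the least loaded host.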

The slots keyword represents the job slots currently available on each host. It is a built-in numeric decreasing resource: when a job occupies some of the job slots on a host, the slots value decreases accordingly. For example, if MXJ for an LSF host is defined as 8, slots is 8 when the host is empty; once 6 job slots are occupied, slots becomes 2. The slots resource can be used only in the select[] and order[] sections of a job resource requirement string.

To apply a job packing or spreading policy, use the order[] section of the job resource requirement. For example, -R "order[-slots]" orders candidate hosts so that those with the fewest available slots come first (packing), while -R "order[slots]" orders them so that those with the most available slots come first (spreading).
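
The MXJ example above (8 slots, 6 occupied, 2 remaining) can be written as a small model. This is a simplified sketch that assumes the slots value is just MXJ minus the occupied slots; the function name is hypothetical and not part of LSF.

```python
def slots_value(mxj, occupied):
    """Available job slots on a host under the simplified model MXJ - occupied."""
    if not 0 <= occupied <= mxj:
        raise ValueError("occupied slots must be between 0 and MXJ")
    return mxj - occupied

print(slots_value(8, 0))  # 8: the host is empty
print(slots_value(8, 6))  # 2: six job slots are occupied
```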

As part of a resource requirement, the order[] section can be used in the following contexts:

  • In a queue-level RES_REQ in lsb.queues
  • In an application profile-level RES_REQ in lsb.applications
  • In a job-level resource requirement, bsub -R or bmod -R

A job-level order[] clause overrides an application-level section, which overrides a queue-level section.
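
The precedence rule can be sketched as a simple lookup that returns the most specific order[] section that is defined. This models only the override behavior stated above; the function name and arguments are hypothetical.

```python
def effective_order(job=None, app=None, queue=None):
    """Return the order[] section that wins: job, then application, then queue."""
    for section in (job, app, queue):
        if section is not None:
            return section
    return None  # no order[] section defined at any level

# Queue packs, application profile spreads: the application level wins.
print(effective_order(app="order[slots]", queue="order[-slots]"))  # order[slots]

# A job-level section overrides both.
print(effective_order(job="order[!slots]", app="order[slots]",
                      queue="order[-slots]"))  # order[!slots]
```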

By default, candidate hosts for jobs that share a common resource requirement are selected and sorted only once in each scheduling cycle. This speeds up scheduling in high-throughput computing environments where a large number of jobs are pending. However, when many jobs are dispatched in a single scheduling cycle, sorting candidate hosts once per cycle may produce inaccurate results, because the host order no longer changes even though the slots value changes as new jobs are dispatched. Use an exclamation mark (!) in the order[] section to sort candidate hosts for each job, so that changes in the slots value within a single scheduling cycle are taken into account.
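
To see why the sort frequency matters, consider a toy simulation of one scheduling cycle under a spreading policy. This is a deliberately simplified model, not the LSF scheduler: it assumes single-slot jobs, places each job on the first host in the current order that has a free slot, and uses hypothetical host names and slot counts.

```python
def spread_key(slots):
    # Most available slots first; host name breaks ties deterministically.
    return lambda host: (-slots[host], host)

def dispatch(initial_slots, num_jobs, sort_per_job):
    """Place single-slot jobs in one cycle and return the chosen hosts."""
    slots = dict(initial_slots)  # work on a copy
    order = sorted(slots, key=spread_key(slots))  # sorted once per cycle
    placements = []
    for _ in range(num_jobs):
        if sort_per_job:
            # Models order[!slots]: re-sort using the updated slot counts.
            order = sorted(slots, key=spread_key(slots))
        host = next((h for h in order if slots[h] > 0), None)
        if host is None:
            break  # no free slot anywhere; remaining jobs stay pending
        slots[host] -= 1
        placements.append(host)
    return placements

hosts = {"hostA": 4, "hostB": 3}
print(dispatch(hosts, 3, sort_per_job=False))  # ['hostA', 'hostA', 'hostA']
print(dispatch(hosts, 3, sort_per_job=True))   # ['hostA', 'hostA', 'hostB']
```

With a single sort, all three jobs go to hostA even after it has fewer free slots than hostB. Re-sorting per job sends the third job to hostB, which is what a spreading policy intends.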

To use ! in an order[] section, you must set SCHED_PER_JOB_SORT=Y in lsb.params. For the parameter to take effect, run badmin mbdrestart or badmin reconfig on the master host to reconfigure mbatchd.

The following is an example of using the slots resource:

  1. Configure RES_REQ in a Queue section of lsb.queues.
Begin Queue
QUEUE_NAME = myqueue
RES_REQ    = order[-slots]
End Queue
  2. Configure RES_REQ in an application profile in lsb.applications.
Begin Application
NAME       = myapp
RES_REQ    = order[slots]
End Application
  3. Run badmin reconfig to make the queue and application profile configurations take effect. bqueues and bapp show the result.
# bqueues -l myqueue
QUEUE: myqueue
RES_REQ:  order[-slots]

# bapp -l myapp
RES_REQ:  order[slots]

You can now use the new queue and application profile for your job submissions. For example:

  • bsub -q myqueue myjob

The queue-level RES_REQ causes the job to be placed on the host with the fewest available slots first.

  • bsub -app myapp myjob

The application profile RES_REQ causes the job to be placed on the host with the most available slots first.

  • bsub -R "order[!slots]" -J "array[1-3]" myjob

A job array with 3 elements requests the hosts with the most available slots, with candidate hosts re-sorted for each job.