Tuning IBM Spectrum LSF multicluster: best practices
Learn about best practices for tuning your LSF multicluster environment.
Understand the job forwarding flow
Job forwarding between clusters is queue-to-queue. In the submission cluster, queues use the SNDJOBS_TO parameter (in the lsb.queues file) to specify the queues in remote clusters to which jobs can be sent. For remote queues to accept jobs from a cluster, they must explicitly allow it with the RCVJOBS_FROM parameter.
Use the HOSTS parameter in submission queues to specify whether jobs are allowed to run in the submission cluster, or whether they run only in remote clusters.
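As a minimal sketch of this queue pairing, the following lsb.queues fragments show one sending queue and one receiving queue. The queue names (send_q, recv_q) and cluster names (cluster1, cluster2) are hypothetical, and the use of HOSTS=none to make a queue forward-only is an assumption to verify against the lsb.queues reference for your LSF version.

```
# lsb.queues in the submission cluster (cluster1, hypothetical name)
Begin Queue
QUEUE_NAME   = send_q
# Remote queue that is allowed to receive jobs from this queue
SNDJOBS_TO   = recv_q@cluster2
# Assumption: "none" means no local hosts, so jobs run only in remote clusters
HOSTS        = none
End Queue

# lsb.queues in the execution cluster (cluster2, hypothetical name)
Begin Queue
QUEUE_NAME   = recv_q
# Explicitly accept forwarded jobs from cluster1
RCVJOBS_FROM = cluster1
End Queue
```

To let jobs run locally as well, the sending queue would list local hosts or host groups in HOSTS instead.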
In each scheduling session, LSF will first try to run jobs in the local cluster, if configured. If a job cannot be dispatched immediately in the local cluster, then LSF will consider forwarding the job to one of the remote clusters.
Once a job has been forwarded to a remote cluster, the scheduler in the remote cluster tries to schedule the job. If the job has not been dispatched after some time, it returns to the submission cluster and the process repeats.
Set the import backlog limit on remote queues
With receiving queues, the IMPT_JOBBKLG parameter (set in the lsb.queues file) limits the number of forwarded jobs that can be pending in the queue. Compare this parameter with the IMPT_TASKBKLG parameter, which limits the number of pending job tasks in the queue.
The default value for IMPT_JOBBKLG is 50 jobs. This limit helps to ensure that jobs do not have long to wait in a remote cluster. It gives jobs a better chance to run on the submission cluster, and avoids having jobs time out on the remote cluster and be returned to the submission cluster. This limit also helps to balance the load between multiple execution clusters.
For a high throughput environment, you can increase the import backlog limit. Ideally, when there is workload on the submission side, then there should also consistently be a backlog of pending jobs on the execution side. If a queue on the execution side runs out of workload, then resources in the execution cluster can go unused.
Monitor queues in the submission and execution clusters to ensure that whenever there are jobs pending on the submission queue, there are also pending jobs on the remote queue. If the remote queue is running out of jobs even when there are jobs pending on the submission queue, then consider increasing the IMPT_JOBBKLG value on the remote queue.
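A raised import backlog limit on a receiving queue might look like the following sketch. The queue and cluster names are hypothetical, and the value 500 is only an illustration, not a recommendation; choose a value based on what you observe when monitoring the queues.

```
# lsb.queues in the execution cluster
Begin Queue
QUEUE_NAME   = recv_q
RCVJOBS_FROM = cluster1
# Allow up to 500 forwarded jobs to pend in this queue (default is 50)
IMPT_JOBBKLG = 500
End Queue
```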
Set the job forwarding rate
By default, an LSF cluster forwards up to 50 jobs per scheduling session to remote clusters. To adjust the maximum number of jobs that are forwarded per scheduling session, configure LSB_MAX_FORWARD_PER_SESSION=integer in the lsb.params file.
The purpose of this limit is to reduce the load on the management daemons in the submission cluster. However, for high throughput environments, the default value can be too restrictive.
As with the import backlog limits (set in the IMPT_JOBBKLG parameter), if this job forwarding limit is too small, then the rate at which jobs are forwarded might not keep up with the rate at which they are dispatched in the execution clusters.
For very large environments, you can set the LSB_MAX_FORWARD_PER_SESSION value to several thousand. Make sure to monitor the scheduling cycle time (visible in the badmin perfmon view command output) to ensure that the scheduler keeps running at a reasonable rate. A scheduler cycle of less than 15 seconds is reasonable.
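As a sketch, the forwarding rate could be raised in the submission cluster as follows. The parameter name is taken from the text above; the value 2000 is purely illustrative and should be tuned while watching the scheduling cycle time.

```
# lsb.params in the submission cluster
Begin Parameters
# Forward at most 2000 jobs per scheduling session (default is 50)
LSB_MAX_FORWARD_PER_SESSION = 2000
End Parameters
```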
Tune the rescheduling timeout value
When a job is forwarded to a remote cluster, it remains pending in the remote cluster until either it is dispatched, or it reaches the pending timeout value, controlled by the MAX_RSCHED_TIME parameter in the lsb.queues file in the submission cluster.
Upon reaching the time limit, the job returns to the submission cluster, so that it can run in the submission cluster or be forwarded to another cluster.
Ideally, most forwarded jobs should run without being returned. You can check whether a job was forwarded and returned by running the bhist -l command. When a job returns, it shows a status event, such as Pending: Remote schedule time reached, waiting for rescheduling.
If a large portion of jobs are returned, consider raising the MAX_RSCHED_TIME timeout value.
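A raised timeout on a sending queue might be sketched as follows. The queue names are hypothetical and the value is illustrative. As far as I recall from the lsb.queues reference, MAX_RSCHED_TIME is a multiplier of MBD_SLEEP_TIME rather than a number of seconds; treat that as an assumption and confirm it for your LSF version before choosing a value.

```
# lsb.queues in the submission cluster
Begin Queue
QUEUE_NAME      = send_q
SNDJOBS_TO      = recv_q@cluster2
# Let forwarded jobs pend longer remotely before they are returned.
# Assumption: the effective timeout is MAX_RSCHED_TIME * MBD_SLEEP_TIME.
MAX_RSCHED_TIME = 40
End Queue
```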
Preserve job submission order
By default, jobs are ordered in remote clusters by forwarding time, rather than by original submission time. Every time that a job is forwarded, it starts at the bottom of the queue. For jobs that are difficult to place (for instance, because of special resource requirements), this prioritization policy can cause them to go back and forth between clusters multiple times.
To ensure that jobs are ordered according to their submission time instead of forwarding time, set MC_SORT_BY_SUBMIT_TIME=Y in the lsb.params file. This can give hard-to-place jobs a better chance to run when they are forwarded.
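The setting described above would appear in the Parameters section of lsb.params, roughly as in this sketch:

```
# lsb.params
Begin Parameters
# Order forwarded jobs by original submission time, not forwarding time
MC_SORT_BY_SUBMIT_TIME = Y
End Parameters
```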
Delay job forwarding
When a job is submitted, LSF first tries to place the job in the submission cluster. If the job cannot be dispatched, then LSF immediately considers forwarding the job to a remote cluster.
In some cases, it can be preferable to keep trying to dispatch a job in the submission cluster for a period of time, rather than immediately forwarding the job. For example, in a cloud bursting scenario where the submission cluster is on-premises, and the remote cluster is in a public cloud, it can be cheaper to run jobs on-premises when possible.
You can set the MC_FORWARD_DELAY parameter in the lsb.queues file to tell LSF to try scheduling a job on the submission side for the given number of seconds before forwarding the job to a remote cluster.
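For the cloud bursting scenario, a sending queue with a forwarding delay might look like this sketch. The queue and cluster names are hypothetical, and 300 seconds is an illustrative value only.

```
# lsb.queues in the on-premises submission cluster
Begin Queue
QUEUE_NAME       = send_q
SNDJOBS_TO       = recv_q@cloud_cluster
# Try to schedule locally for 300 seconds before forwarding
MC_FORWARD_DELAY = 300
End Queue
```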
Configure LSF to consider special resources
By default, LSF does not consider host resources in remote clusters when making forwarding decisions. As a result, jobs that require special resources can end up forwarded to clusters where the resources do not exist.
If there are special host resources that are available in only some clusters, then configure LSF to consider these resources when making forwarding decisions. Set the MC_RESOURCE_MATCHING_CRITERIA parameter in the lsb.params file on the remote side to MC_RESOURCE_MATCHING_CRITERIA=rc1 rc2 ..., where rc1 and rc2 are LSF numeric and string resources.
Also, consider grouping hosts with special resources into a single cluster so that jobs that require these resources have the best chance of being dispatched once forwarded.
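A sketch of the resource matching setting follows. The resource names scratch_gb (numeric) and gpu_model (string) are hypothetical; substitute the resources that are actually defined in your clusters.

```
# lsb.params on the remote (execution) side
Begin Parameters
# Consider these host resources when making forwarding decisions
MC_RESOURCE_MATCHING_CRITERIA = scratch_gb gpu_model
End Parameters
```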
Consider fair share on job forwarding
By default, LSF does not consider forwarded jobs in the dynamic priority calculation. Rather, it only considers the resource usage of jobs that run in the submission cluster. As a result, when LSF considers job forwarding, the forwarded jobs might all come from the user with the highest fair share priority, rather than reflecting the sharing ratios specified in the fair share policy.
For LSF to consider forwarded jobs in fair share dynamic priority, configure the FWD_JOB_FACTOR parameter in the lsb.params (or lsb.queues) file. As a starting point, configure the FWD_JOB_FACTOR value to be equal to the RUN_JOB_FACTOR value.
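The suggested starting point could be written as in the sketch below. The value 3.0 is used here on the assumption that RUN_JOB_FACTOR is at its usual default; if your cluster sets a different RUN_JOB_FACTOR, match that value instead.

```
# lsb.params in the submission cluster
Begin Parameters
RUN_JOB_FACTOR = 3.0
# Weight forwarded jobs the same as running jobs in dynamic priority
FWD_JOB_FACTOR = 3.0
End Parameters
```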
Caution when configuring scheduling limits on remote clusters
It is useful in many cases to configure scheduling policies on remote clusters, such as limits, guarantee policies, and so on. Be aware, however, that jobs are forwarded without consideration for these policies. For example, suppose that you have configured a per-user limit of ten jobs on a remote cluster. Jobs in excess of this limit can still be forwarded to the remote cluster, where they pend as expected. These pending jobs also count against the import job backlog for the remote queue, which can block the forwarding of jobs from other users.
In general, use caution when applying scheduling policies on the execution side that can affect only some jobs in a submission queue.
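For illustration, the per-user limit of ten jobs from the example in this section might be defined on the execution cluster as sketched below. The source does not show this configuration; the lsb.resources file, the Limit section layout, and the limit name are assumptions based on common LSF usage.

```
# lsb.resources in the execution cluster (assumed location of the limit)
Begin Limit
NAME     = per_user_jobs
# Apply the limit to each user individually
PER_USER = all
JOBS     = 10
End Limit
```

With a limit like this, the eleventh and later jobs from one user still arrive and pend in the remote queue, consuming part of the IMPT_JOBBKLG allowance.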
Run short jobs in submission cluster
If jobs are, on average, short relative to the overhead of scheduling them, this can result in low cluster utilization. For LSF, short jobs are those which might run for less than ten minutes.
To handle short jobs in LSF, you can set the RELAX_JOB_DISPATCH_ORDER=Y parameter in the lsb.params file. This configuration allows short jobs with identical resource requirements to run consecutively on a single allocation made by the scheduler, drastically reducing scheduling overhead for these jobs. Note that LSF only supports this policy on submission clusters; jobs that are forwarded to remote clusters cannot take advantage of this configuration.
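In lsb.params on the submission cluster, this setting would look roughly like the following sketch:

```
# lsb.params in the submission cluster
Begin Parameters
# Let short jobs with identical requirements reuse one allocation
RELAX_JOB_DISPATCH_ORDER = Y
End Parameters
```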
In addition, there is the potential for incurring additional scheduling overhead in forwarding jobs to remote clusters and ensuring that they are dispatched there.
Overall, as a best practice, keep short jobs on the submission cluster and allow longer jobs to be forwarded to remote clusters.