Monitor scheduler efficiency and overhead

Use the bacct command or the badmin perfmon view command to monitor scheduler efficiency.

When the time that LSF spends scheduling jobs is large compared to the run times of those jobs, resource utilization in your cluster is low. For instance, if the average job run time equals the average time required to fill a slot after a job finishes, slot usage in the cluster is approximately 50% of what it would be if scheduling overhead were zero. It is not always clear whether low utilization is caused by scheduling performance or by configured policies (such as limits) that block jobs from accessing resources.
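The 50% figure follows from simple arithmetic. The sketch below is a simplified model, not something LSF computes or exposes: it assumes that each slot alternates between running a job and sitting idle until the scheduler fills it again. The function name and the sample times are hypothetical.

```python
def relative_slot_usage(avg_run_time: float, avg_fill_time: float) -> float:
    """Return slot usage as a fraction of the usage with zero scheduling overhead.

    Assumed model: a slot is busy for avg_run_time, then idle for
    avg_fill_time while it waits for the scheduler to dispatch the next job.
    """
    if avg_run_time <= 0 or avg_fill_time < 0:
        raise ValueError("run time must be positive and fill time non-negative")
    return avg_run_time / (avg_run_time + avg_fill_time)


# Run time equal to fill time: the slot is busy half of the time.
print(relative_slot_usage(60.0, 60.0))  # 0.5

# Jobs that run ten times longer than the fill time lose far less.
print(f"{relative_slot_usage(600.0, 60.0):.1%}")
```

Under this model, the same fill time costs much less when jobs run longer, which is why scheduling overhead matters most for workloads of short jobs.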

LSF provides a scheduler efficiency metric in the badmin perfmon command that estimates how the slot and memory utilization of the cluster is affected by scheduling overhead. A value near 100% means that improving scheduler performance would not significantly improve resource utilization, while a lower value indicates how much resource utilization could improve if scheduler performance improved. For example, a value of 75% means that, due to scheduling overhead, resource utilization is only 75% of what it could be if scheduling overhead were reduced to zero.
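To see what such a percentage implies in practice, divide the utilization you observe by the efficiency value. The sketch below applies only the interpretation given above. It is not an LSF formula, the function name and input values are hypothetical, and capping the result at 100% is an added assumption.

```python
def utilization_without_overhead(observed_utilization: float,
                                 scheduler_efficiency: float) -> float:
    """Estimate utilization if scheduling overhead were reduced to zero.

    Both arguments are fractions between 0 and 1. The estimate is capped at
    1.0 (an assumption) because a cluster cannot be more than fully used.
    """
    if not 0.0 < scheduler_efficiency <= 1.0:
        raise ValueError("scheduler_efficiency must be in (0, 1]")
    if not 0.0 <= observed_utilization <= 1.0:
        raise ValueError("observed_utilization must be in [0, 1]")
    return min(1.0, observed_utilization / scheduler_efficiency)


# Hypothetical cluster: 60% of slots busy, scheduler efficiency of 75%.
estimate = utilization_without_overhead(0.60, 0.75)
print(f"{estimate:.0%}")  # 80%
```

In this hypothetical case, removing all scheduling overhead would raise slot utilization from 60% to roughly 80%. Any remaining gap would come from other causes, such as configured limits.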

Run the badmin perfmon view command to view the scheduler efficiency for jobs that finished within a sample period. The output shows scheduler efficiency both for the jobs that finished within the sample period and for all finished jobs in the cluster.

Run the bacct command to view the scheduler efficiency for all finished jobs in a cluster.