Performance enhancements
The following new features can improve performance.
Improved mbatchd performance and scalability
Job dependency evaluation checks whether each job's dependency condition is satisfied. You can improve the performance and scalability of the mbatchd daemon by limiting the amount of time that mbatchd spends evaluating job dependencies in one scheduling cycle. This limits how long job dependency evaluation blocks other services and frees up time for mbatchd to perform those services during the scheduling cycle. Previously, you could limit only the maximum number of job dependencies, which limited the time spent evaluating job dependencies only indirectly.
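The difference between a count limit and a time limit can be sketched in Python. This is a conceptual model under stated assumptions, not the mbatchd implementation: the `Job` class, the `evaluate_dependencies` function, and the `budget_seconds` argument are hypothetical, and a dependency here is simply a list of job IDs that must be DONE. Evaluation stops as soon as the time budget runs out, however many jobs remain, and the jobs not yet evaluated are deferred to the next scheduling cycle.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Job:
    """Hypothetical pending job: it is ready when every ID in depends_on is DONE."""
    job_id: int
    depends_on: List[int] = field(default_factory=list)


def evaluate_dependencies(
    pending: List[Job],
    done: Set[int],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[int], List[Job], List[Job]]:
    """Evaluate pending jobs until the time budget for this cycle is used up.

    Returns (ready job IDs, jobs still waiting, jobs not evaluated this cycle).
    """
    deadline = clock() + budget_seconds
    ready: List[int] = []
    waiting: List[Job] = []
    for index, job in enumerate(pending):
        if clock() >= deadline:
            # Budget exhausted: stop blocking and defer the rest to the next cycle.
            return ready, waiting, pending[index:]
        if all(dep in done for dep in job.depends_on):
            ready.append(job.job_id)
        else:
            waiting.append(job)
    return ready, waiting, []
```

A count limit evaluates a fixed number of jobs per cycle no matter how long each check takes; a time budget bounds the blocking time directly, which is what the new limit targets.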
Improve performance of LSF daemons by automatically configuring CPU binding
You can now have LSF automatically bind LSF daemons to CPU cores by enabling the LSF_INTELLIGENT_CPU_BIND parameter in the lsf.conf file. LSF then automatically creates a CPU binding configuration file for each master host and master candidate host according to the automatic binding policy.
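Enabling the parameter amounts to one line in lsf.conf. The Python helper below is a hypothetical sketch (the `set_lsf_conf_param` name is illustrative) that sets or replaces a `NAME=value` line in the text of an lsf.conf file. It assumes that `Y` is the value that enables LSF_INTELLIGENT_CPU_BIND; check the lsf.conf reference for the accepted values. It only edits text and does not apply the change to a running cluster.

```python
import re


def set_lsf_conf_param(conf_text: str, name: str, value: str) -> str:
    """Return conf_text with NAME=value set.

    Replaces the first active (uncommented) definition of NAME, or appends
    a new line if there is none. Commented-out lines are left untouched.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{name}={value}"
    if pattern.search(conf_text):
        # A function replacement avoids treating backslashes in value as escapes.
        return pattern.sub(lambda _match: new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


# Assumption: Y enables automatic CPU binding of the LSF daemons.
example = "LSF_LOGDIR=/var/log/lsf\nLSF_INTELLIGENT_CPU_BIND=N\n"
print(set_lsf_conf_param(example, "LSF_INTELLIGENT_CPU_BIND", "Y"))
```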
Reduce mbatchd workload by allowing user scripts to wait for a specific job condition
The new bwait command pauses until the specified job condition occurs and then returns. In user scripts that run jobs, end users can call bwait instead of running the bjobs command in a tight loop to check job status, which reduces the workload on the mbatchd daemon. For example, a user script might submit a job, then run bwait to wait for that job to reach the DONE state before continuing.
The new lsb_wait() API provides the same functionality as the bwait command.
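The script pattern described above can be sketched in Python with the standard `subprocess` module. This is an illustrative sketch, not an IBM-supplied script: the helper names are hypothetical, the job ID is parsed from the `Job <ID> is submitted to ...` message that bsub prints, and the sketch assumes that bwait takes its wait condition through `-w` using the same `done(job_id)` expression syntax as bsub -w job dependencies. It calls the bwait command and does not use the lsb_wait() API.

```python
import re
import subprocess
from typing import List


def parse_job_id(bsub_output: str) -> int:
    """Extract the job ID from bsub output such as
    'Job <1234> is submitted to default queue <normal>.'"""
    match = re.search(r"Job <(\d+)>", bsub_output)
    if match is None:
        raise ValueError(f"no job ID in bsub output: {bsub_output!r}")
    return int(match.group(1))


def bwait_done_command(job_id: int) -> List[str]:
    # Assumption: the wait condition uses bsub -w style expressions such as done(ID).
    return ["bwait", "-w", f"done({job_id})"]


def submit_and_wait(job_command: List[str]) -> int:
    """Submit a job, then block in one bwait call (no bjobs polling loop)
    until the job is DONE. check=True raises CalledProcessError if bsub or
    bwait exits with a non-zero status."""
    submitted = subprocess.run(
        ["bsub", *job_command], capture_output=True, text=True, check=True
    )
    job_id = parse_job_id(submitted.stdout)
    subprocess.run(bwait_done_command(job_id), check=True)
    return job_id
```

For example, `submit_and_wait(["sleep", "30"])` submits `sleep 30` and returns its job ID only after bwait returns, so the script issues one blocking request instead of repeatedly querying mbatchd with bjobs.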
Changes to default LSF behavior
Parallel restart of the mbatchd daemon
The mbatchd daemon now restarts in parallel by default, so an mbatchd daemon is always available to handle client commands during the restart, which helps minimize LSF downtime. During a parallel restart, LSF starts a new (child) mbatchd daemon process to read the configuration files and replace the event file. Previously, the mbatchd daemon restarted in serial by default, and you had to run the badmin mbdrestart -p command to restart it in parallel. To restart the mbatchd daemon in serial instead, use the new badmin mbdrestart -s command option.
New default value for caching a failed DNS lookup
The default value of the LSF_HOST_CACHE_NTTL parameter in the lsf.conf file is increased from 20 seconds to 60 seconds, the maximum valid value. Because LSF now caches a failed DNS lookup for longer before trying it again, it repeats failed DNS lookup attempts less often and spends less time on them.
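The effect of a longer negative cache time can be sketched in Python. The `NegativeDnsCache` class below is hypothetical and models only the idea behind LSF_HOST_CACHE_NTTL, not LSF's own host cache: a failed lookup is remembered for `ttl_seconds`, and repeated requests inside that window return the cached failure without querying DNS. If a broken host name is requested continuously for 10 minutes and each failure returns immediately, a 60-second TTL leads to at most 600 / 60 = 10 real lookups, compared with 600 / 20 = 30 with the old 20-second default.

```python
import socket
import time
from typing import Callable, Dict, Optional


class NegativeDnsCache:
    """Remember failed lookups for ttl_seconds so they are not retried immediately."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        resolver: Callable[[str], str] = socket.gethostbyname,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl_seconds
        self.resolver = resolver
        self.clock = clock
        self._failed_at: Dict[str, float] = {}
        self.lookups = 0  # number of real resolver calls, for illustration

    def resolve(self, host: str) -> Optional[str]:
        failed_at = self._failed_at.get(host)
        if failed_at is not None and self.clock() - failed_at < self.ttl:
            return None  # cached failure: skip the DNS query entirely
        self.lookups += 1
        try:
            address = self.resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            self._failed_at[host] = self.clock()
            return None
        self._failed_at.pop(host, None)  # success clears any cached failure
        return address
```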
Multithread mbatchd job query daemon
The following lsf.conf parameters are now set by default:
- The LSB_QUERY_PORT parameter in the lsf.conf file is set to 6891, which enables the multithread mbatchd job query daemon and specifies the port number that the mbatchd daemon uses for LSF query requests.
- The LSB_QUERY_ENH parameter in the lsf.conf file is set to Y, which extends multithreaded query support to batch query requests (in addition to bjobs query requests).
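To confirm these settings on a given cluster, a small Python sketch can read the lsf.conf text and report the two values. The function names are hypothetical, and the parser is a simplification rather than LSF's own: it assumes plain `NAME=value` lines with `#` comments, keeps the last definition of a name, and strips surrounding double quotes.

```python
from typing import Dict, Optional


def parse_lsf_conf(conf_text: str) -> Dict[str, str]:
    """Parse simple NAME=value lines, skipping blank lines and # comments."""
    params: Dict[str, str] = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Simplification: the last definition wins and surrounding quotes are dropped.
        params[name.strip()] = value.strip().strip('"')
    return params


def query_daemon_settings(conf_text: str) -> Dict[str, object]:
    """Report the LSB_QUERY_PORT and LSB_QUERY_ENH settings found in conf_text."""
    params = parse_lsf_conf(conf_text)
    port_text: Optional[str] = params.get("LSB_QUERY_PORT")
    port = int(port_text) if port_text is not None and port_text.isdigit() else None
    return {
        # A numeric LSB_QUERY_PORT enables the multithread job query daemon.
        "query_port": port,
        "query_daemon_enabled": port is not None,
        # LSB_QUERY_ENH=Y extends multithreaded queries to batch query requests.
        "batch_queries_multithreaded": params.get("LSB_QUERY_ENH", "N").upper() == "Y",
    }
```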