This week we celebrate the release of our second agile update to IBM Spectrum LSF 10. And it’s our silver anniversary… 25 years of IBM Spectrum LSF!
As we previously discussed in this blog post, IBM Spectrum LSF 10 focused primarily on architectural changes to support future growth in performance and scalability. We’ve seen steady adoption of LSF 10, and client feedback has been very positive, with peak submission rates trebling and peak dispatch more than doubling with LSF 10 when compared to LSF 9.1.3.
Advancing IBM Spectrum LSF 10
In this update, we’re primarily focused on usability.
When we added the “-o” option in LSF 9 to customize the output of bjobs, clients loved it, and it generated a whole slew of requests for additional fields and for the same support in other commands. So in this update we’ve added numerous fields and added “-o” to most of the query commands. We didn’t quite get to all of them this time, so expect some more in the next update! We also added a -json option for those who like their output in JSON format, making it easy for other tools to consume.
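To give a feel for what consuming that output might look like, here is a minimal Python sketch that parses a JSON document of the general shape you would get from something like `bjobs -o "jobid stat queue" -json`. The sample payload and the key names (`RECORDS`, `JOBID`, `STAT`, `QUEUE`) are assumptions made for illustration, and `jobs_in_state` is a hypothetical helper, so check the output of your own LSF installation before relying on any of it.

```python
import json
from typing import List

# Hypothetical sample of JSON-formatted bjobs output.
# The exact keys are an assumption; verify them against your LSF version.
SAMPLE = """
{
  "COMMAND": "bjobs",
  "JOBS": 2,
  "RECORDS": [
    {"JOBID": "101", "STAT": "RUN", "QUEUE": "normal"},
    {"JOBID": "102", "STAT": "PEND", "QUEUE": "normal"}
  ]
}
"""


def jobs_in_state(payload: str, state: str) -> List[str]:
    """Return the IDs of all jobs whose STAT field equals `state`."""
    data = json.loads(payload)
    return [
        record["JOBID"]
        for record in data.get("RECORDS", [])
        if record.get("STAT") == state
    ]


if __name__ == "__main__":
    print(jobs_in_state(SAMPLE, "RUN"))
```

Because the fields arrive as named keys rather than whitespace-separated columns, a script like this does not need to change when a new column is added to the “-o” format string.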
“Is my job running? Is my job running? Is my job running?” Folks often have scripts that loop constantly over the bjobs output, waiting for specific jobs to reach specific states before doing something else. Apart from putting unnecessary load on the network, this just burns CPU cycles on the client side. What’s more, these looping scripts are easily broken by unexpected messages back from the scheduler, so they are neither environmentally friendly nor robust. To make things simpler, we’ve introduced a new “bwait” command that effectively makes the “Is my job running?” loop redundant. The user issues a bwait with a list of jobs and dependencies, and the script blocks until it receives a callback from the scheduler telling it that the condition is true.
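As a rough illustration, the Python sketch below replaces a polling loop with a single blocking call to bwait. It assumes that bwait takes its wait condition through `-w` using dependency expressions such as `done(101)`, and an optional timeout in minutes through `-t`; confirm both against the bwait reference for your release. The helper names `build_bwait_command` and `wait_for_jobs` are hypothetical, and treating exit code 0 as "condition satisfied" is also an assumption.

```python
import subprocess
from typing import List, Optional


def build_bwait_command(job_ids: List[int],
                        timeout_minutes: Optional[int] = None) -> List[str]:
    """Build a bwait argument list that waits for every listed job to finish.

    Assumes `-w` takes the wait condition and `-t` a timeout in minutes.
    """
    if not job_ids:
        raise ValueError("at least one job ID is required")
    # Combine one done(<id>) term per job into a single wait condition.
    condition = " && ".join(f"done({job_id})" for job_id in job_ids)
    command = ["bwait", "-w", condition]
    if timeout_minutes is not None:
        command += ["-t", str(timeout_minutes)]
    return command


def wait_for_jobs(job_ids: List[int],
                  timeout_minutes: Optional[int] = None) -> bool:
    """Block until bwait returns; True if it exited with status 0.

    Requires the LSF client commands to be on PATH.
    """
    result = subprocess.run(build_bwait_command(job_ids, timeout_minutes))
    return result.returncode == 0


if __name__ == "__main__":
    # Show the command that would be run, without needing a live cluster.
    print(build_bwait_command([101, 102], timeout_minutes=30))
```

The command is built separately from the call that runs it, so the wait condition can be inspected or logged before anything is sent to the scheduler.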
To make life easier for the administrator, we’ve also added some cognitive capabilities to autotune the scheduler. We’ve added a couple of example scripts for how to run TensorFlow and Caffe as Spectrum LSF jobs – though you would think these cognitive frameworks can figure that out for themselves!
Finally, we’ve added formal support for NVIDIA DCGM on both IBM POWER and x86. This provides a better experience with GPU workloads with GPU health monitoring, integrated GPU accounting and intelligent topology-aware job placement of applications onto healthy GPUs.
So what’s next? The third update is due in July and will contain, amongst other things, additional GPGPU support and further enhancements to support containerized workloads.
We’re only 25 years young and just getting started. Are you looking to learn more about the complete HPC workload management capabilities of IBM Spectrum LSF? Give it a try! Learn more about IBM Spectrum LSF evaluations here.