We are excited to announce the release of Slurm on IBM Cloud.
This solution helps you set up an end-to-end high performance computing (HPC) system by using automated scripts from the public Git repository.
The Slurm Workload Manager software delivers powerful enterprise-class management for running compute-intensive and data-intensive distributed applications. It is an open-source, fault-tolerant and highly scalable cluster management and job scheduling offering. It accelerates dozens of parallel applications for faster results and better utilization of all available resources. With Slurm Workload Manager, you can improve IT performance, reduce infrastructure costs and quickly meet business demands.
Some of the key capabilities that Slurm Workload Manager offers include the following:
- Allocate access to compute node resources for users to perform work.
- Provide the framework for starting, executing and monitoring work on a set of allocated nodes.
- Arbitrate contention for resources by managing a queue of pending work.
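To make the three capabilities above concrete, here is a minimal Python sketch of a toy scheduler that allocates nodes, tracks running work and queues whatever does not fit yet. It is an illustration only: the names `Job` and `ToyScheduler` are hypothetical, and the strict first-in, first-out policy is a simplifying assumption rather than a description of how Slurm itself schedules jobs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int  # number of compute nodes the job requests


class ToyScheduler:
    """Toy FIFO model of allocate / run / queue (hypothetical, not Slurm's algorithm)."""

    def __init__(self, total_nodes: int) -> None:
        self.total_nodes = total_nodes
        self.free_nodes = total_nodes
        self.pending = deque()  # jobs waiting for resources, oldest first
        self.running = {}       # job name -> number of nodes allocated to it

    def submit(self, job: Job) -> None:
        # Reject jobs that could never run, so they cannot block the queue forever.
        if job.nodes > self.total_nodes:
            raise ValueError(f"{job.name} requests more nodes than the cluster has")
        self.pending.append(job)
        self._dispatch()

    def finish(self, name: str) -> None:
        # Return the job's nodes to the free pool, then try to start queued work.
        self.free_nodes += self.running.pop(name)
        self._dispatch()

    def _dispatch(self) -> None:
        # Strict FIFO: start jobs from the head of the queue while they fit.
        while self.pending and self.pending[0].nodes <= self.free_nodes:
            job = self.pending.popleft()
            self.free_nodes -= job.nodes
            self.running[job.name] = job.nodes


if __name__ == "__main__":
    sched = ToyScheduler(total_nodes=4)
    sched.submit(Job("sim-a", 3))   # starts immediately
    sched.submit(Job("sim-b", 2))   # waits: only one node is free
    print(sched.running, [j.name for j in sched.pending])
    sched.finish("sim-a")           # frees three nodes, so sim-b starts
    print(sched.running, [j.name for j in sched.pending])
```

In this sketch the second job stays in the pending queue until the first one finishes, which is the contention-arbitration behavior the last bullet describes.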
IBM delivers HPC value and experience
Fifty-five percent of the United States GDP of around $10 trillion is touched by high performance computing (HPC), including industrial design, weather prediction, genomic research, vehicle crash simulation and drug discovery. Every industry (automotive, aerospace, electronics, financial sector, oil and gas, energy and utilities, life sciences and more) is running these compute-intensive workloads to optimize designs or predict business outcomes.
Other patterns that lend themselves well to HPC are serverless computing, analytics, big data, Hadoop and machine learning. At IBM, we have been using HPC on Cloud for semiconductor design and have scaled to 29,000 vCPUs with a 5X linear improvement. Understanding the nature of the workload (be it high throughput or parallel) is key, and IBM has been working with clients on HPC algorithm development and architecture design for the past 25 years to improve infrastructure utilization.
Considerations for HPC on the Cloud
A cloud vendor that provides an integrated solution out of the box, with compute instances, workload schedulers, storage management and high-speed data transfer, will be able to help solve your HPC problems. Buying these products à la carte from different vendors considerably increases deployment and support risk. Customers are looking for one-stop shopping, consolidated billing and a single point of support. The process should be fully automated: you supply the appropriate configuration parameters, and the clusters are provisioned and all required software is installed automatically. This is a huge differentiator over the current way of doing things, and the setup can be completed in hours rather than days, dramatically improving time to market.
You may also want to operate in hybrid mode, which means running static or steady-state jobs on-premises and dynamic or burst jobs on the cloud. Any offering must support this with full automation. The cloud offering should charge you only for the capacity you use so that it is a true utility-based model. It should also support worldwide multi-zone regions, the highest level of encryption and security, disaster recovery and high availability capabilities.
The cloud provides instantaneous capacity to satisfy HPC peak loads and eliminates lengthy wait times, so you can perform multiple iterations of your simulations to achieve the best possible results.
We encourage you to bring your applications to us, and we will guide you on the best approach for success.