Summarizing key functional and performance updates in the latest releases of the IBM Spectrum LSF offering on IBM Cloud.
The IBM Spectrum LSF on IBM Cloud offering allows customers to easily deploy a cluster of compute nodes on IBM Cloud and run their High-Performance Computing (HPC) workloads there by using the IBM Spectrum LSF scheduling software. The offering was initially released in 2021. In this blog post, we summarize the features that we have added to the offering in recent releases.
Intel oneAPI HPC Toolkit
The Message Passing Interface (MPI) is a standard, implemented as a software library, for communication between processes that run either on the same virtual machine or on different virtual machines. MPI has been widely adopted by the HPC community, and it plays a key role in achieving scalable performance across many nodes or VMs in a cluster. HPC users can choose from several MPI implementations, including Intel MPI and open-source versions, such as OpenMPI and MPICH.
In a recent release of the Spectrum LSF offering, we added support for the Intel MPI library for Intel oneAPI as an alternative to OpenMPI. To evaluate the performance of the library, we chose SNAP, a communication-intensive HPC application. SNAP is commonly used by the U.S. Department of Energy labs (Livermore, Los Alamos, and Sandia) to evaluate commodity HPC clusters. We used the version of SNAP that can be found here.
A white paper, available here, describes the LSF cluster environment on IBM Cloud, how the SNAP benchmark was executed, and our observations and analysis of the results. Here, we share a summary of the results, which demonstrate good scalability of the benchmark in an LSF cluster of up to 63 compute nodes.
SNAP output includes multiple metrics, but the primary metric is the Figure of Merit (FOM), which is based on the solve time for a particular problem, the number of iterations, and the total number of unknowns (parts of the problem to solve). The FOM is a direct indicator of the performance of the system. If you solve the same problem in half the time, the FOM doubles, and if you solve a problem with twice as many unknowns in the same time per iteration, the FOM also doubles. We studied weak scaling, where the size of the global domain increases linearly with the number of MPI ranks. For ideal weak scaling, the solve time per iteration would remain constant and the FOM would increase linearly with the number of MPI ranks.
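The two doubling properties described above can be checked with a small sketch. The formula used here (unknowns times iterations, divided by solve time) is an assumption chosen only because it is consistent with those properties; it is not taken from the SNAP source, and the function name and input values are hypothetical.

```python
def figure_of_merit(unknowns: float, iterations: int, solve_time_s: float) -> float:
    """Assumed FOM: unknowns processed per second of solve time.

    This form is an illustrative assumption, not SNAP's documented formula.
    """
    if solve_time_s <= 0:
        raise ValueError("solve_time_s must be positive")
    return unknowns * iterations / solve_time_s


# Hypothetical baseline problem (values are placeholders, not measurements).
base = figure_of_merit(unknowns=1_000_000, iterations=10, solve_time_s=20.0)

# Same problem solved in half the time.
half_time = figure_of_merit(unknowns=1_000_000, iterations=10, solve_time_s=10.0)

# Twice as many unknowns, same time per iteration (2.0 s per iteration).
double_unknowns = figure_of_merit(unknowns=2_000_000, iterations=10, solve_time_s=20.0)

print(half_time / base)        # 2.0
print(double_unknowns / base)  # 2.0
```

Under this assumed form, the FOM reduces to unknowns divided by time per iteration, which is why holding the time per iteration constant while the domain grows with the rank count gives linear FOM growth.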
A summary of the performance measurements is provided in Table 1, which shows the solve time and the Figure of Merit for SNAP, scaling from 8 cores (1 compute node) to 504 cores (63 compute nodes) on IBM Cloud. When using a single compute node, all communication is through shared memory, which is very efficient. As one scales out to an increasing number of compute nodes, more of the communication is over the Ethernet interface using TCP, and a larger fraction of the solve time is spent on MPI communication, resulting in somewhat lower performance per node. However, the aggregate performance, as indicated by the Figure of Merit, continues to show significant improvement over the full range of compute nodes available in our LSF cluster and close to linear scaling, as can be seen in Figure 1.
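One way to quantify "close to linear scaling" is a weak-scaling efficiency: the measured FOM divided by the FOM that ideal linear scaling from the single-node run would give. The sketch below assumes 8 MPI ranks per node (matching the 8-core and 504-core endpoints mentioned above). The FOM values are hypothetical placeholders, not the measurements from Table 1, and the function names are our own.

```python
CORES_PER_NODE = 8  # from the text: 8 cores on 1 node, 504 cores on 63 nodes


def ideal_fom(base_fom: float, base_ranks: int, ranks: int) -> float:
    """FOM expected under ideal weak scaling (linear in the number of ranks)."""
    return base_fom * ranks / base_ranks


def weak_scaling_efficiency(base_fom: float, base_ranks: int,
                            fom: float, ranks: int) -> float:
    """Measured FOM as a fraction of the ideal linearly scaled FOM."""
    return fom / ideal_fom(base_fom, base_ranks, ranks)


# (nodes, FOM) pairs: HYPOTHETICAL placeholder values, not data from Table 1.
runs = [(1, 100.0), (8, 720.0), (63, 5040.0)]

base_nodes, base_fom = runs[0]
base_ranks = base_nodes * CORES_PER_NODE

for nodes, fom in runs:
    ranks = nodes * CORES_PER_NODE
    eff = weak_scaling_efficiency(base_fom, base_ranks, fom, ranks)
    print(f"{nodes:3d} nodes, {ranks:3d} ranks: efficiency {eff:.2f}")
```

An efficiency of 1.00 corresponds to ideal scaling; values below 1.00 reflect the growing share of solve time spent on MPI communication over the network.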
LSF Application Center
IBM Spectrum LSF Application Center provides a flexible and easy-to-use interface for cluster users and administrators. It lets users work through intuitive, self-documenting interfaces, and it is now available as an option in our LSF offering.
The LSF Application Center web-based UI provides the ability to easily do the following:
- Create and manage cluster users and access permissions.
- Select the types of notifications and alerts to receive about jobs.
- Submit, monitor and control jobs.
- Monitor usage of compute nodes in the cluster.
Screenshots of the LSF Application Center login screen as well as views of some of the capabilities mentioned above are included here:
For more information about LSF Application Center and its usage, see the official documentation here.
Custom image creation
The IBM Spectrum LSF offering includes two default custom images that are used to provision the VSIs for the clusters:
- LSF worker and management nodes
- Storage nodes (in the case that Spectrum Scale storage is selected for use)
However, users can supply their own custom images that may include, for instance, additional software required by their HPC applications.
In a recent release, we added scripts and documentation that make it simple for users to create their own custom images. The scripts use Packer, a popular tool for automating the creation of virtual machine images.
Summary of new Spectrum LSF on IBM Cloud features
Since the initial release of Spectrum LSF on IBM Cloud, we have continued to improve its usability by introducing new functional and performance-related features. In this blog post, we described a few of the features that have been added in recent releases:
- Inclusion of the Intel oneAPI HPC Toolkit for use by applications running on the cluster worker nodes
- An option to deploy LSF Application Center within the cluster and provide an easy-to-use interface for cluster user administration and job submission and monitoring
- Scripts and documentation that simplify the process for custom image creation
To evaluate whether your HPC applications might benefit from the offering, see the IBM Spectrum LSF on IBM Cloud documentation.