The IBM Spectrum LSF on IBM Cloud offering lets customers deploy a cluster of compute nodes on IBM Cloud and run their High-Performance Computing (HPC) workloads there by using the IBM Spectrum LSF scheduling software. The offering was initially released in 2021. In this blog post, we summarize the features that we have added to the offering in recent releases.
The Message Passing Interface (MPI) is provided as a software library that is used for communication between processes that are located either on the same virtual machine or on different virtual machines. MPI has been widely adopted by the HPC community, and it plays a key role in achieving scalable performance across many nodes or VMs in a cluster. HPC users can choose from several MPI implementations, including Intel MPI and open-source versions, such as OpenMPI and MPICH.
In a recent release of the Spectrum LSF offering, we added support for the Intel MPI library for Intel oneAPI as an alternative to OpenMPI. To evaluate the performance of the library, we chose SNAP, a communication-intensive HPC application. SNAP is commonly used by the U.S. Department of Energy Labs (Livermore, Los Alamos, and Sandia) to evaluate commodity HPC clusters. We used the version of SNAP that can be found here.
A technical report with details of the LSF cluster environment on IBM Cloud, how the SNAP benchmark was executed, and observations and analysis of the results is available in this white paper here. Here, we share a summary of the results, which demonstrate good scalability of the benchmark in an LSF cluster consisting of up to 63 compute nodes.
SNAP output includes multiple metrics, but the primary metric is the Figure of Merit (FOM), which is based on the solve time for a particular problem, the number of iterations, and the total number of unknowns (parts of the problem to solve). The FOM is a direct indicator of the performance of the system. If you solve the same problem in half the time, the FOM doubles, and if you solve a problem with twice as many unknowns in the same time per iteration, the FOM also doubles. We studied weak scaling, where the size of the global domain increases linearly with the number of MPI ranks. For ideal scaling, the solve time per iteration would remain constant and the FOM would increase linearly with the number of MPI ranks.
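To make the two doubling properties concrete, here is a minimal Python sketch. It assumes the FOM is proportional to the number of unknowns times the number of iterations divided by the solve time; that form is inferred from the description above and is not taken from the SNAP source, so the function name `figure_of_merit` and the input values are purely illustrative.

```python
def figure_of_merit(unknowns: int, iterations: int, solve_time_s: float) -> float:
    """Assumed FOM form: unknowns handled per second of solve time.

    This is an illustrative proportionality, not SNAP's exact definition.
    """
    if solve_time_s <= 0:
        raise ValueError("solve_time_s must be positive")
    return unknowns * iterations / solve_time_s


# Hypothetical inputs chosen only to show the ratios, not measured values.
base = figure_of_merit(unknowns=1_000_000, iterations=10, solve_time_s=20.0)

# Same problem solved in half the time.
half_time = figure_of_merit(unknowns=1_000_000, iterations=10, solve_time_s=10.0)

# Twice the unknowns at the same time per iteration (same total solve time).
double_unknowns = figure_of_merit(unknowns=2_000_000, iterations=10, solve_time_s=20.0)

print(half_time / base, double_unknowns / base)
```

Under this assumed form, both ratios come out as a factor of two, matching the behavior described above.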
A summary of the performance measurements is provided in Table 1, which shows the solve time and the Figure of Merit for SNAP, scaling from 8 cores (1 compute node) to 504 cores (63 compute nodes) on IBM Cloud. When using a single compute node, all communication is through shared memory, which is very efficient. As one scales out to an increasing number of compute nodes, more of the communication is over the Ethernet interface using TCP, and a larger fraction of the solve time is spent on MPI communication, resulting in somewhat lower performance per node. However, the aggregate performance, as indicated by the Figure of Merit, continues to show significant improvement over the full range of compute nodes available in our LSF cluster and close to linear scaling, as can be seen in Figure 1.
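One way to quantify "close to linear scaling" is a weak-scaling efficiency: the measured FOM at a given rank count divided by the FOM that ideal linear scaling from the single-node run would predict. The sketch below is an assumption about how such a ratio could be computed; the function name and the FOM values are hypothetical placeholders, not the measurements from our runs. Only the rank counts (8 and 504) come from the text above.

```python
def weak_scaling_efficiency(fom_base: float, ranks_base: int,
                            fom_n: float, ranks_n: int) -> float:
    """Ratio of measured FOM to the FOM expected under ideal linear scaling."""
    if fom_base <= 0 or ranks_base <= 0 or ranks_n <= 0:
        raise ValueError("baseline FOM and rank counts must be positive")
    ideal_fom = fom_base * (ranks_n / ranks_base)
    return fom_n / ideal_fom


# Rank counts from the text: 8 cores on 1 node, 504 cores on 63 nodes.
# The FOM values are placeholders in arbitrary units, NOT measured results.
efficiency = weak_scaling_efficiency(fom_base=1.0, ranks_base=8,
                                     fom_n=50.4, ranks_n=504)
print(f"weak-scaling efficiency: {efficiency:.2f}")
```

An efficiency of 1.0 would correspond to ideal scaling; values below 1.0 reflect the growing share of solve time spent on MPI communication between nodes.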
IBM Spectrum LSF Application Center provides a flexible and easy-to-use interface for cluster users and administrators. It lets users interact with intuitive, self-documenting interfaces, and it is now available as an option in our LSF offering.
The LSF Application Center web-based UI provides the ability to easily do the following:
Screenshots of the LSF Application Center login screen as well as views of some of the capabilities mentioned above are included here:
For more detailed information on LSF Application Center and its usage, see the official documentation here.
The IBM Spectrum LSF offering includes two default custom images that are used to provision the VSIs for the clusters:
However, users can supply their own custom images that may include, for instance, additional software required by their HPC applications.
In a recent release, we added scripts and documentation that make it simple for users to create their own custom images. The scripts use Packer, a popular tool for automating the creation of virtual machine images.
Since the initial release of Spectrum LSF on IBM Cloud, we have continued to improve its usability by introducing new functional and performance-related features. In this blog post, we described a few of the features that were added in recent releases: support for the Intel MPI library, the option to use LSF Application Center, and scripts and documentation for creating custom images with Packer.
To evaluate whether your HPC applications could benefit from the offering, see the IBM Spectrum LSF on IBM Cloud documentation.