The Message Passing Interface (MPI) is provided as a software library for communication between processes that run either on the same virtual machine or on different virtual machines. MPI has been widely adopted by the HPC community, and it plays a key role in achieving scalable performance across many nodes or VMs in a cluster. HPC users can choose from several MPI implementations, including Intel MPI and open-source versions such as OpenMPI and MPICH.

In a recent release of the Spectrum LSF offering, we added support for the Intel MPI library for Intel oneAPI as an alternative to OpenMPI. To evaluate the performance of the library, we chose SNAP, a communication-intensive HPC application. SNAP is commonly used by the U.S. Department of Energy labs (Livermore, Los Alamos, and Sandia) to evaluate commodity HPC clusters. We used the version of SNAP that can be found here.

Details of the LSF cluster environment on IBM Cloud, how the SNAP benchmark was executed, and our observations and analysis of the results are in this white paper here. The following summary of the results demonstrates good scalability of the benchmark in an LSF cluster consisting of up to 63 compute nodes.

SNAP output includes multiple metrics, but the primary metric is the Figure of Merit (FOM), which is based on the solve time for a particular problem, the number of iterations, and the total number of unknowns (the parts of the problem to solve). The FOM is a direct indicator of the performance of the system. If you solve the same problem in half the time, the FOM increases by 2x, and if you solve a problem with 2x more unknowns in the same time per iteration, the FOM also increases by 2x. We studied weak scaling, where the size of the global domain increases linearly with the number of MPI ranks. For ideal scaling, the solve time per iteration would remain constant and the FOM would increase linearly with the number of MPI ranks.
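The two doubling properties described above are consistent with a metric of the form (unknowns × iterations) / solve time. The following minimal Python sketch assumes that form purely for illustration; the function name, parameter names, and input values are hypothetical, and the exact expression that SNAP uses to report its FOM may differ.

```python
def figure_of_merit(unknowns, iterations, solve_time_s):
    """Work completed per second, ASSUMING FOM = unknowns * iterations / solve time.

    This form is inferred from the scaling behavior described in the text;
    it is not taken from the SNAP source code.
    """
    if solve_time_s <= 0:
        raise ValueError("solve_time_s must be positive")
    return unknowns * iterations / solve_time_s


# Hypothetical inputs chosen only to illustrate the scaling behavior.
base = figure_of_merit(unknowns=1_000_000, iterations=100, solve_time_s=50.0)

# Same problem solved in half the time: the FOM doubles.
half_time = figure_of_merit(unknowns=1_000_000, iterations=100, solve_time_s=25.0)

# Twice the unknowns, same iterations and solve time: the FOM doubles.
double_unknowns = figure_of_merit(unknowns=2_000_000, iterations=100, solve_time_s=50.0)

print(half_time / base, double_unknowns / base)
```

Under this assumed form, weak scaling with a constant solve time per iteration makes the FOM grow in proportion to the number of unknowns, and therefore in proportion to the number of MPI ranks.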

A summary of the performance measurements is provided in Table 1, which shows the solve time and the Figure of Merit for SNAP, scaling from 8 cores (1 compute node) to 504 cores (63 compute nodes) on IBM Cloud. When using a single compute node, all communication is through shared memory, which is very efficient. As one scales out to more compute nodes, more of the communication goes over the Ethernet interface using TCP, and a larger fraction of the solve time is spent on MPI communication, resulting in somewhat lower performance per node. However, the aggregate performance, as indicated by the Figure of Merit, continues to improve significantly over the full range of compute nodes available in our LSF cluster and scales close to linearly, as can be seen in Figure 1.
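One way to quantify "close to linear scaling" is to compare a measured FOM against the ideal FOM obtained by scaling the single-node value by the ratio of MPI ranks. The sketch below is an illustration under that assumption; the function name is hypothetical, and the FOM values are placeholders, not measurements from Table 1.

```python
def weak_scaling_efficiency(fom_base, fom_scaled, ranks_base, ranks_scaled):
    """Ratio of the measured FOM to the ideal, linearly scaled FOM."""
    if fom_base <= 0 or ranks_base <= 0:
        raise ValueError("fom_base and ranks_base must be positive")
    ideal_fom = fom_base * (ranks_scaled / ranks_base)
    return fom_scaled / ideal_fom


# Placeholder FOM values (NOT measured results): 8 ranks on 1 node
# versus 504 ranks on 63 nodes, so the ideal FOM is 63x the base FOM.
efficiency = weak_scaling_efficiency(
    fom_base=1.0, fom_scaled=47.25, ranks_base=8, ranks_scaled=504
)
print(f"weak-scaling efficiency: {efficiency:.2f}")
```

An efficiency of 1.0 corresponds to ideal scaling; values below 1.0 reflect the growing share of solve time spent on MPI communication between nodes.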