In this post, we demonstrate how to run an HPC workload in an IBM Spectrum LSF cluster on IBM Cloud. 

We assume that you have an IBM Cloud account and that you have already created an IBM Spectrum LSF cluster through the IBM Cloud Catalog. If you haven’t, please follow the instructions in this previous blog post first to create a cluster.

Step 1: Log in to the master node of your cluster

Refer to the tutorial if you don’t know how to get the IP addresses needed for the ssh command:

ssh -J root@ip-jumphost lsfadmin@ip-masternode
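
If you log in frequently, you can optionally put the jump-host hop into your local ~/.ssh/config so that a plain ssh lsf-master does the same thing. The host aliases below are made up for this example, and ip-jumphost/ip-masternode stand for your own addresses from the tutorial:

Host lsf-jump
    HostName ip-jumphost
    User root
Host lsf-master
    HostName ip-masternode
    User lsfadmin
    ProxyJump lsf-jump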


Step 2: Navigate to the shared folder

After logging in to the master node, you will see a folder named shared, which provides access to the cluster's shared NFS storage. All files placed under this folder are accessible to all the worker nodes, making it an ideal location for application binaries and software libraries so that the jobs we submit run in the same environment on every node.

Let’s create a subfolder demo in the shared folder for everything that we need in this demo:

mkdir -p shared/demo
cd shared/demo
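
If you want to confirm that this folder really sits on the shared NFS mount (the exact mount point can differ between deployments), a quick df check will show the filesystem type:

df -hT ~/shared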


Step 3: Build and install LAMMPS

We choose LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) as the example workload. A classical molecular dynamics (MD) code with a focus on materials modeling, LAMMPS is widely used in many research fields, including materials science, biomedical engineering and chemical engineering.

First, we set the environment variables to include the MPI executables and library shared objects:

export PATH=$PATH:/usr/local/openmpi-4.1.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi-4.1.0/lib
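
It is worth verifying that the Open MPI tools are now picked up from the expected location (the 4.1.0 path above is the one shipped with this cluster image):

which mpicc mpirun
mpirun --version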

Then, we check out a stable LAMMPS release (the stable_29Oct2020 tag):

cd ~/shared/demo
git clone -b stable_29Oct2020 https://github.com/lammps/lammps.git 

Install the needed packages:

cd lammps/src
make yes-molecule
make yes-rigid
make yes-manybody
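
If you want to double-check which packages are now enabled before compiling, the legacy LAMMPS make system provides a package-status target (optional):

make ps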

Build LAMMPS:

sed '/CCFLAGS/s/$/ -std=c++11/' MAKE/Makefile.mpi > MAKE/Makefile.lsf
make -j 16 lsf

Run an example to make sure everything works:

cd ~/shared/demo/lammps
mpirun -np 2 src/lmp_lsf -in examples/flow/in.flow.pois

You should see no errors in the output.
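
As an additional sanity check, you can look at the end of the log file that LAMMPS writes in the working directory; a successful run finishes with timing statistics, including a Total wall time line:

tail -n 5 log.lammps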

Step 4: Build and install mpitrace

mpitrace is a library of tools for analyzing distributed-memory parallel applications written with MPI. It collects detailed information about message passing and provides a convenient place to enable other performance tools. Here, we build and install it in the shared folder, too.

Make sure the environment variables are set for MPI:

export PATH=$PATH:/usr/local/openmpi-4.1.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi-4.1.0/lib
cd ~/shared/demo
git clone https://github.com/IBM/mpitrace.git
cd mpitrace/src
./configure
make
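
Once the build finishes, you can confirm that the shared object was produced:

ls -l $HOME/shared/demo/mpitrace/src/libmpitrace.so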

Now the dynamic library $HOME/shared/demo/mpitrace/src/libmpitrace.so is accessible from every worker node. When submitting jobs on the HPC cluster, we just need to export it via the LD_PRELOAD variable before running our MPI applications to enable mpitrace profiling, for example:

export LD_PRELOAD=$HOME/shared/demo/mpitrace/src/libmpitrace.so
<MPI applications>
unset LD_PRELOAD
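
As a concrete (optional) check, you could profile the small LAMMPS test from Step 3 interactively and verify that mpitrace writes its profile files; the file names follow the mpi_profile naming described later in Step 5:

cd ~/shared/demo/lammps
export LD_PRELOAD=$HOME/shared/demo/mpitrace/src/libmpitrace.so
mpirun -np 2 src/lmp_lsf -in examples/flow/in.flow.pois
unset LD_PRELOAD
ls mpi_profile.*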


Step 5: Submit an LSF job to run LAMMPS simulation with mpitrace profiling

Now that we have set up the execution environment, it's time to run a multi-node MPI job. We will use one of the standard LAMMPS benchmarks, which simulates a Cu metallic solid with an embedded atom method (EAM) potential: 32,000 atoms in a 20 x 20 x 20 unit-cell box, integrated in the NVE ensemble for 100 steps with a time step of 5.0 fs.

Before we prepare the benchmark and the job script, let's examine the LSF cluster status by running bhosts. If you used the default settings with a minimum worker node count of 0, you should see something like the following. It indicates that there are two master nodes in the LSF cluster and no slots available for LSF jobs. When you submit a job that requires slots, LSF auto-scaling will therefore provision worker nodes on the fly, so there will be a small delay before your job executes:

[lsfadmin@icgen2host-10-241-128-37 eam]$ bhosts
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV 
icgen2host-10-241- closed          -      0      0      0      0      0      0
icgen2host-10-241- closed          -      0      0      0      0      0      0

First, download the benchmark and extract it:

cd ~/shared/demo
wget https://www.lammps.org/bench/bench_eam.tar.gz
tar zxvf bench_eam.tar.gz

Then, we create an LSF script for the job that will run this benchmark:

cd eam
cat << 'EOF' > job.lsf
#!/bin/bash 
#BSUB -n 8
#BSUB -R span[ptile=2]
#BSUB -o out.%J
#BSUB -e err.%J
#BSUB -J cu-20

export PATH=$PATH:/usr/local/openmpi-4.1.0/bin:/home/lsfadmin/shared/demo/lammps/src
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi-4.1.0/lib
export LD_PRELOAD=/home/lsfadmin/shared/demo/mpitrace/src/libmpitrace.so

cd ~/shared/demo/eam/
mpirun --report-bindings --bind-to core lmp_lsf -in in.eam
unset LD_PRELOAD
EOF

We specify -n 8, meaning that there will be eight MPI ranks, and ptile=2 means that we want two ranks per worker node. Since the default bx2-4x16 profile provides two cores per worker node, this job requires four worker nodes and fully subscribes them. In the script, we also include export commands to set up the environment variables; the LD_PRELOAD setting enables mpitrace profiling.

Now that everything is there, we submit the job by running the following:

bsub < job.lsf

You can then run bjobs to check the status of the submitted job. As previously mentioned, there can be a delay before enough worker nodes are provisioned. When the workers are ready, bjobs should output something similar to the following:

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
4       lsfadmi RUN   normal     icgen2host- icgen2host- cu-20      Aug 23 14:30
                                             icgen2host-10-241-128-44
                                             icgen2host-10-241-128-45
                                             icgen2host-10-241-128-45
                                             icgen2host-10-241-128-46
                                             icgen2host-10-241-128-46
                                             icgen2host-10-241-128-43
                                             icgen2host-10-241-128-43
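
If the job is still shown in PEND state instead, the worker nodes are most likely still being provisioned; you can inspect the pending reason with the following commands (job ID 4 is just the one from this example):

bjobs -p
bjobs -l 4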

You can also see additional hosts in the bhosts output; four extra worker nodes were automatically provisioned by LSF:

HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV 
icgen2host-10-241- closed          -      0      0      0      0      0      0
icgen2host-10-241- closed          -      0      0      0      0      0      0
icgen2host-10-241- ok              -      2      0      0      0      0      0
icgen2host-10-241- ok              -      2      0      0      0      0      0
icgen2host-10-241- ok              -      2      0      0      0      0      0
icgen2host-10-241- ok              -      2      0      0      0      0      0

When the job completes, the stdout output will be in out.<jobid> and errors in err.<jobid>. Since we also enabled mpitrace, we will see files with names like mpi_profile.*. By default, four files are stored: one from rank 0 and one each from the ranks with the minimum, maximum and median communication time. You can use sftp to connect to the master node and download these files to your local machine.
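
For example, from your local machine you can copy the profiles through the same jump host used in Step 1 (substitute your own IP addresses; the ProxyJump option requires a reasonably recent OpenSSH client):

scp -o ProxyJump=root@ip-jumphost "lsfadmin@ip-masternode:~/shared/demo/eam/mpi_profile.*" .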

Conclusions

That's it! You've completed the basics of running an HPC workload on an IBM Cloud HPC cluster. We went through the steps of installing HPC software, submitting a job to run a LAMMPS simulation and collecting the results from the job in an HPC cluster on IBM Cloud. Please give it a try and refer to the IBM Cloud Docs for more detailed instructions. We also encourage you to bring your own HPC workloads to IBM Cloud through the convenient HPC service that IBM provides. Please share your valuable feedback with us.
