This post provides an example of how the IBM Spectrum LSF offering on IBM Cloud can be used for hosting weather research and forecasting HPC workloads.
High performance computing (HPC) workloads can use the IBM Spectrum LSF scheduling software on a cluster of compute nodes, which can easily be deployed on IBM Cloud using the IBM Spectrum LSF offering. In this blog post, we summarize the results of an evaluation of a set of benchmark runs on IBM Cloud using the Weather Research and Forecasting (WRF) Model workload. For more detailed technical information about the evaluation, see the related white paper.
The WRF Model
The WRF Model is a mesoscale numerical weather prediction system that is widely used for meteorological applications. Its resolution can range from tens of meters to hundreds of kilometers. For this evaluation, we chose the Continental United States (CONUS) domain at 2.5 km lateral resolution. This resolution is representative of the current state of the art for deterministic forecast models, which makes it an interesting test case for cloud computing.
The Spectrum LSF offering on IBM Cloud was used to create all of the necessary resources and to configure the HPC cluster for evaluating the WRF workload. The offering uses IBM Cloud virtual private cloud (VPC) infrastructure services, together with IBM Cloud Schematics and Terraform capabilities. The basic elements of the cluster are illustrated in Figure 1: a jump host (login system), one or more LSF master nodes, one NFS server node for storage, and a number of LSF worker nodes that varies dynamically with the workload. Once the base cluster was created, the WRF model software and its dependencies were installed, configured and compiled on the LSF master node.
The Spectrum LSF auto-scaling Resource Connector (RC) feature was used to dynamically provision worker nodes for the WRF model calculations and then to de-provision them once the workload runs were complete and the nodes were idle.
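The Resource Connector's actual provisioning policy is configured within LSF; the following is only a toy model of the provision-on-demand, release-when-idle idea described above, not the Resource Connector's algorithm. It assumes the 16-vCPU cx2-16x32 worker profile used in this evaluation, treats every existing worker node as free, and uses a hypothetical idle threshold (`idle_limit_minutes`) for releasing nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkerNode:
    name: str
    idle_minutes: float  # minutes since this node last ran a job


def nodes_to_add(pending_vcpus: int, current_nodes: int, vcpus_per_node: int = 16) -> int:
    """Return how many worker nodes to provision so a pending job fits.

    Toy model: assumes every existing worker node is free and that each
    node contributes vcpus_per_node slots (16 for cx2-16x32).
    """
    needed = math.ceil(pending_vcpus / vcpus_per_node)
    return max(0, needed - current_nodes)


def nodes_to_release(nodes: list[WorkerNode], idle_limit_minutes: float) -> list[str]:
    """Return the names of nodes idle long enough to be de-provisioned.

    idle_limit_minutes is a hypothetical threshold used only in this sketch.
    """
    return [n.name for n in nodes if n.idle_minutes >= idle_limit_minutes]


if __name__ == "__main__":
    # A 3168-vCPU job on a cluster with no worker nodes needs 198 new nodes.
    print(nodes_to_add(3168, current_nodes=0))
    idle = [WorkerNode("worker-1", 12.0), WorkerNode("worker-2", 3.0)]
    print(nodes_to_release(idle, idle_limit_minutes=10.0))
```

Starting from zero worker nodes, as in this evaluation, every job triggers provisioning for its full node count.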
WRF benchmark run results
For weather forecast models, the simulation speed-up is a useful figure of merit. It is the ratio of the forward integration time, in forecast hours, to the actual elapsed (wall-clock) time required to complete the job. A speed-up factor of about 24x or greater is desirable because it allows the next day's forecast to be updated every hour, and a speed-up factor of about 48x, which yields a two-day forecast in one hour, is excellent.
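As a minimal sketch of this definition, the speed-up is simply forecast hours divided by elapsed hours. The hard cutoffs at 24x and 48x in `classify` are our own simplification of the approximate targets above, and the input values are illustrative rather than measured.

```python
def simulation_speedup(forecast_hours: float, elapsed_hours: float) -> float:
    """Forecast hours simulated per hour of wall-clock time."""
    return forecast_hours / elapsed_hours


def classify(speedup: float) -> str:
    """Map a speed-up onto the rough ~24x and ~48x targets (illustrative cutoffs)."""
    if speedup >= 48:
        return "excellent: two-day forecast per wall-clock hour"
    if speedup >= 24:
        return "desirable: next-day forecast, updated hourly"
    return "below the ~24x target"


if __name__ == "__main__":
    # Illustrative input: a 12-hour forecast that finishes in 15 minutes.
    s = simulation_speedup(12, 0.25)
    print(s, classify(s))
```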
WRF was used to carry out a 12-hour forecast for the continental US at 2.5 km lateral resolution. The simulation speed-up factor was calculated by dividing the 15-second model time step by the average elapsed time per time step over the 12-hour forecast. The speed-up factor therefore indicates how many forecast hours can be computed in one wall-clock hour.
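The time-step method can be sketched as follows. The 15-second step and 12-hour forecast come from the benchmark; the per-step timings passed to `speedup_from_step_times` are hypothetical, not measured values from the evaluation. The sketch also inverts the formula to show the average wall-clock time per step that each speed-up target requires.

```python
TIME_STEP_S = 15.0   # model time step used in the benchmark (seconds)
FORECAST_HOURS = 12  # length of the benchmark forecast

# Number of model steps in the 12-hour forecast: 12 * 3600 / 15 = 2880.
N_STEPS = int(FORECAST_HOURS * 3600 / TIME_STEP_S)


def speedup_from_step_times(step_seconds: list[float], time_step_s: float = TIME_STEP_S) -> float:
    """Speed-up = model time step / average wall-clock time per step."""
    avg = sum(step_seconds) / len(step_seconds)
    return time_step_s / avg


def max_step_time(target_speedup: float, time_step_s: float = TIME_STEP_S) -> float:
    """Largest average wall-clock time per step that still meets a target speed-up."""
    return time_step_s / target_speedup


if __name__ == "__main__":
    for target in (24, 36, 48):
        per_step = max_step_time(target)
        run_minutes = N_STEPS * per_step / 60
        print(f"{target}x: <= {per_step:.4f} s/step, 12-hour run in <= {run_minutes:.1f} min")
    # Hypothetical per-step timings, for illustration only.
    print(speedup_from_step_times([0.3125, 0.3125]))
```

At 48x, each 15-second step must complete in about 0.31 seconds on average, so the 2880-step, 12-hour forecast finishes in about 15 minutes.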
The speed-up scaling curve is shown in the following graph:
Within one wall-clock hour, a 24-hour forecast can be completed using 100 nodes (1600 vCPUs), a 36-hour forecast can be completed using 135 nodes (2160 vCPUs) and a 48-hour forecast can be completed using 198 nodes (3168 vCPUs).
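The vCPU counts above follow from the 16 vCPUs of each cx2-16x32 worker node. The sketch below reproduces that arithmetic and adds one derived figure of our own, vCPU-hours per forecast hour, treating each run as occupying the full wall-clock hour so that the values are upper bounds.

```python
VCPUS_PER_NODE = 16  # cx2-16x32 profile: 16 vCPUs, 32 GB memory

# Forecast hours completed per wall-clock hour -> worker nodes used (from the results above).
RESULTS = {24: 100, 36: 135, 48: 198}


def summarize(results: dict[int, int],
              vcpus_per_node: int = VCPUS_PER_NODE) -> list[tuple[int, int, int, float]]:
    """Return (forecast hours, nodes, vCPUs, vCPU-hours per forecast hour) rows."""
    rows = []
    for forecast_hours, nodes in sorted(results.items()):
        vcpus = nodes * vcpus_per_node
        # Each run is treated as taking one full hour, so vCPU-hours equal the vCPU count.
        rows.append((forecast_hours, nodes, vcpus, vcpus / forecast_hours))
    return rows


if __name__ == "__main__":
    for fh, nodes, vcpus, per_fh in summarize(RESULTS):
        print(f"{fh}-hour forecast: {nodes} nodes, {vcpus} vCPUs, "
              f"{per_fh:.1f} vCPU-hours per forecast hour")
```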
Starting from a cluster with only the LSF master node and no statically allocated worker nodes, it typically took two to three minutes to allocate and provision the worker nodes required for each job, with jobs using between 33 and 198 worker nodes.
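To put the provisioning time in perspective, the sketch below computes the fraction of total turnaround spent provisioning nodes. The two-to-three-minute range comes from the evaluation; the one-hour run length is an assumption chosen to match the forecast targets above.

```python
def provisioning_overhead(provision_minutes: float, run_minutes: float) -> float:
    """Fraction of total turnaround time spent provisioning worker nodes."""
    return provision_minutes / (provision_minutes + run_minutes)


if __name__ == "__main__":
    # Two to three minutes of provisioning, assuming a one-hour forecast run.
    for provision in (2.0, 3.0):
        print(f"{provision:.0f} min: {provisioning_overhead(provision, 60.0):.1%}")
```

Under that assumption, provisioning accounts for roughly 3 to 5 percent of the turnaround time.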
Summary of the WRF workload evaluation
The IBM Cloud environment deployed using the Spectrum LSF offering provided good performance and scalability for the WRF weather model covering the continental US at 2.5 km lateral resolution. A 48-hour forecast can be completed in one wall-clock hour using a cluster of 198 cx2-16x32 IBM Cloud virtual server instances (VSIs). All 198 VSIs were provisioned and configured in the LSF cluster in under three minutes. These results demonstrate that IBM Cloud has the performance and scalability needed for high-resolution production weather forecasting.
By combining the auto-scaling capabilities of the Spectrum LSF offering with the fast provisioning of IBM Cloud VSIs, users can set up an HPC cluster simply and efficiently, and can minimize operational costs by paying for compute resources only when they are needed.
Get started with IBM Spectrum LSF on IBM Cloud.