Innovations in AI and HPC with IBM Spectrum LSF and NVIDIA DGX

Across many industries, organizations rely on high performance computing (HPC) to drive their core mission, delivering higher-quality, innovative products to market faster.

As we forge ahead in the era of artificial intelligence (AI), organizations are looking to leverage AI technologies to accelerate time to market and business decisions. AI is often seen as something independent from traditional HPC workloads. However, the reality is that AI methods are increasingly being applied in HPC, with AI-infused and AI-guided HPC being two such techniques.

Joining forces: HPC and AI

Whether for AI-infused or AI-guided HPC, data is the common denominator in the race to deliver higher-quality products to market faster. Organizations today have the ability to capture vast volumes of data from a variety of sources, including sensor and IoT data. Furthermore, organizations often have a wealth of data that has been acquired over decades from running traditional HPC simulation and modeling workloads. Using these sources of data, HPC and AI can be applied to the same problems to deepen insights and innovation.

Let’s consider some examples where AI-infused and AI-guided HPC are being used to tackle problems faster and with greater accuracy than ever. AI-infused HPC involves applying AI methods to analyze the output of simulations. AI-guided HPC is the application of AI to reduce the problem space for HPC simulations. This is sometimes referred to as intelligent simulation, where Bayesian methods are applied.

AI-infused HPC in electronic design automation (EDA)

As part of modern semiconductor design, billions of verification tests must be run to validate chip designs. For example, the IBM POWER10 processor has 18 billion transistors. Typically, semiconductor-design companies must book time at a foundry for production far ahead of time. If an error is found during the validation phase, it’s not practical to re-run the entire set of billions of verification tests. Using AI-infused HPC methods can help identify the tests that need to be re-run, thereby saving a significant number of compute cycles and helping to keep manufacturing timelines on track.

AI-guided HPC in automotive design

In the automotive industry, the design of vehicles and components often evolves from previous designs. During the design process of a new model, there are millions of potential changes and optimizations that can be considered to improve characteristics like aerodynamics, noise, vibration and harshness (NVH) and structural stiffness, just to name a few. However, assessing all these potential changes over different road conditions and parameters would significantly increase the cycle time between models. Automotive manufacturers have significant bodies of knowledge about existing designs, and they are exploring how to train AI models on this data in order to rapidly determine the best areas for vehicle optimization. This approach helps to significantly reduce the problem space and allows manufacturers to focus traditional HPC methods on more targeted areas of the design. Ultimately, the goal is to produce a better-quality product in a shorter amount of time.
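To make the idea of reducing the problem space concrete, here is a minimal Python sketch, not any manufacturer's actual method. It assumes a deliberately simple nearest-neighbour surrogate trained on past simulation results, and uses it to shortlist which candidate designs are worth a full HPC simulation. The function names, the two-parameter design space and all numbers are made up for illustration; a real workflow would use far richer models, such as the Bayesian methods mentioned earlier.

```python
import math

def knn_predict(history, candidate, k=2):
    """Predict a score for `candidate` as the mean score of its k nearest
    past designs (Euclidean distance in parameter space)."""
    nearest = sorted(history, key=lambda item: math.dist(item[0], candidate))[:k]
    return sum(score for _, score in nearest) / len(nearest)

def shortlist(history, candidates, budget, k=2):
    """Return the `budget` candidates with the lowest predicted score.
    Lower is better in this example (think of a drag coefficient)."""
    ranked = sorted(candidates, key=lambda c: knn_predict(history, c, k))
    return ranked[:budget]

# Hypothetical data: (design parameters, simulated drag) from earlier models.
history = [
    ((0.0, 0.0), 0.40),
    ((1.0, 0.0), 0.35),
    ((0.0, 1.0), 0.38),
    ((1.0, 1.0), 0.30),
]

# Hypothetical candidate changes for the new model.
candidates = [(0.2, 0.1), (0.9, 0.8), (0.1, 0.8), (0.8, 0.9)]

if __name__ == "__main__":
    # Only the shortlisted designs would go on to full HPC simulation.
    print(shortlist(history, candidates, budget=2))
```

The surrogate is cheap to evaluate, so it can score every candidate, while the expensive simulation budget is spent only on the few designs it ranks highest.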

The challenge of scale

As organizations scale up AI environments in support of their HPC practice, those environments closely resemble those used for traditional HPC workloads. High-speed interconnects, accelerators and high-performance parallel filesystems are de rigueur for both AI and HPC workloads. Due to the coupled nature of AI-infused and AI-guided HPC workloads, organizations are looking at ways to run these workloads on a common infrastructure to capitalize on economies of scale.

Furthermore, new classes of users are now appearing alongside traditional HPC users, including data scientists, engineers and researchers. Often, these users of modern HPC environments are domain experts and not IT or HPC infrastructure experts. Organizations are therefore looking for ways to make it easy for users to run their work and get results, while ensuring that compute resources, including GPUs, are well utilized.

Managing a converged infrastructure for HPC and AI

IBM Spectrum LSF Suite is a high-performance, highly scalable workload and resource management solution for demanding HPC environments. LSF Suite helps to simplify the user experience and improve utilization in HPC environments for traditional HPC simulation and modeling, virtual engineering, digital twins and AI-infused and AI-guided HPC. LSF Suite supports workloads on premises, in the cloud and in hybrid cloud environments, including containerized, GPU, machine learning and deep learning workloads.

Advanced support for NVIDIA GPUs in LSF Suite means that organizations can drive utilization by simplifying administration with powerful features, including the following:

  • Automatic detection and configuration of GPUs
  • Automatic compute mode selection
  • Automated configuration of CUDA_VISIBLE_DEVICES
  • Full isolation and access control
  • GPU power management and auto-boost support
  • GPU fairshare and GPU preemptive scheduling
  • Integrated metric collection and reporting
  • Integrated NVIDIA DCGM, MPS support
  • Dynamic workload-driven NVIDIA Multi-Instance GPU support
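As an illustration of what automated CUDA_VISIBLE_DEVICES configuration means from the job's point of view, the following minimal Python sketch shows how a job script might inspect the variable to see which GPUs it has been given. The helper name `assigned_gpus` is hypothetical and not part of LSF or CUDA. The sketch assumes the variable holds a comma-separated list of device identifiers, which is the format CUDA uses.

```python
import os

def assigned_gpus(env=None):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration; not part of LSF or CUDA.
    Returns None if the variable is unset, and an empty list if it is
    set but empty (no devices visible).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable not set: no restriction has been applied to this process.
        return None
    # Split the comma-separated list and drop empty entries.
    return [d.strip() for d in raw.split(",") if d.strip()]

if __name__ == "__main__":
    gpus = assigned_gpus()
    if gpus is None:
        print("CUDA_VISIBLE_DEVICES is not set")
    else:
        print(f"{len(gpus)} GPU(s) visible to this job: {gpus}")
```

Because the scheduler sets the variable before the job starts, CUDA applications see only the assigned devices without any change to user code; a check like this is mainly useful for logging or debugging.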

IBM Spectrum LSF Suite is certified as part of the DGX-Ready Software program, ensuring a proven, validated solution for demanding commercial HPC environments running NVIDIA DGX systems. Learn more about IBM Spectrum LSF Suites here.