August 9, 2022 By Gabor Samu 3 min read

Across many industries, organizations rely on high performance computing (HPC) to drive their core mission, delivering higher-quality, innovative products to market faster.

As we forge ahead in the era of artificial intelligence (AI), organizations are looking to leverage AI technologies to accelerate time to market and business decisions. AI is often seen as something independent of traditional HPC workloads. However, the reality is that AI methods are increasingly being applied in HPC, with AI-infused and AI-guided HPC being two such techniques.

Joining forces: HPC and AI

Whether for AI-infused or AI-guided HPC, data is the common denominator in the race to deliver higher-quality products to market faster. Organizations today have the ability to capture vast volumes of data from a variety of sources, including sensor and IoT data. Furthermore, organizations often have a wealth of data that has been acquired over decades from running traditional HPC simulation and modeling workloads. Using these sources of data, HPC and AI can be applied to the same problems to deepen insights and innovation.

Let’s consider some examples where AI-infused and AI-guided HPC are being used to tackle problems faster and with greater accuracy than ever. AI-infused HPC involves applying AI methods to analyze the output of simulations. AI-guided HPC is the application of AI to reduce the problem space for HPC simulations. This is sometimes referred to as intelligent simulation, where Bayesian methods are applied.
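To make the AI-guided idea concrete, here is a minimal sketch of intelligent simulation under stated assumptions: expensive_simulation() is a hypothetical stand-in for a full HPC solver run, and a Gaussian-process surrogate with an expected-improvement rule (a common Bayesian method) chooses which design point to simulate next, so far fewer full runs are needed.

```python
# A minimal sketch of AI-guided ("intelligent") simulation. expensive_simulation()
# is a hypothetical placeholder for a real HPC solver run; a Gaussian-process
# surrogate fit to completed runs plus an expected-improvement rule picks the
# next parameters to simulate, shrinking the space that needs full HPC runs.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):
    # Stand-in for a real solver; here a cheap analytic function to minimize.
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(5, 1))           # designs already simulated
y = np.array([expensive_simulation(x[0]) for x in X])

candidates = np.linspace(0, 2, 200).reshape(-1, 1)
for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    imp = y.min() - mu                        # improvement over best run so far
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = candidates[np.argmax(ei)]        # most promising point to simulate
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_simulation(x_next[0]))

print("best design found:", X[np.argmin(y)], "objective:", y.min())
```

In a real workflow, the surrogate would be trained on actual solver outputs over many design parameters, but the loop structure — fit a surrogate, propose the next simulation, run it, repeat — is the same.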

AI-infused HPC in electronic design automation (EDA)

As part of modern semiconductor design, billions of verification tests must be run to validate chip designs. For example, the IBM POWER10 processor has 18 billion transistors. Typically, semiconductor-design companies must book production time at a foundry far in advance. If an error is found during the validation phase, it’s not practical to re-run the entire set of billions of verification tests. AI-infused HPC methods can help identify the tests that need to be re-run, saving a significant number of compute cycles and helping to keep manufacturing timelines on track.
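As an illustration of this idea, the hedged sketch below trains a classifier on a hypothetical history of verification runs — the feature names and synthetic data are assumptions, not a real EDA workflow — and uses it to rank which tests are most worth re-running after a design change.

```python
# A hedged sketch of AI-infused verification-test selection. Each historical
# row describes a test (hypothetical features: overlap with the changed design
# blocks, test age, historical failure rate) and whether it failed after a
# comparable change. Synthetic data stands in for real EDA logs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_hist = 5000
features = rng.random((n_hist, 3))
failed = (0.6 * features[:, 0] + 0.3 * features[:, 2]
          + 0.1 * rng.random(n_hist)) > 0.5   # synthetic pass/fail labels

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(features, failed)

suite = rng.random((1000, 3))                 # pending verification tests
risk = model.predict_proba(suite)[:, 1]       # probability each test would fail
rerun = np.argsort(risk)[::-1][:100]          # re-run only the 100 riskiest
print(f"re-running {len(rerun)} of {len(suite)} tests")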

AI-guided HPC in automotive design

In the automotive industry, the design of vehicles and components often evolves from previous designs. During the design process for a new model, there are millions of potential changes and optimizations that can be considered to improve characteristics like aerodynamics, noise, vibration and harshness (NVH) and structural stiffness, to name a few. However, assessing all these potential changes over different road conditions and parameters would significantly increase the cycle time between models. Automotive manufacturers have significant bodies of knowledge about existing designs, and they are exploring how to train AI models on these large bodies of data to rapidly determine the best areas for vehicle optimization. This approach helps to significantly reduce the problem space and allows manufacturers to focus traditional HPC methods on more targeted areas of the design. Ultimately, the goal is to produce a better-quality product in a shorter amount of time.
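A minimal sketch of how such design-space pruning might look, assuming a hypothetical table of past vehicle variants and synthetic stand-in values for simulated drag: a cheap surrogate screens a large candidate pool and forwards only a shortlist to full CFD or NVH simulation.

```python
# A minimal sketch of AI-guided design-space pruning. past_designs and drag are
# synthetic stand-ins for a manufacturer's history of geometry parameters and
# simulated drag coefficients; a surrogate trained on that history screens many
# candidate changes cheaply, and only the best go on to full HPC simulation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
past_designs = rng.random((2000, 4))          # e.g. spoiler angle, ride height, ...
drag = (0.30 + 0.05 * past_designs[:, 0] - 0.03 * past_designs[:, 1]
        + 0.01 * rng.standard_normal(2000))   # synthetic stand-in for CFD results

surrogate = GradientBoostingRegressor().fit(past_designs, drag)

candidates = rng.random((100_000, 4))         # proposed design changes
predicted = surrogate.predict(candidates)     # cheap screening, no CFD needed
shortlist = candidates[np.argsort(predicted)[:50]]  # 50 lowest predicted drag
print("candidates forwarded to full HPC simulation:", len(shortlist))
```

The design choice worth noting is that the surrogate never replaces the simulation; it only decides where the expensive, trusted HPC runs are spent.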

The challenge of scale

As organizations scale up AI environments in support of their HPC practice, those environments closely resemble the ones used for traditional HPC workloads. High-speed interconnects, accelerators and high-performance parallel filesystems are de rigueur for both AI and HPC workloads. Due to the coupled nature of AI-infused and AI-guided HPC workloads, organizations are looking at ways to run these workloads on a common infrastructure to capitalize on economies of scale.

Furthermore, new classes of users are now appearing alongside traditional HPC users, including data scientists, engineers and researchers. Often, these users of modern HPC environments are domain experts, not IT or HPC infrastructure experts. Therefore, organizations are always looking for ways to make it easy for users to run their work and get results, while ensuring that compute resources, including GPUs, are well utilized.

Managing a converged infrastructure for HPC and AI

IBM Spectrum LSF Suite is a high-performance, highly scalable workload and resource management solution for demanding HPC environments. LSF Suite helps to simplify the user experience and improve utilization in HPC environments for traditional HPC simulation and modeling, virtual engineering, digital twins and AI-infused and AI-guided HPC. LSF Suite supports workloads on-premises, in the cloud and in hybrid cloud environments, including containerized, GPU, machine learning and deep learning workloads.

Advanced support for NVIDIA GPUs in LSF Suite means that organizations can drive utilization by simplifying administration with powerful features, including the following:

  • Automatic detection and configuration of GPUs
  • Automatic compute mode selection
  • Automated configuration of CUDA_VISIBLE_DEVICES (see the sketch after this list)
  • Full isolation and access control
  • GPU power management and auto-boost support
  • GPU fairshare and GPU preemptive scheduling
  • Integrated metric collection and reporting
  • Integrated NVIDIA DCGM, MPS support
  • Dynamic workload-driven NVIDIA Multi-Instance GPU support
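To show how a job benefits from the automated CUDA_VISIBLE_DEVICES configuration noted above, here is a small sketch of a training script enumerating only the GPUs the scheduler allocated to it. The submission command in the comment is illustrative (something like bsub -gpu "num=2"), not a prescribed invocation.

```python
# A small sketch of how a job script might consume the GPU assignment a
# scheduler such as LSF exposes. Submitted with something like:
#   bsub -gpu "num=2" python train.py        # illustrative, not prescriptive
# LSF sets CUDA_VISIBLE_DEVICES for the job, so the script only ever sees
# the devices it was allocated.
import os

visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
gpu_ids = [g for g in visible.split(",") if g]

if gpu_ids:
    print(f"allocated {len(gpu_ids)} GPU(s): {gpu_ids}")
    # CUDA runtimes renumber visible devices from 0, so frameworks can simply
    # use devices 0..len(gpu_ids)-1 without knowing the physical indices.
else:
    print("no GPUs allocated; falling back to CPU")
```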

IBM Spectrum LSF Suite is certified as part of the DGX-Ready Software program, ensuring a proven, validated solution for demanding commercial HPC environments running NVIDIA DGX systems. Learn more about IBM Spectrum LSF Suites here.
