Artificial intelligence has reached a crucial milestone, and training large language models (LLMs) is now one of the most computationally demanding tasks in the field. High-performance computing is essential for optimizing generative AI (gen AI) and LLM workloads, yet Graphics Processing Units (GPUs) can be expensive and scarce. GPUs are specialized computer chips designed for complex mathematical calculations and parallel processing, which makes them well suited to the computations required for training and inference in deep learning models. As a result, GPUs are in high demand, and optimizing their utilization is critical for AI success.
The IBM® Big AI Models (BAM) team, which supports the primary research and development environment where engineering teams test and refine their gen AI projects, saw an opportunity for improvement. As more projects moved through the testing stage, the team recognized the importance of using each instance optimally to avoid wasting resources.
To optimize their GPU resources and manage their LLM Kubernetes instances, the IBM BAM team deployed IBM Turbonomic®, an advanced application resource management software tool. Using real-time data, Turbonomic generated AI-driven recommendations for actions that optimize resource utilization and efficiency. The team could configure these tailored recommendations to execute automatically, enabling AI-driven resource optimization.
As in-house IBM software explicitly tailored for hybrid cloud management, including containerized applications, virtual machines and public clouds, IBM Turbonomic integrated seamlessly with the team's existing infrastructure.
Tom Morris, AI Platform Researcher, summarizes: “Enabling Turbonomic to scale up and down our LLM inference servers has allowed me to spend less time monitoring performance.”
With Turbonomic, the IBM BAM team built a scalable and agile infrastructure that could adapt to the evolving demands of their business, supporting their LLM services across more than 100 NVIDIA A100 GPUs.
By scaling down overprovisioned instances, the team showed that it could increase idle GPU resources from 3 to 16 (5.3 times), freeing those resources to handle additional workloads.
The results included:
The IBM Big AI Models (BAM) team is a group of researchers and engineers within IBM Research® that focuses on developing and applying large-scale AI models. These models are designed to process and analyze vast amounts of data, enabling applications such as natural language processing, computer vision and predictive analytics.
