Artificial intelligence has reached a crucial milestone: training large language models (LLMs) is now one of the most computationally demanding tasks in the field. Generative AI (gen AI) and LLM workloads depend on high-performance computing, yet Graphics Processing Units (GPUs) are expensive and scarce. GPUs are specialized chips built for the parallel mathematical operations at the heart of deep learning training and inference. As a result, they are in high demand, and optimizing their utilization is critical for AI success.
The IBM® Big AI Models (BAM) team, which supports the primary research and development environment where engineering teams test and refine their gen AI projects, saw an opportunity for improvement. As more projects moved through the testing stage, the team recognized the importance of using each instance optimally to avoid wasting resources.
To optimize their GPU resources and manage their LLM Kubernetes instances, the IBM BAM team deployed IBM Turbonomic®, an advanced application resource management software tool. Using real-time data, Turbonomic generated AI-driven recommendations for optimal resource allocation, and the team could configure these tailored suggestions to execute automatically, enabling continuous resource optimization.
As internal IBM software explicitly tailored for optimizing hybrid cloud management, including containerized applications, virtual machines and public clouds, IBM Turbonomic integrated seamlessly with the existing infrastructure.
Tom Morris, AI Platform Researcher, summarizes: “Enabling Turbonomic to scale up and down our LLM inference servers has allowed me to spend less time monitoring performance.”
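Turbonomic's decision engine is proprietary, but the kind of utilization-driven scaling Morris describes can be sketched in a few lines. The function below is purely illustrative: its name, thresholds and proportional-sizing rule are assumptions for this sketch, not Turbonomic's actual mechanism.

```python
import math

def target_replicas(current_replicas: int,
                    gpu_utilization: float,
                    target_utilization: float = 0.7,
                    min_replicas: int = 1,
                    max_replicas: int = 16) -> int:
    """Size an inference server's replica count so observed GPU
    utilization approaches a target level (hypothetical logic).

    If servers sit mostly idle, the desired count shrinks and GPUs
    are freed for other workloads; if they run hot, it grows.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be >= 1")
    desired = math.ceil(current_replicas * gpu_utilization / target_utilization)
    # Clamp to configured bounds so automation never over- or under-scales.
    return max(min_replicas, min(max_replicas, desired))
```

For example, an overprovisioned service running 8 replicas at 20% GPU utilization would be scaled down to 3, while 4 replicas at 95% would be scaled up to 6.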
With Turbonomic, the IBM BAM team was able to create a scalable and agile infrastructure that could adapt to the evolving demands of their business, supporting their LLM services and running over 100 NVIDIA A100 GPUs.
By scaling down overprovisioned instances, the team increased idle GPU resources from 3 to 16 (a 5.3-fold increase), freeing those GPUs to handle additional workloads.
By applying Turbonomic's automation capabilities, the IBM BAM team successfully scaled and optimized its LLM services. This improvement positioned the team to reallocate its time to strategic projects.
The IBM Big AI Models (BAM) team is a group of researchers and engineers within IBM Research® that focuses on developing and applying large-scale AI models. These models are designed to process and analyze vast amounts of data, enabling applications such as natural language processing, computer vision and predictive analytics.
© Copyright IBM Corporation 2024. IBM, the IBM logo, Turbonomic, and IBM Research are trademarks or registered trademarks of IBM Corp., in the U.S. and/or other countries. This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.
Client examples are presented as illustrations of how those clients have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.