Artificial intelligence has reached a crucial milestone, where training large language models (LLMs) is one of the most computationally demanding tasks. High-performance computing is essential for generative AI (gen AI) and LLM workload optimization, but Graphics Processing Units (GPUs) can be expensive and scarce. GPUs are specialized computer chips designed to handle complex mathematical calculations and parallel processing, making them ideal for the computations required in training and inference for deep learning models. As a result, GPUs are in high demand, and optimizing their utilization is critical for AI success.

The IBM® Big AI Models (BAM) team, which supports the primary research and development environment for engineering teams to test and refine their gen AI projects, saw an opportunity for improvement. As more projects went through the testing stage, the team recognized the importance of using each instance optimally to avoid wasting resources.