Optimizing GPU resources for performance and efficiency 

21 May 2024

3 min read

As the demand for advanced graphics processing units (GPUs) from vendors like NVIDIA® grows to support machine learning, AI, video streaming and 3D visualization, safeguarding performance while maximizing efficiency is critical. And with the pace of progress in AI model architectures rapidly accelerating through services like IBM watsonx™, the use of large language models (LLMs) that demand advanced NVIDIA GPU workloads is on the rise. With this come new concerns over cost and proper provisioning to ensure performance.

IBM Turbonomic® is excited to announce its latest capability: optimizing NVIDIA GPU workloads in the public cloud, on premises and on containers to improve efficiency without sacrificing performance. Customers benefit from faster responses and a smoother experience, while reduced resource waste helps keep costs down.

With the recent release of IBM Turbonomic 8.12.0, Turbonomic can now monitor and optimize GPUs on premises and in the cloud, with container support coming in June. The more permissions Turbonomic is granted for observation, the more optimizations it can drive.

On cloud

Developers may find it difficult to decide which GPU cloud instances would serve them best and, in most cases, they end up over-provisioning. With some GPU instances costing more than USD 100 a day, over-provisioning can result in a steep increase in the cloud bill.

Turbonomic enables users to scale GPU instances on demand to the instance type that best balances performance, efficiency and cost. Currently, Turbonomic supports the P2, P3, P3dn, G3, G4dn, G5 and G5g instance types on AWS. You can view GPU metrics in the Capacity and Usage and Multiple Resources charts.
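As a rough illustration of the kind of right-sizing decision this involves, the sketch below picks the smallest instance type whose GPU capacity still covers observed peak demand. It is not Turbonomic's actual algorithm; the GPU counts are representative AWS sizes, and the headroom threshold is an assumption for this example.

    # Illustrative right-sizing heuristic, not Turbonomic's actual algorithm.
    # GPU counts are representative sizes within the supported AWS families;
    # the 20% headroom reserve is an assumption for this sketch.
    GPU_COUNTS = {
        "g4dn.xlarge": 1,
        "g5.xlarge": 1,
        "g5.12xlarge": 4,
        "p3.2xlarge": 1,
        "p3.8xlarge": 4,
        "p3.16xlarge": 8,
    }

    def recommend_instance(current_type: str, peak_gpu_util: float) -> str:
        """Pick the fewest-GPU type whose capacity covers peak demand.

        peak_gpu_util is the observed utilization across all GPUs, 0.0-1.0.
        """
        demand = GPU_COUNTS[current_type] * peak_gpu_util
        # Keep 20% headroom so efficiency gains don't sacrifice performance.
        candidates = [(count, name) for name, count in GPU_COUNTS.items()
                      if count * 0.8 >= demand]
        return min(candidates)[1] if candidates else current_type

    # A p3.8xlarge (4 GPUs) peaking at 15% utilization fits a single-GPU type.
    print(recommend_instance("p3.8xlarge", 0.15))  # "g4dn.xlarge"

A real recommendation engine would also weigh on-demand pricing, GPU memory and regional availability, but the efficiency-versus-headroom trade-off is the same.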

On premises

The usage of GPUs is increasingly prevalent, especially in virtual machine (VM) environments. It is becoming common to configure VMs with virtual GPUs (vGPUs) to leverage their powerful processing capabilities.

Turbonomic’s VM placement actions now identify NVIDIA GPUs installed on both the source and destination hosts, as well as the NVIDIA vGPU types assigned to VMs. Turbonomic suggests where to place a VM only if the destination host supports compatible NVIDIA GPU cards and vGPU types. Turbonomic also recognizes VMs with passthrough GPUs attached and blocks move actions for them.
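To make these placement rules concrete, here is a minimal sketch of a compatibility check along the lines described above. The class and field names are hypothetical, not Turbonomic internals.

    # Hypothetical placement-compatibility check; names and rules are
    # assumptions for illustration, not Turbonomic internals.
    from dataclasses import dataclass, field

    @dataclass
    class Host:
        name: str
        supported_vgpu_types: set[str] = field(default_factory=set)

    @dataclass
    class VM:
        name: str
        vgpu_type: str | None = None     # e.g. an NVIDIA vGPU profile name
        passthrough_gpu: bool = False    # a passthrough GPU is attached

    def can_move(vm: VM, destination: Host) -> bool:
        # VMs with passthrough GPUs are pinned to their host: block the move.
        if vm.passthrough_gpu:
            return False
        # VMs without a vGPU are unconstrained by GPU compatibility.
        if vm.vgpu_type is None:
            return True
        # vGPU-backed VMs need a destination that supports the same vGPU type.
        return vm.vgpu_type in destination.supported_vgpu_types

    dest = Host("host-2", {"a100-10c", "a100-20c"})
    print(can_move(VM("vm-1", vgpu_type="a100-10c"), dest))  # True
    print(can_move(VM("vm-2", passthrough_gpu=True), dest))  # False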

On containers

Generative AI (gen AI) and LLM workloads can require immense GPU processing power to operate at efficient levels of performance. Turbonomic was engineered to optimize GPU resources so that gen AI workloads meet performance standards while keeping resource usage and cost efficient.

Turbonomic is committed to developing GPU optimization services that provide performance insights and generate actions to achieve application performance and efficiency targets. Turbonomic has developed the capability to scale containers serving gen AI models up and down according to the size of their waiting queues.
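As a simple illustration of queue-driven scaling, the sketch below computes a replica count from the waiting queue length. The target queue depth and replica bounds are assumptions for this example, not the actual Turbonomic policy.

    # Illustrative queue-length-based scaling heuristic; thresholds are
    # assumptions, not the actual Turbonomic policy.
    def desired_replicas(queue_length: int,
                         target_queue_per_replica: int = 4,
                         min_replicas: int = 1,
                         max_replicas: int = 8) -> int:
        """Scale model-serving replicas so each handles a bounded queue."""
        # Ceiling division: replicas needed to hit the per-replica target.
        needed = -(-queue_length // target_queue_per_replica)
        return max(min_replicas, min(max_replicas, needed))

    # 30 queued inference requests call for scaling up to the maximum.
    print(desired_replicas(30))  # 8
    # A drained queue lets the deployment scale back down to the minimum.
    print(desired_replicas(0))   # 1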

IBM Turbonomic’s new GPU optimization features, combined with new IBM Instana® gen AI observability technology, are designed to provide efficiency and performance for customers leveraging GPUs for LLMs. For more information, or to be considered for our current containers preview, book a meeting with one of our Turbonomic specialists today.

Author

AJ Nish

Head of Product Management, IBM Turbonomic

Footnotes

Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.