How can businesses scale AI? As KubeCon + CloudNativeCon kicks off this week, IBM Distinguished Engineer Carlos Costa explains IBM’s approach to cloud-native, open-source AI—and harnessing the power of Kubernetes, PyTorch and Ray.
When it comes to production, it's more than just scaling prototypes. We had to ensure compliance with governance guidelines and integrate with existing data lakes and applications. Ray was critical for scaling data preprocessing, but it wasn't the entire solution. We needed best-in-class tools for each step of the workflow: PyTorch for the heavy training tasks, Ray for preprocessing, and Kubernetes services underneath to optimize things like cost, which is a huge concern these days.
The cost per inference is incredibly high, and more companies are competing to offer the cheapest inference endpoints, so optimizing costs, especially at the Kubernetes layer, is key. Managing GPUs efficiently and scaling up and down based on demand is where the real innovation is happening.
Yes, exactly. It's about multi-layer optimization, such as deciding what resources are right for the model size. We have multiple models to serve, and we need to make decisions in real time: whether to use a full GPU or a partition of one, and how to adjust that allocation based on demand. On a typical day, 100 users might ask for a model like Granite 13b, but during an event like the US Open, usage shifts. You have to reconfigure your cluster dynamically.
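As a rough illustration of that real-time sizing decision, a controller might map a model's live request rate onto a full GPU or a fraction of one. The thresholds and slice sizes below are invented for this sketch, not IBM's actual policy:

```python
# Illustrative only: pick a GPU allocation for a model based on live demand.
# The demand thresholds and MIG-style slice sizes are made up for this sketch;
# a real system would derive them from profiling and service-level objectives.

def choose_gpu_allocation(requests_per_minute: int) -> str:
    """Return a GPU slice size for the current request rate."""
    if requests_per_minute >= 500:
        return "full-gpu"   # sustained heavy load: dedicate a whole GPU
    if requests_per_minute >= 100:
        return "1/2-gpu"    # moderate load: a half-GPU partition suffices
    if requests_per_minute >= 20:
        return "1/4-gpu"
    return "1/7-gpu"        # trickle traffic: smallest partition

# A typical day versus a spike during a live event like the US Open:
print(choose_gpu_allocation(100))   # prints "1/2-gpu"
print(choose_gpu_allocation(2000))  # prints "full-gpu"
```

Re-running a decision like this as demand shifts is what "reconfiguring the cluster dynamically" amounts to in practice.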
One of the innovations we introduced is called “InstaSlice,” an open-source project for dynamic GPU partitioning, which we’re trying to integrate into Kubernetes. Other things include request routing based on demand and optimizing model loading to reduce cold starts.
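To make the request-routing and cold-start point concrete, here is a minimal sketch of demand-aware routing: prefer a replica that already has the requested model loaded, and only fall back to loading it elsewhere (paying the cold-start cost) when none does. All names here are hypothetical; this is not the InstaSlice API.

```python
# Illustrative sketch of request routing that reduces cold starts: route to the
# least-loaded replica that already holds the model in memory, and only trigger
# a model load (a cold start) when no warm replica exists.

def route_request(model: str, replicas: list[dict]) -> dict:
    """Pick the least-loaded replica that has `model` warm, else any replica."""
    warm = [r for r in replicas if model in r["loaded_models"]]
    pool = warm if warm else replicas
    target = min(pool, key=lambda r: r["inflight"])
    if not warm:
        target["loaded_models"].add(model)  # cold start: load the model first
    return target

replicas = [
    {"name": "r1", "loaded_models": {"granite-13b"}, "inflight": 7},
    {"name": "r2", "loaded_models": {"granite-13b"}, "inflight": 2},
    {"name": "r3", "loaded_models": set(), "inflight": 0},
]
print(route_request("granite-13b", replicas)["name"])  # prints "r2"
```

Note that the idle replica r3 is skipped for the warm model: avoiding a cold start can beat raw load balancing.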
There are two angles. First, we want to minimize the cost of running the platform so we can make it profitable. Second, it’s about supporting different classes of users. Right now, many services offer the same service level to all users. But what if you could guarantee premium service with faster responses for certain industries? This would allow us to create different service tiers. Some clients might need a response in 30 minutes and would pay more for that, while others are fine with a slower, free service. Our innovations help us offer these differentiated tiers.
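One simple way to picture differentiated tiers is a priority queue in which premium requests are served ahead of free-tier ones. The tier names and queueing policy below are invented for illustration:

```python
# Toy sketch of differentiated service tiers: premium requests jump the queue
# ahead of standard and free-tier ones, with FIFO order within a tier.
import heapq
import itertools

PRIORITY = {"premium": 0, "standard": 1, "free": 2}

class TieredQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break within a tier

    def submit(self, tier: str, request: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[tier], next(self._order), request))

    def next_request(self) -> str:
        return heapq.heappop(self._heap)[2]

q = TieredQueue()
q.submit("free", "batch summarization")
q.submit("premium", "interactive chat")
q.submit("standard", "report generation")
print(q.next_request())  # prints "interactive chat"
```

A real platform would enforce tiers with quotas and latency targets rather than a single queue, but the principle is the same: paying clients buy a stronger scheduling guarantee.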
We’re building on what we saw at the last KubeCon. The Kubernetes community is realizing that AI workloads are becoming attached to every single application running on Kubernetes, but Kubernetes wasn’t initially built with AI in mind. The big challenge is how to handle GPUs, which aren’t first-class citizens in Kubernetes right now. At [this year’s] KubeCon, we’ll be showcasing our innovations in things like gang scheduling and GPU fractioning, all geared toward making Kubernetes more AI-friendly. IBM has a strong point of view on how Kubernetes should evolve for AI, and we’re signaling this to the community.
That’s a great question. Open source is crucial because many of the industry’s pain points are shared, and there’s a huge multiplier effect when we collaborate. For example, no one today builds an operating system from scratch—we all use Linux and build on top of that. The same goes for Kubernetes. Standardization allows us to innovate faster. Open-source communities like Ray and PyTorch are growing and becoming stable, which is great for us because it means we don’t have to reinvent the wheel. Instead, we can build specialized solutions on a strong foundation. If Ray, PyTorch and Kubernetes succeed, that’s great for our business.
I think the train has already left the station. Open-source models will continue to grow. The challenge now is figuring out the business model behind them. For example, some companies release open-source models to gain credibility. If I tell a client I have a great AI platform, that’s one thing; but if I say, “We use this platform to train our own open-source model,” it carries more weight. Databricks, MosaicML and others are doing the same. While the massive models from OpenAI are impressive, smaller, open-source models that can be fine-tuned for specific tasks are often more efficient and easier to operationalize in production.