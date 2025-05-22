This week’s Red Hat Summit 2025 was packed with announcements and one broad vision: enabling AI across hybrid cloud. Red Hat unveiled a universal inference platform that, according to the open-source giant, will run any gen AI model in any environment with greater speed, efficiency and cost-effectiveness.
After disrupting software with Enterprise Linux and hybrid cloud with OpenShift, Red Hat, an IBM company, now wants to help enterprises solve a new challenge: balancing the investments they’ve already made in AI with the need to secure their future. “We realized that to be a platform company, we have to enable customers for what's coming next,” said Red Hat CEO Matt Hicks during the event.
To that end, Red Hat made several key announcements. Red Hat Enterprise Linux 10 is notably designed to help enterprises become more resistant to cyberattacks while using RHEL as a trusted AI foundation. But the biggest move is its step forward with the AI Inference Server, which essentially takes the vision behind hybrid cloud and applies it to AI.
“Red Hat AI Inference Server is a pre-built, fully supported Red Hat vLLM container that gives users the ability to serve models anywhere, on any hardware,” said Brian Stevens, an SVP and AI CTO at Red Hat, in the Summit’s inaugural keynote. “AI Inference Server provides optimized gen AI inference to deliver faster, more cost-effective and scalable model deployments across the hybrid cloud.”
Red Hat AI Inference Server can be deployed as a standalone solution or as part of Red Hat Enterprise Linux AI or Red Hat OpenShift AI. Red Hat execs say the open-source giant wants to tackle the new challenges emerging in the era of agentic AI and reasoning models.
“Customers tell us their focus is on getting AI quickly into production,” Stevens said. “And in the AI world, production workloads mean inference, or the process of running a model to generate responses. This is where the real business value happens, with its ability to deliver fast, cost-effective and scalable responses.”
“This is gonna be big, and it’s gonna drive a lot of the sales,” Peter Staar, a Principal Research Staff Member, Master Inventor and Manager of the AI for Knowledge group at IBM Research Zurich, tells IBM Think. “This is an easy and reliable way you can now deploy agents in Llama Stack and provide stable, robust inference endpoints, which will simplify the rollout of new agents.”
Red Hat’s vision for the future of AI is open. That’s why it also unveiled llm-d, an open-source project for distributed inference. The platform is powered by a native Kubernetes architecture, vLLM-based distributed inference and intelligent, AI-aware network routing, empowering robust large language model (LLM) inference clouds.
“I think Red Hat’s launch of llm-d could mark a turning point in Enterprise AI,” wrote Armand Ruiz, a VP of AI Platform at IBM, on LinkedIn. “Locking into one vendor or architecture too early is risky. llm-d gives teams the flexibility to switch tools, test new tech and scale efficiently without re-architecting everything.”
With more enterprises now deploying agents, Red Hat is bringing Llama Stack and the Model Context Protocol (MCP) to the Red Hat AI Platform to offer a unified platform for both agents and models. “This will bring agents to another level,” Staar says. “Red Hat is in a very nice position to build this future.”
