Why the next AI revolution may be inside the mainframe

05 June 2025

Author

Sascha Brodsky

Tech Reporter, Editorial Lead

IBM

The air inside IBM’s Poughkeepsie facility smells faintly of ozone and cold metal. Rows of matte-black server racks hum behind glass doors, lights blinking in a steady, algorithmic rhythm. Somewhere among them stands a machine that looks unassuming but represents a leap forward in the practical deployment of artificial intelligence: the new IBM z17 mainframe.

Artificial intelligence has come to dominate the conversation in tech, but the discourse is often abstract. The hard part isn’t training models. It’s putting them to work—securely, consistently and at scale. That’s what the z17 is built to do.

“You don’t always have to bring your data to AI,” says Ross Mauri, General Manager of IBM Z and LinuxONE, in an interview with IBM Think. “With z17, you can bring AI to your data, where it lives, with the high availability, performance and security you expect from IBM Z.”

The z17 is the product of five years of development, hundreds of client interviews and more than 300 patent filings. It is a machine designed not for flash but for function. According to IBM, it can perform up to 450 billion inference operations per day and do so inside the world’s most sensitive and demanding computing environments.
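To put the daily figure in per-second terms, here is a rough back-of-the-envelope conversion. It assumes the load is spread evenly over 24 hours, which real transaction traffic rarely is, so treat the result as an average rather than a peak rate.

```python
# Back-of-the-envelope conversion of the daily inference figure quoted above.
# Assumption: load is spread evenly across the day (real traffic is bursty).
INFERENCES_PER_DAY = 450e9        # "up to 450 billion", per IBM
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average sustained rate implied by a daily total."""
    return daily_total / SECONDS_PER_DAY


rate = per_second(INFERENCES_PER_DAY)
print(f"{rate:,.0f} inferences per second on average")
```

Under that even-load assumption, the quoted ceiling works out to a little over five million inferences every second.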

AI, running live

Inside a large European bank, a transaction begins: someone is transferring €9,500 from an account flagged months ago for unusual activity. Traditionally, this might trigger a rules-based alert. On the z17, three AI models are invoked simultaneously: a predictive model trained on the institution’s own data, a transformer model based on global fraud patterns and a lightweight generative model trained to explain discrepancies. The decision to block, flag or allow the transfer happens in under a second, with no human intervention.

This is the promise of what IBM calls multi-model inference. The z17’s Telum II processor includes a second-generation on-chip AI accelerator. Any core can now access any AI unit across the processor drawer, making it possible to run several models in parallel, all without compromising service-level agreements.
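The general pattern can be sketched in a few lines of ordinary Python. This is an illustration only, not IBM's software stack or the Telum II programming interface: the three functions are hypothetical stand-ins for the predictive, transformer and generative models, and the scores and thresholds are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    amount_eur: float
    account_flagged: bool


# Hypothetical stand-ins for real models. The two scorers return a fraud
# score in [0, 1]; the third returns a human-readable explanation.
def predictive_score(tx: Transaction) -> float:
    return 0.9 if tx.account_flagged else 0.1


def global_pattern_score(tx: Transaction) -> float:
    return 0.6 if tx.amount_eur >= 9000 else 0.2


def explain(tx: Transaction) -> str:
    reasons = []
    if tx.account_flagged:
        reasons.append("account previously flagged")
    if tx.amount_eur >= 9000:
        reasons.append("large transfer")
    return "; ".join(reasons) or "no notable signals"


def decide(tx: Transaction, block_at: float = 0.7, flag_at: float = 0.3):
    """Run all three stand-in models concurrently and combine the results."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        f_pred = pool.submit(predictive_score, tx)
        f_glob = pool.submit(global_pattern_score, tx)
        f_expl = pool.submit(explain, tx)
        # Illustrative combination rule: a plain average of the two scores.
        score = (f_pred.result() + f_glob.result()) / 2
        explanation = f_expl.result()
    if score >= block_at:
        action = "block"
    elif score >= flag_at:
        action = "flag"
    else:
        action = "allow"
    return action, score, explanation


if __name__ == "__main__":
    print(decide(Transaction(amount_eur=9500, account_flagged=True)))
```

The point of the sketch is the shape of the call: all three models are submitted at once and the decision waits only on the slowest of them, rather than on the sum of all three.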

“It’s not just about speed,” Mauri says. “It’s about doing more within the scope of a transaction … where milliseconds matter.”

Model coordination is particularly crucial in sectors such as insurance, where AI must balance risk, compliance and customer experience in real time. An underwriter flagging a suspicious pattern doesn’t just want one model’s opinion—they want corroboration, nuance and auditability. The z17’s ability to run multiple models simultaneously provides exactly that.

AI is not an add-on

The AI integrations are not retrofits. IBM designed the z17 to incorporate both predictive and generative models into its core architecture. Tools like watsonx Code Assistant and watsonx Assistant for Z can now run natively, helping developers write COBOL updates or resolve performance issues using live data.

In one early deployment, an IT operations team used watsonx Assistant for Z to identify the root cause of a recurring bottleneck, then generated code suggestions to automate the fix. All of it happened without transferring logs to a third-party platform.

Generative AI is also coming to the hardware layer. The IBM Spyre Accelerator, due later this year, adds 32 AI-optimized cores to support large language models on the mainframe itself. That means enterprises can run generative models next to sensitive customer data, without the risk of moving it to a public cloud.

“Clients told us they didn’t want to send their business logic and sensitive data off-platform,” Mauri says. “This keeps it all in one secure stack.”

Built for constraints

In Frankfurt, a data center manager for a multinational bank walks a tightrope. The facility’s power draw is maxed out, and European energy prices mean any increase must be justified to the board. But the bank wants to expand its AI usage.

That’s where the z17’s energy profile becomes a deciding factor. IBM claims the system uses 17% less energy than its predecessor and can replace hundreds of x86 servers while consuming only a quarter of the power.

In cities like London and Tokyo, where square footage is expensive and tightly regulated, clients plan to consolidate x86 server racks into z17 systems to free up space and cooling capacity. IBM says some clients have seen power savings as high as 75% compared to legacy architectures.

“Some clients are constrained by floor space, others by electricity,” Mauri states. “This gives them plenty of breathing room.”

Inside the machine

Physically, the z17 stands over six feet tall and is encased in a matte-black chassis, with angular blue and silver inlays that slice through the otherwise austere geometry. A stylized “Z” is embossed discreetly near the top panel, and the front face bears four recessed panels with geometric, prism-like patterns—blades of cobalt, silver and black tucked into trapezoidal recesses. It feels more like a stealth object than a server, all sharp lines and quiet menace. Along one edge, the word “IBM” is etched vertically in bold, metallic text, casting a faint industrial shimmer under overhead light.

But inside, the architecture has changed substantially. The system includes a data processing unit (DPU) that moves I/O processing directly onto the chip. Virtual caching layers span multiple cache levels to reduce latency.

The hardware reflects what Mauri describes as a “reengineering of the stack.” Rather than focusing on raw performance alone, the z17 is designed to maintain consistency under pressure, to run hotter workloads without degrading uptime or responsiveness.

At the Hot Chips conference, IBM’s Telum II processor design drew praise for its integrated AI accelerators and memory architecture. The message was clear: IBM wasn’t just building another machine. It was rethinking what a mainframe could be in an AI-first era.

Quiet intelligence

One of the more subtle but significant features of the z17 is its self-management. The system uses its own AI capabilities to monitor logs, detect anomalies and even recommend or implement fixes. Tools like IBM Concert for Z pull logs via OpenTelemetry and correlate them with performance metrics.

This kind of automation is increasingly important as a generation of mainframe engineers retires. Companies are finding it harder to hire teams fluent in decades-old systems. By embedding AI into operations, IBM aims to simplify management without compromising the platform’s capabilities.

A forthcoming release of the z/OS operating system will add support for hybrid data workflows, NoSQL access and expanded operational AI.

In practice, this might look like an assistant that flags a memory leak, traces it to a recent deployment and suggests a rollback—all before end users experience slowdowns. It’s not glamorous. But it’s the kind of help that keeps systems alive.
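As a toy illustration of that flow, the sketch below checks whether recent memory samples are climbing steadily and, if so, points to the most recent deployment before the climb began. It is not how IBM Concert for Z or z/OS work internally: the function names, the six-sample window, the 1 MB-per-sample threshold and the sample data are all assumptions made for the example.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def last_deployment_before(deployments, sample_index):
    """deployments: list of (sample_index, name) pairs, in any order."""
    earlier = [d for d in deployments if d[0] <= sample_index]
    return max(earlier, key=lambda d: d[0])[1] if earlier else None


def diagnose(memory_mb, deployments, window=6, min_slope_mb=1.0):
    """Return a rollback suggestion if the last `window` samples trend up."""
    if window < 2 or len(memory_mb) < window:
        return None
    recent = memory_mb[-window:]
    growth = slope(recent)
    if growth < min_slope_mb:
        return None  # memory is flat or shrinking: nothing to report
    onset = len(memory_mb) - window  # index where the rising window starts
    return {
        "growth_mb_per_sample": growth,
        "suspect": last_deployment_before(deployments, onset),
        "suggestion": "rollback",
    }


if __name__ == "__main__":
    samples = [512, 511, 513, 512, 520, 531, 540, 552, 561, 570]
    deploys = [(0, "v1.4.0"), (4, "v1.4.1")]  # hypothetical release names
    print(diagnose(samples, deploys))
```

A production system would correlate far more signals than a single memory series, but the sequence is the same one described above: detect the trend, find the change that preceded it, propose the rollback.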

Mauri says more than 250 AI use cases are now running on Z, though client names remain mostly confidential. One large healthcare organization is using it to screen prostate cancer images, speeding up diagnosis and reducing the workload on radiologists. A European environmental agency is analyzing satellite data to monitor illegal construction near wetlands. A global retailer is using the system to detect and block fraudulent transactions as they happen.

In one use case described to IBM by a US-based bank, z17’s multi-model inference reduced false positives in fraud detection by over 30% without compromising speed or accuracy. The models worked together to weigh data points ranging from user behavior to merchant risk to device telemetry.

These are not research pilots. They are AI implementations embedded in the middle of mission-critical workflows.

Security in a quantum future

Quantum computing may be years from breaking encryption standards, but IBM is betting that clients want to prepare now. The z17 includes quantum-safe cryptography certified by NIST, protecting not only data in motion but also the system firmware itself.

IBM Vault provides centralized management of security credentials, while new AI tools help classify sensitive data using natural language models. For banks, hospitals and governments, this isn’t theoretical. It’s regulatory.

Clients are now beginning to inventory their cryptographic infrastructure, Mauri says, recognizing that retrofitting for quantum safety later could be costly or impossible. “We’re helping clients avoid another Y2K-like moment they can’t see coming yet,” he says.

Mauri’s team is already working on the next three generations of IBM Z. That roadmap is a reassurance to clients making multi-million-dollar bets on the future.

“Every client I met with at Think 2025 asked the same question: how long is IBM staying in this business?” Mauri says. “We’re here for the long haul and have a clear 10-year roadmap in place. I am confident this business will be here for decades to come.”

In an industry often driven by quarterly metrics and ephemeral launches, the z17 represents something more deliberate. Its value isn’t just in what it does now, but in how it has been built to evolve over time, to serve as a foundation rather than a short-term fix.

The companies deploying z17 aren’t interested in hype. They want infrastructure that works, scales and endures.

“This is not just about modernizing a system,” Mauri says. “It’s about building a future we can trust to run the world’s economy.”
