DeepSeek’s new architecture and why it matters

Nearly a year after DeepSeek’s low-cost, high-performance R1 model rocked both Silicon Valley and Wall Street, the Chinese AI lab is poised to shake up the AI industry once more. This time, DeepSeek has released a new framework that could make the training of large language models (LLMs) much more efficient, stable and scalable. Perhaps most importantly, it brings down the cost of pretraining, which opens up the power of LLMs to smaller companies and individual developers.

“With this innovation, DeepSeek is saying ‘how do I get more bang for my buck during pretraining?’” said IBM Distinguished Engineer Chris Hay in an interview with IBM Think. “Model training is the expensive part.”

DeepSeek researchers tested this new architecture, called Manifold-Constrained Hyper-Connections (mHC), on models with three billion, nine billion and 27 billion parameters. They found the models scaled without adding significant computational burden or instabilities, both of which usually increase in tandem with scaling.

The case for smarter design over parameter count

Typically, frontier AI labs rely on “brute force” to improve AI, said Kaoutar El Maghraoui, a Principal Research Scientist at IBM, on the latest episode of the Mixture of Experts podcast. That means “adding more data, more compute power, more parameters,” she said. But that approach is “increasingly inefficient and only affordable by a few large companies.”

El Maghraoui stressed that DeepSeek’s mHC architecture could revolutionize model pretraining. “It’s scaling AI more intelligently rather than just making it bigger,” she said. “It’s a smarter way of designing these models that would also work better for the hardware.” mHC can also integrate easily with a company’s custom hardware, said El Maghraoui, making it a potentially appealing option for enterprises looking for cost-efficient AI. As an example, she pointed to IBM’s specialized hardware accelerators, designed to speed up AI, machine learning and deep learning workloads for enterprise clients on premises.

Large models made accessible

In a LinkedIn post, Pierre-Carl Langlais, Cofounder of French AI startup Pleias, suggested that the paper’s true significance goes beyond proving the scalability of mHC. The “actual flex” is DeepSeek’s ability to re-engineer every dimension of the training environment, he wrote. “That’s what makes [DeepSeek] a frontier lab.”

For Hay, the fact that DeepSeek keeps open sourcing its new work is notable because it makes AI more accessible to a broader audience. “I appreciate that they come up with innovations, open them up to the world, let people try [them] out, and then they bring the whole field along with them,” he said.

For AI leaders at smaller organizations trying to implement cost-efficient AI, innovations like DeepSeek’s mHC framework could make it easier to access powerful foundation models that were historically available only to companies with much bigger wallets. If the framework lowers the cost of pretraining LLMs as expected, it could reshape the AI landscape for small and midsize companies.

Aili McConnon

Staff Writer

IBM
