Date: 2026-01-13

Nearly a year after DeepSeek’s low-cost, high-performance R1 model rocked both Silicon Valley and Wall Street, the Chinese AI lab is poised to shake up the AI industry once more. This time, DeepSeek has released a new framework that could make the training of large language models (LLMs) much more efficient, stable and scalable. Perhaps most importantly, this brings down the cost of pretraining, which unlocks the power of LLMs for smaller companies and individual developers.

“With this innovation, DeepSeek is saying ‘how do I get more bang for my buck during pretraining?’” said IBM Distinguished Engineer Chris Hay in an interview with IBM Think. “Model training is the expensive part.”

DeepSeek researchers tested this new architecture, called Manifold-Constrained Hyper-Connections (mHC), on models with three billion, nine billion and 27 billion parameters. They found the models scaled without adding significant computational burden or instability, both of which usually grow as models scale up.