If you read the headlines on January 20, 2025, you might have thought the sky was falling. That’s because China-based DeepSeek released its R1 large language model (LLM), which became one of the most downloaded and active models shortly after its release.
What set off the excitement was the fact that the Hangzhou, China-based AI research lab, which releases models under its own name, built a model at a far lower cost (USD 5.6 million) and with far fewer compute resources and less access to NVIDIA chips than the leading US models.
Like clockwork, people openly worried that some of the heavily funded US AI firms were about to be left behind. Since DeepSeek used fewer NVIDIA chips than those other firms, NVIDIA’s stock price dropped. However, that was more of a knee-jerk reaction to the news than anything materially worrisome about the chipmaker’s fortunes.
Tech and business reporters viewed this news as a shock to the system. However, for other AI experts and me, the only surprise from DeepSeek’s R1 announcement was how surprised everyone seemed to be.
While the model was new, DeepSeek is far from a new entrant into the marketplace. It has an ample history of producing valuable open source models in the Chinese market, especially the V3 model released in December 2024. In fact, it released an accompanying technical paper [1], which offers an education for anyone who wants to dive deep into how these models are built. The V3 model was more of a surprise, but it apparently flew under the radar.
DeepSeek's R1 model, of course, is another example of a generative AI tool that can become the basis for the agentic AI future, where AI tools not only respond to their users' requests but work independently to provide services to those users.
While IBM by design partners with and uses all of these models, we are also big advocates and engineers of the open source movement. Seeing an open source model like R1 receive much-deserved praise is great for the industry.
It’s understandable that it was a bit jarring for the big players to see DeepSeek produce a model on par with or better than their own, yet built for a fraction of the cost. However, that is what the open source community is designed to do.
The DeepSeek R1 announcement demonstrates a tale of two worlds: the financial markets projected turmoil while AI experts were excited about the technological breakthrough and how it could inform more efficient and powerful newer models.
R1 only reinforced what many have known and what the rest of the world is now catching up on. DeepSeek is obviously standing on the shoulders of all who contribute to the open source environment, including IBM, Meta, and more. Open source models will continue to lead innovation. While R1 was an initial shock to the system, all will benefit from its existence, especially considering that DeepSeek just announced an Open-Source Week, during which it shared one open source repo a day.
DeepSeek R1 uses the Mixture of Experts (MoE) machine learning approach that divides an artificial intelligence (AI) model into separate sub-networks (or “experts”), each specializing in a subset of the input data, to jointly perform a task.
So, when you use the MoE approach, not all the parameters in the model must be activated at the same time. As an example, there are about 671 billion parameters in DeepSeek’s V3 or R1 model, but only 37 billion parameters are active at a time. Because only that small portion of the entire model is actually answering the question, the model is a lot more efficient.
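To make the idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is an illustration under stated assumptions, not DeepSeek's actual router: the tiny dimensions, the random weights, and the helper names (`make_expert`, `apply_expert`, `moe_forward`) are all hypothetical. Only the 671 billion and 37 billion parameter counts come from the text above.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(rng, dim):
    """A hypothetical expert: one tiny dim x dim linear layer."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

def apply_expert(weights, x):
    """Multiply the expert's weight matrix by the input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def moe_forward(x, router, experts, top_k):
    """Score every expert, but run only the top_k highest-scoring ones."""
    scores = [sum(w * xi for w, xi in zip(r, x)) for r in router]
    ranked = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([scores[e] for e in chosen])
    out = [0.0] * len(x)
    for gate, e in zip(gates, chosen):
        y = apply_expert(experts[e], x)  # the unchosen experts never run
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy sizes (assumptions for illustration only).
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
rng = random.Random(0)
experts = [make_expert(rng, DIM) for _ in range(NUM_EXPERTS)]
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, router, experts, TOP_K)
print(f"experts used for this token: {chosen} out of {NUM_EXPERTS}")

# The figures quoted in the article: 37 billion active of 671 billion total.
active_fraction = 37e9 / 671e9
print(f"active share of parameters: {active_fraction:.1%}")  # about 5.5%
```

In this sketch the router still scores all eight experts, but only two of them do any real work for a given input. That is where the savings come from: the share of parameters doing work per request is roughly 5.5% when you plug in the article's figures.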
Historically, researchers have encountered training difficulties with MoE models. DeepSeek came up with some novel techniques for fixing those issues while managing the overall workload, which kept its mixture of experts model efficient.
For example, the V3 and R1 models used reinforcement learning instead of depending on labeled data. With this technique, the model thinks through various routes to end up at the answer. It reassesses each route as it traverses it, so it more quickly determines whether it is going down the wrong path. Then, it can quickly backtrack and determine a potentially more advantageous route.
This “chain of thought” reasoning helps the model find its way to an accurate final answer and get the reward for it. This reinforcement learning methodology helped DeepSeek train the model to perform at or above the level of OpenAI's and other models.
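The reward idea can be sketched with a toy example. The snippet below is a simplified illustration, not DeepSeek's training code: the sample question, the four candidate routes, and the helper names (`outcome_reward`, `relative_advantages`) are hypothetical. It assumes a reward of 1 for a correct final answer and 0 otherwise, and then compares each route with the group average, which is one common way to turn such rewards into a training signal.

```python
from statistics import mean

def outcome_reward(final_answer, expected):
    """Reward only the outcome: 1.0 if the final answer is right, else 0.0."""
    return 1.0 if final_answer.strip() == expected.strip() else 0.0

def relative_advantages(rewards):
    """Score each route against the average of the group it was sampled with."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

# Hypothetical routes a model might sample for the question "What is 17 * 6?"
routes = [
    {"steps": ["17 * 6 = 17 * 5 + 17", "85 + 17 = 102"], "answer": "102"},
    {"steps": ["17 * 6 = 10 * 6 + 7 * 6", "60 + 42 = 102"], "answer": "102"},
    {"steps": ["17 * 6 = 20 * 6 - 2 * 6", "120 - 12 = 108"], "answer": "108"},
    {"steps": ["17 * 6 is roughly 100"], "answer": "100"},
]

rewards = [outcome_reward(r["answer"], "102") for r in routes]
advantages = relative_advantages(rewards)

for route, reward, adv in zip(routes, rewards, advantages):
    print(f"answer={route['answer']:>3}  reward={reward}  advantage={adv:+.1f}")
```

No human labels the individual reasoning steps here. Routes that reach the right answer end up with a positive advantage and would be reinforced, while routes that go wrong end up with a negative one.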
Sometimes limitations breed innovation. DeepSeek is limited in which NVIDIA chips it can acquire because of US export controls on chip sales to China. The parent company obviously had a significant number of NVIDIA chips on hand (2,000 of NVIDIA's H800 chips), but it still had to be nimble in how it deployed them. It conducted some incredible work down to the hardware level to be able to drive some optimizations.
Everyone in the open source community uses NVIDIA's CUDA platform, which makes available a good set of libraries that can help you connect all the different GPUs together so they can communicate more efficiently, distribute their workload, and so forth. But DeepSeek went one step deeper, below the library, and further optimized the hardware as well.
The reality is that the pace at which open models have improved and will continue to improve is phenomenal.
AI doesn’t happen without chips. The initial news that it may require fewer chips in the future to produce excellent models led some industry watchers to a logical fallacy: that chip demand would wane. According to the Jevons paradox, the opposite is true: increased efficiency often leads to increased consumption. From fuel and energy usage throughout time to increases in air conditioning efficiency leading people to build bigger homes, there’s never too much of a good thing.
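A back-of-the-envelope calculation shows when this effect holds. The sketch below uses a constant-elasticity toy model, and every number in it is a made-up assumption for illustration: 10 model builders at baseline, 10,000 chips per model, a tenfold drop in cost, and chip needs that shrink in proportion to cost. The function name `chip_demand` is hypothetical as well.

```python
def chip_demand(cost_per_model, base_cost, base_builders,
                base_chips_per_model, elasticity):
    """Toy model: cheaper models attract more builders, each using fewer chips.

    Returns (builders, chips_per_model, total_chips).
    """
    cost_ratio = cost_per_model / base_cost
    # Constant-elasticity demand: the number of builders grows as cost falls.
    builders = base_builders * cost_ratio ** (-elasticity)
    # Assumption: chips needed per model shrink in proportion to cost.
    chips_per_model = base_chips_per_model * cost_ratio
    return builders, chips_per_model, builders * chips_per_model

# Hypothetical baseline, not real market data.
BASE = dict(base_cost=100.0, base_builders=10, base_chips_per_model=10_000)

_, _, before = chip_demand(100.0, elasticity=1.5, **BASE)
_, _, elastic_after = chip_demand(10.0, elasticity=1.5, **BASE)
_, _, inelastic_after = chip_demand(10.0, elasticity=0.5, **BASE)

print(f"baseline total chips:            {before:,.0f}")           # 100,000
print(f"10x cheaper, elastic demand:     {elastic_after:,.0f}")    # about 316,228
print(f"10x cheaper, inelastic demand:   {inelastic_after:,.0f}")  # about 31,623
```

In this toy model, total chip demand rises only when demand is elastic enough, meaning the number of new builders grows faster than per-model chip needs shrink. The Jevons argument amounts to a bet that AI is in that regime.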
Take, for example, the global whiskey business. In recent years, the rise of independent and small-batch distilleries has only increased the demand for grain. It’s the same in any industry where improved economics open up opportunities for small companies. There may be fewer chips used by any given company, but DeepSeek demonstrated that many more players can enter the market and use open source techniques to build impressive models for less.
This, to me, is the greatest takeaway. What it unlocks is that it won’t just be the very elite with access to incredible compute who will be able to build the next series of models. Maybe there are alternate routes where smaller labs can also start investing in building more models. That's a great thing for those excited for AI agents and the agentic future we all anticipate.
Competition among all the major players will ebb and flow, so it’s best to not think about winners and losers in the immediate term. Every day, companies, researchers, and AI scientists are innovating to produce better models based on more scientific reasoning.
That is why we’re so excited about our recent reasoning updates to our Granite family of LLMs, which have outperformed R1 on benchmarks like ArenaHard and AlpacaEval. Our reasoning models combine the best of both worlds: high performance with safety characteristics, while letting users choose whether they want to use reasoning capabilities or not, depending on the situation. The more we share what we know and open source what we can, the more everyone will benefit, most importantly consumers.
While OpenAI and others may feel some initial heat from the rise of smaller but potent competitors, this is a huge win for the open source community and aligns with IBM’s perspective on the future of AI. It demonstrates that smaller models can outcompete some of the others. Obviously, this by no means counts out the bigger players; if they’re smart, they will use what DeepSeek taught them to continue to build bigger models at lower costs.
But, ultimately, competition is great for enterprises and consumers alike. Everybody wins when we have these seismic events like DeepSeek R1.
