Date: 2025-09-30

Tongyi DeepResearch’s appeal also stems from its ability to solve very specific problems, such as creating a detailed trip itinerary or a comprehensive legal research report, said Sandi Besen, an AI Research Engineer at IBM, on Mixture of Experts. The model “solves a very narrow piece of the puzzle,” she said. “I could very much see using this deep research agent as part of a broader agent team or broader agent architecture.”

Besen likened the arrival of the Alibaba agent to DeepSeek-R1, the model that rocked Silicon Valley and Wall Street in early 2025 because it met or surpassed many frontier models from OpenAI and Anthropic on certain benchmarks but reportedly cost a fraction of the price to build and use.

Efficiency was not the only reason Besen compared Tongyi DeepResearch to DeepSeek-R1, however. DeepSeek-R1 also catapulted a particular training technique into the public eye.

“Distillation became a big deal after the DeepSeek paper came out. I wonder whether this paper will trigger some sort of trend in terms of the triathlon of training where you do continual pre-training, then fine-tuning, and then on-policy RL [reinforcement learning],” said Besen.

It is exactly this combination of techniques that the developers drew attention to in their paper about Tongyi DeepResearch. “Overall, this pipeline marks a breakthrough: it connects pre-training to deployment without silos, yielding agents that evolve through trial–and–error,” the researchers wrote.

While it’s still too early to say what Tongyi DeepResearch’s impact will be, it’s possible its influence will extend beyond this specific model, said Besen. “Sometimes it’s not the first model that comes out that’s actually ‘the best,’ but it’s the trend that it drives that points us in a different direction.”