Diffusion models challenge GPT as next-generation AI emerges


Sascha Brodsky, Staff Writer, IBM

A new class of AI models is challenging the dominance of GPT-style systems, promising faster, cheaper and potentially more powerful alternatives.

Inception Labs, a startup founded by researchers from Stanford, recently released Mercury, a diffusion large language model (dLLM) that refines entire phrases at once rather than predicting words one by one. Traditional large language models (LLMs) use an autoregressive approach, generating one word at a time based on the preceding text; diffusion models instead start from a noisy draft of the full output and refine it iteratively.

“dLLMs expand the possibility frontier,” Stefano Ermon, a Stanford University computer science professor and co-founder of Inception Labs, tells IBM Think. “Mercury provides unmatched speed and efficiency, and—by leveraging more test-time compute—dLLMs will also set the bar for quality and improve overall customer satisfaction for edge and enterprise applications.”

IBM Research Engineer Benjamin Hoover sees the writing on the wall: “It’s just a matter of two or three years before most people start switching to using diffusion models,” he says. “When I saw Inception Labs’ model, I realized, ‘This is going to happen sooner rather than later.’”

The diffusion model advantage

Diffusion models don’t play by the same rules as traditional AI. Autoregressive models like GPT build sentences word by word, predicting one token at a time. If a model is generating the phrase “To whom it may concern,” it predicts “To,” then “whom,” then “it,” and so on—one step at a time. Diffusion models flip the script. Instead of piecing together text sequentially, they start with a rough, noisy version of an entire passage and refine it in multiple steps. Think of it like an artist sketching a rough outline before sharpening the details, rather than drawing each element in order. By considering the whole sentence at once, diffusion models can generate responses faster, often with more coherence and accuracy than traditional LLMs.
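The contrast between the two generation strategies can be sketched in a few lines of toy Python. This is an illustrative control-flow comparison only, not Mercury's actual algorithm: the `predict_next` and `toy_denoise` "models" are hypothetical oracles that always recover the target phrase, so the sketch shows only how autoregressive decoding fills one position per step while a diffusion-style loop can update every position in parallel.

```python
import random

MASK = "<mask>"
target = ["To", "whom", "it", "may", "concern"]

def autoregressive_generate(n_tokens, predict_next):
    """Generate left-to-right: one token per step, n_tokens steps total."""
    seq = []
    for _ in range(n_tokens):
        seq.append(predict_next(seq))  # each token conditions on the prefix
    return seq

def diffusion_generate(n_tokens, denoise, n_steps):
    """Start fully masked ('noisy') and refine the whole sequence in parallel."""
    seq = [MASK] * n_tokens
    for _ in range(n_steps):
        seq = denoise(seq)  # every position may be updated in the same step
    return seq

def toy_denoise(seq):
    """Unmask a random subset of still-noisy positions each refinement step."""
    out = list(seq)
    for i, tok in enumerate(out):
        if tok == MASK and random.random() < 0.7:
            out[i] = target[i]
    return out

ar = autoregressive_generate(5, lambda prefix: target[len(prefix)])

random.seed(0)
dl = diffusion_generate(5, toy_denoise, n_steps=10)
```

With a strong denoiser, the diffusion loop converges in far fewer passes than the sequence has tokens, which is where the speed advantage comes from.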

Hoover sees the technology as a modern twist on an older concept. “Diffusion models are fundamentally error-correction mechanisms,” he says. “They work by starting with a noisy input, and gradually removing the noise until they arrive at the desired output.”


Breaking through the language barrier

Diffusion models have been widely used in image generation, with models like DALL·E, Stable Diffusion and Midjourney refining noisy images into high-quality visuals. However, applying this approach to text is more difficult because language requires strict adherence to grammar and syntax.

“Many attempts to apply diffusion models to text generation have struggled in the past,” says Ermon. “What enabled Mercury to succeed where others failed are proprietary innovations in both training and inference algorithms.” Unlike images, which can be gradually cleaned up into recognizable forms, language follows rigid grammatical rules that make iterative refinement trickier.

Hoover points to Inception Labs’ Mercury as a prime example of how diffusion models are closing the gap. “That model proved diffusion could hold its own and is actually faster and more efficient than comparable autoregressive models.”

The future of diffusion

The efficiency of diffusion-based LLMs could shake up AI deployment, particularly in enterprise applications where cost and speed matter. Traditional LLMs require substantial computing power, making them expensive to run. Diffusion models promise similar or better performance at a fraction of the cost: because they refine entire sequences in parallel rather than generating each word step by step, they can cut computational overhead.

“Our customers and early adopters are developing applications powered by dLLMs in areas including customer support, sales and gaming,” Ermon says. “They’re making their applications more responsive, more intelligent and cheaper.”

Hoover sees an even broader impact. “Right now, AI is constrained by energy consumption,” he says. “Large models use enormous amounts of power. However, diffusion models operate differently, allowing for far greater efficiency. In the long run, we could see diffusion-based AI systems running on analog hardware, reducing energy costs dramatically.”

Analog computing, which processes information using continuous electrical signals rather than binary operations, has long been touted as a potential solution to AI’s energy problem. Hoover believes diffusion models are particularly well-suited to this approach.

“These models are inherently interpretable,” he says. “That means we can map their internal computations directly onto analog circuits, something that’s much harder to do with traditional deep learning architectures.”
