A new class of AI models is challenging the dominance of GPT-style systems, promising faster, cheaper and potentially more powerful alternatives.
Inception Labs, a startup founded by researchers from Stanford, recently released Mercury, a diffusion-based language model (dLLM) that refines entire phrases at once, rather than predicting words one by one. Unlike traditional large language models (LLMs), which use an autoregressive approach—generating one word at a time, based on the preceding text—diffusion models draft the whole output and then improve it through successive rounds of refinement.
“dLLMs expand the possibility frontier,” Stefano Ermon, a Stanford University computer science professor and co-founder of Inception Labs, tells IBM Think. “Mercury provides unmatched speed and efficiency, and—by leveraging more test-time compute—dLLMs will also set the bar for quality and improve overall customer satisfaction for edge and enterprise applications.”
IBM Research Engineer Benjamin Hoover sees the writing on the wall: “It’s just a matter of two or three years before most people start switching to using diffusion models,” he says. “When I saw Inception Labs’ model, I realized, ‘This is going to happen sooner rather than later.’”
Diffusion models don’t play by the same rules as traditional AI. Autoregressive models like GPT build sentences word by word, predicting one token at a time. If a model is generating the phrase “To whom it may concern,” it predicts “To,” then “whom,” then “it,” and so on—one step at a time. Diffusion models flip the script. Instead of piecing together text sequentially, they start with a rough, noisy version of an entire passage and refine it in multiple steps. Think of it like an artist sketching a rough outline before sharpening the details, rather than drawing each element in order. By considering the whole sentence at once, diffusion models can generate responses faster, often with more coherence and accuracy than traditional LLMs.
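To make the contrast concrete, here is a minimal Python sketch of the two decoding loops. The “models” here are toy stand-ins (a lookup for the next word, and a function that unmasks a few positions per pass), not Mercury’s actual algorithm; the point is only that the autoregressive loop makes one model call per token, while the diffusion-style loop refines every position over a small, fixed number of passes.

```python
import math
import random

TARGET = ["To", "whom", "it", "may", "concern"]
MASK = "[MASK]"

def autoregressive_decode(length, predict_next):
    """Generate one token at a time, each conditioned on the prefix so far."""
    tokens = []
    for _ in range(length):
        tokens.append(predict_next(tokens))      # one model call per token
    return tokens

def diffusion_decode(length, denoise, steps=3):
    """Start from an all-masked ("noisy") sequence and refine the whole
    sequence over a small, fixed number of passes."""
    tokens = [MASK] * length
    per_step = math.ceil(length / steps)
    for _ in range(steps):
        tokens = denoise(tokens, per_step)       # one pass updates many positions
    return tokens

# Toy stand-ins for real models (purely illustrative):
def toy_next_token(prefix):
    """'Predicts' the next word of the target phrase from the prefix length."""
    return TARGET[len(prefix)]

def toy_denoise(tokens, k):
    """'Denoises' by filling in up to k still-masked positions in one pass."""
    out = list(tokens)
    masked = [i for i, t in enumerate(out) if t == MASK]
    for i in random.sample(masked, min(k, len(masked))):
        out[i] = TARGET[i]
    return out

print(autoregressive_decode(5, toy_next_token))   # 5 sequential model calls
print(diffusion_decode(5, toy_denoise, steps=3))  # 3 refinement passes for all 5 words
```

In a real dLLM, the denoising step would be a neural network scoring every position at once, so latency is set by the number of refinement passes rather than by the length of the response.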
Hoover sees the technology as a modern twist on an older concept. “Diffusion models are fundamentally error-correction mechanisms,” he says. “They work by starting with a noisy input, and gradually removing the noise until they arrive at the desired output.”
Diffusion models have been widely used in image generation, with models like DALL·E, Stable Diffusion and Midjourney refining noisy images into high-quality visuals. However, applying this approach to text is more difficult because language requires strict adherence to grammar and syntax.
“Many attempts to apply diffusion models to text generation have struggled in the past,” says Ermon. “What enabled Mercury to succeed where others failed are proprietary innovations in both training and inference algorithms.” Unlike images, which can be gradually cleaned up into recognizable forms, language follows rigid grammatical rules that make iterative refinement trickier.
Hoover points to Inception Labs’ Mercury as a prime example of how diffusion models are closing the gap. “That model proved diffusion could hold its own and is actually faster and more efficient than comparable autoregressive models.”
The efficiency of diffusion-based LLMs could shake up AI deployment, particularly in enterprise applications where cost and speed matter. Traditional LLMs require substantial computing power, making them expensive to run. Diffusion models promise similar or better performance at a fraction of the cost, largely because they refine entire sequences in parallel rather than generating each word step by step, which cuts the number of sequential model passes needed per response.
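As a rough back-of-the-envelope illustration (the response length and step count below are assumptions for the sake of arithmetic, not benchmarks of Mercury or any other model), the number of sequential forward passes scales very differently under the two approaches:

```python
def autoregressive_passes(num_tokens: int) -> int:
    """One forward pass per generated token."""
    return num_tokens

def diffusion_passes(num_refinement_steps: int) -> int:
    """One forward pass per refinement step, regardless of response length."""
    return num_refinement_steps

response_length = 500    # tokens in a longish answer (assumed)
refinement_steps = 20    # denoising passes (assumed)

print(autoregressive_passes(response_length))   # 500 sequential passes
print(diffusion_passes(refinement_steps))       # 20 passes, each covering the whole sequence
```

Each refinement pass still processes the entire sequence, so the per-pass compute is not free, but the pass count no longer grows with the length of the output, which is where the speed and cost savings come from.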
“Our customers and early adopters are developing applications powered by dLLMs in areas including customer support, sales and gaming,” Ermon says. “They’re making their applications more responsive, more intelligent and cheaper.”
Hoover sees an even broader impact. “Right now, AI is constrained by energy consumption,” he says. “Large models use enormous amounts of power. However, diffusion models operate differently, allowing for far greater efficiency. In the long run, we could see diffusion-based AI systems running on analog hardware, reducing energy costs dramatically.”
Analog computing, which processes information using continuous electrical signals rather than binary operations, has long been touted as a potential solution to AI’s energy problem. Hoover believes diffusion models are particularly well-suited to this approach.
“These models are inherently interpretable,” he says. “That means we can map their internal computations directly onto analog circuits, something that’s much harder to do with traditional deep learning architectures.”