Diffusion models challenge GPT as next-generation AI emerges


By Sascha Brodsky, Staff Writer, IBM

A new class of AI models is challenging the dominance of GPT-style systems, promising faster, cheaper and potentially more powerful alternatives.

Inception Labs, a startup founded by researchers from Stanford, recently released Mercury, a diffusion-based language model (dLLM) that refines entire phrases at once rather than predicting words one by one. Unlike traditional large language models (LLMs), which take an autoregressive approach, generating one word at a time based on the preceding text, diffusion models draft a whole passage and improve it through iterative refinement.

“dLLMs expand the possibility frontier,” Stefano Ermon, a Stanford University computer science professor and co-founder of Inception Labs, tells IBM Think. “Mercury provides unmatched speed and efficiency, and—by leveraging more test-time compute—dLLMs will also set the bar for quality and improve overall customer satisfaction for edge and enterprise applications.”

IBM Research Engineer Benjamin Hoover sees the writing on the wall: “It’s just a matter of two or three years before most people start switching to using diffusion models,” he says. “When I saw Inception Labs’ model, I realized, ‘This is going to happen sooner rather than later.’”

The diffusion model advantage

Diffusion models don’t play by the same rules as traditional AI. Autoregressive models like GPT build sentences word by word, predicting one token at a time. If a model is generating the phrase “To whom it may concern,” it predicts “To,” then “whom,” then “it,” and so on—one step at a time. Diffusion models flip the script. Instead of piecing together text sequentially, they start with a rough, noisy version of an entire passage and refine it in multiple steps. Think of it like an artist sketching a rough outline before sharpening the details, rather than drawing each element in order. By considering the whole sentence at once, diffusion models can generate responses faster, often with more coherence and accuracy than traditional LLMs.
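To make the contrast concrete, here is a minimal toy sketch in Python. The "models" are hard-coded stand-ins for learned networks, and the masking-style refinement shown is one common formulation from the research literature, not a description of Mercury's actual algorithm.

```python
import random

TARGET = ["To", "whom", "it", "may", "concern"]

def autoregressive_generate(length, predict_next):
    """Autoregressive decoding: one token per step, strictly left to right."""
    tokens = []
    for _ in range(length):
        tokens.append(predict_next(tokens))  # each step sees only the prefix so far
    return tokens

def diffusion_generate(length, denoise, steps=4):
    """Diffusion-style decoding: start from a fully 'noisy' (masked) draft and
    refine the whole sequence in parallel over a fixed number of steps."""
    draft = ["[MASK]"] * length
    for _ in range(steps):
        draft = denoise(draft)  # each step conditions on the entire draft
    return draft

# Toy stand-ins for learned networks, hard-coded purely for illustration.
def toy_predict_next(prefix):
    return TARGET[len(prefix)]

def toy_denoise(draft):
    out = list(draft)
    masked = [i for i, t in enumerate(out) if t == "[MASK]"]
    for i in random.sample(masked, k=min(2, len(masked))):
        out[i] = TARGET[i]  # fill in a few positions per refinement step
    return out

print(autoregressive_generate(5, toy_predict_next))  # ['To', 'whom', 'it', 'may', 'concern']
print(diffusion_generate(5, toy_denoise))            # same phrase, built by refinement
```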

Hoover sees the technology as a modern twist on an older concept. “Diffusion models are fundamentally error-correction mechanisms,” he says. “They work by starting with a noisy input, and gradually removing the noise until they arrive at the desired output.”
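A bare-bones numeric analogy for that error-correction loop, assuming (purely for illustration) that the model's estimate of the remaining error is perfect:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])          # the "clean" signal to recover
x = target + rng.normal(scale=3.0, size=3)   # start from a heavily noised version

for step in range(8):
    # A trained diffusion model would *predict* the remaining error from data;
    # here the known target stands in for that prediction, purely to show the loop.
    predicted_error = x - target
    x = x - 0.5 * predicted_error            # remove part of the estimated error
    print(step, np.round(x, 3))              # x drifts steadily toward the target
```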

Breaking through the language barrier

Diffusion models have been widely used in image generation, with models like DALL·E, Stable Diffusion and Midjourney refining noisy images into high-quality visuals. However, applying this approach to text is more difficult because language requires strict adherence to grammar and syntax.
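One way to see the difficulty: pixels live in a continuous space where adding "a little noise" is straightforward, while tokens are discrete, so corrupting text typically means masking or replacing whole words. The sketch below illustrates that contrast; it is not a description of any particular model's noising scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pixels are continuous, so "a little noise" is well defined:
pixels = rng.random((4, 4))                                   # a tiny grayscale "image"
slightly_noisy = 0.9 * pixels + 0.1 * rng.normal(size=(4, 4))

# Tokens are discrete, so corrupting text usually means masking or swapping
# whole tokens rather than nudging values slightly.
tokens = ["To", "whom", "it", "may", "concern"]
corrupted = [t if rng.random() > 0.4 else "[MASK]" for t in tokens]
print(corrupted)
```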

“Many attempts to apply diffusion models to text generation have struggled in the past,” says Ermon. “What enabled Mercury to succeed where others failed are proprietary innovations in both training and inference algorithms.” Unlike images, which can be gradually cleaned up into recognizable forms, language follows rigid grammatical rules that make iterative refinement trickier.

Hoover points to Inception Labs’ Mercury as a prime example of how diffusion models are closing the gap. “That model proved diffusion could hold its own and is actually faster and more efficient than comparable autoregressive models.”

The future of diffusion

The efficiency of diffusion-based LLMs could shake up AI deployment, particularly in enterprise applications where cost and speed matter. Traditional LLMs require substantial computing power, making them expensive to run. Diffusion models promise similar or better performance at a fraction of the cost because they refine entire sequences in parallel rather than generating each word step by step, reducing computational overhead.
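As a rough, back-of-the-envelope illustration (assumed step counts, not measured numbers): an autoregressive model makes about one forward pass per generated token, while a diffusion model makes one pass per refinement step, with each pass covering the whole sequence.

```python
# Illustrative pass counts only, not benchmarks: an autoregressive model needs
# roughly one forward pass per generated token, while a diffusion model needs
# one pass per refinement step, with each pass covering the whole sequence.
def autoregressive_passes(num_tokens: int) -> int:
    return num_tokens

def diffusion_passes(num_refinement_steps: int) -> int:
    return num_refinement_steps

for length in (64, 256, 1024):
    print(f"{length} tokens: {autoregressive_passes(length)} sequential passes "
          f"vs. {diffusion_passes(8)} parallel refinement passes")
```

Actual savings depend on how many refinement steps a given model needs, the per-pass cost of processing the full sequence and how well the hardware exploits the parallelism.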

“Our customers and early adopters are developing applications powered by dLLMs in areas including customer support, sales and gaming,” Ermon says. “They’re making their applications more responsive, more intelligent and cheaper.”

Hoover sees an even broader impact. “Right now, AI is constrained by energy consumption,” he says. “Large models use enormous amounts of power. However, diffusion models operate differently, allowing for far greater efficiency. In the long run, we could see diffusion-based AI systems running on analog hardware, reducing energy costs dramatically.”

Analog computing, which processes information using continuous electrical signals rather than binary operations, has long been touted as a potential solution to AI’s energy problem. Hoover believes diffusion models are particularly well-suited to this approach.

“These models are inherently interpretable,” he says. “That means we can map their internal computations directly onto analog circuits, something that’s much harder to do with traditional deep learning architectures.”

