Rather than immediately generating a direct response to a user’s input, reasoning models are trained to first produce intermediate “reasoning steps” before arriving at the final answer presented to the user. Some reasoning LLMs show users their full reasoning traces, while others summarize these intermediate outputs or hide them altogether.
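In practice, an application receiving raw output from a reasoning model often needs to separate the trace from the answer itself. The sketch below shows one way to do this, assuming the model delimits its reasoning with <think> tags as DeepSeek-R1 does; other models use different delimiters or return the trace in a separate API field.

```python
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Split a completion into (reasoning trace, final answer).

    Assumes DeepSeek-R1-style output, where intermediate reasoning
    is wrapped in <think>...</think> and the answer follows it.
    """
    match = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    if match is None:
        # No explicit trace: treat the whole output as the answer.
        return "", raw_output.strip()
    return match.group(1).strip(), raw_output[match.end():].strip()

# Illustrative raw completion (not real model output):
raw = "<think>12 * 13 = 12 * 10 + 12 * 3 = 156.</think>The answer is 156."
trace, answer = split_reasoning(raw)
print(answer)  # -> The answer is 156.
```

Whether the application then displays the trace, summarizes it, or discards it is a product decision; the model generates the same intermediate tokens either way.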
Simply put, reasoning LLMs are trained to spend more time “thinking” before they respond. The addition of this “reasoning process” has been empirically shown to yield major improvements in LLM performance on complex reasoning tasks. This success has expanded the real-world use cases and domains to which AI models can be applied, marking an important inflection point in the ongoing development of generative AI and AI agents.
It’s worth noting, however, that anthropomorphic terms like a model’s “thought process” are a matter of convenience rather than literal description. Like all machine learning models, reasoning models are ultimately just applying sophisticated algorithms to make predictions, such as what word should come next, that reflect patterns learned from training data. Reasoning LLMs have not demonstrated consciousness or other signs of artificial general intelligence (AGI). AI research published by Apple in June 2025 casts doubt on whether the reasoning abilities of current models can scale to truly “generalizable” reasoning.1
It’s perhaps most accurate to say that reasoning LLMs are trained to “show their work” by generating a sequence of tokens (words) that resembles a human thought process. This act of “verbalizing” thoughts seems to unlock latent reasoning capabilities that LLMs implicitly learn from their massive corpus of training data, which contains countless examples of people directly and indirectly articulating their own reasoning processes.
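A loose illustration of this idea is zero-shot chain-of-thought prompting, where simply asking a non-reasoning model to spell out its steps elicits the same kind of verbalized trace. In the sketch below, generate is a hypothetical stand-in for any LLM completion call, and both replies are canned for illustration rather than real model output.

```python
# Hypothetical stand-in for an LLM completion API; the canned
# replies below are illustrative, not real model output.
def generate(prompt: str) -> str:
    if "step by step" in prompt:
        return (
            "Let the ball cost x. The bat costs x + 1.00, "
            "so 2x + 1.00 = 1.10 and x = 0.05. "
            "Final answer: the ball costs $0.05."
        )
    return "The ball costs $0.10."  # the common intuitive (wrong) answer

question = (
    "A bat and a ball cost $1.10 together. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)

print(generate(question))                           # direct answer, often wrong
print(generate(question + " Think step by step."))  # verbalized steps, then answer
```

Reasoning LLMs can be thought of as internalizing this behavior: rather than relying on the user to request step-by-step thinking, they are trained to produce the intermediate steps by default.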
The concept of a “reasoning model” was introduced by OpenAI’s o1-preview (and o1-mini) in September 2024,2 followed by Alibaba’s “Qwen with Questions” (QwQ-32B-Preview) in November and Google’s Gemini 2.0 Flash Thinking Experimental in December. A milestone in the development of reasoning LLMs was the January 2025 release of the open-source DeepSeek-R1 model. Whereas the training processes used to fine-tune prior reasoning models had been closely guarded secrets, DeepSeek released a detailed technical paper that provided a blueprint for other model developers. IBM Granite, Anthropic and Mistral AI, among others, have since released their own reasoning LLMs.