Approaching the midpoint of 2025, we can look back at the prevailing artificial intelligence trends of the year so far—and look ahead to what the rest of the year might bring.
Given the breadth and depth of AI development, no roundup of AI trends can hope to be exhaustive. This piece is no exception. We’ve narrowed things down to a list of 10: 5 developments that have driven the first half of the year, and 5 more that we expect to play a major role in the months to come.
Trends in AI are driven not only by advancements in AI models and algorithms themselves, but by the ever-expanding array of use cases to which generative AI (gen AI) capabilities are being applied. As models grow more capable, versatile and efficient, so too do the AI applications, AI tools and other AI-powered workflows they enable. A true understanding of how today’s AI ecosystem is evolving therefore requires a contextual understanding of the causes and effects of machine learning breakthroughs.
This article primarily explores ongoing trends whose real-world impact might be realized on a horizon of months: in other words, trends with tangible impact mostly on or in the year 2025. There are, of course, other AI initiatives that are more evergreen and familiar. For example, though there has been recent movement on fully self-driving vehicles in isolated pockets—robotaxi pilots have been launched in a handful of U.S. cities, with additional trials abroad in Oslo, Geneva and 16 Chinese cities—they’re likely still years away from ubiquity.
Many other important macro trends in AI—such as the advent of AI agents, or AI's disruption of search behaviors and SEO—are broad, multi-faceted and already well covered elsewhere, and so were left out in favor of more focused developments that haven't received such widespread media attention.
That said, on with the list.
Progress doesn't necessarily require a constant influx of brand new ideas. Many of the most important AI trends in the first half of 2025 reflect changes in how the industry is applying existing ideas—some pragmatic and productive, others less so.
Today’s models are not only significantly better than the models of yesteryear, but also vastly cheaper to run. Consider a chart from SemiAnalysis: in under 2 years, the per-token price to achieve equivalent results on the MMLU benchmark fell by dozens of times. This is hardly news to anyone who has been monitoring the performance metrics of each successive generation of model releases. But viewed in the aggregate, this steadily accelerating pace of improvement makes the case for generative AI better than the already-impressive capabilities of any single present-day model.
One study estimates the pace of algorithmic improvement at roughly 400% per year: in other words, today’s results can be achieved a year later using one fourth of the compute—and that’s without accounting for simultaneous improvements in computing (see: Moore’s Law) or synthetic training data. The original GPT-4, rumored to have around 1.8 trillion parameters,1 achieved a score of 67% on HumanEval, a popular benchmark for coding performance. IBM Granite 3.3 2B Instruct, released 2 years later and 900 times smaller, achieved a score of 80.5%.2
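To make the compounding concrete, here is a back-of-the-envelope sketch using the roughly 4x-per-year figure cited above; the numbers are purely illustrative and deliberately ignore hardware and data improvements.

```python
# Back-of-the-envelope illustration of compounding algorithmic efficiency.
# Assumes the roughly 4x-per-year improvement cited above; the starting value
# is arbitrary and the calculation ignores hardware and data gains.
baseline_compute = 1.0   # normalized compute needed for a fixed result today
annual_gain = 4.0        # ~400% per year algorithmic improvement

for year in range(4):
    needed = baseline_compute / (annual_gain ** year)
    print(f"Year {year}: {needed:.4f}x of today's compute for the same result")

# Year 0: 1.0000x, Year 1: 0.2500x, Year 2: 0.0625x, Year 3: 0.0156x
```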
This exponential improvement in model economy, more than anything, is what empowers the emerging era of AI agents. Large language models (LLMs) are becoming more practical even faster than they’re becoming more capable, which enables the deployment of multi-agent systems in which a cadre of models can plan, execute and coordinate on complex tasks autonomously—without skyrocketing inference costs.
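As a rough sketch of what that looks like in practice, the outline below shows a minimal plan-and-execute loop in which one model decomposes a task and cheaper models carry out the steps. The call_model helper is a hypothetical placeholder, not any particular framework’s API.

```python
# Minimal sketch of a plan-and-execute multi-agent loop. call_model() is a
# hypothetical placeholder for whatever inference API or local runtime is used;
# the point is the division of labor, not any specific framework.
def call_model(role: str, prompt: str) -> str:
    """Placeholder: send a prompt to the model playing the given role."""
    raise NotImplementedError("wire this up to a real inference endpoint")

def run_task(task: str) -> str:
    # A "planner" model breaks the task into discrete steps...
    plan = call_model("planner", f"Break this task into numbered steps: {task}")
    step_results = []
    for step in filter(str.strip, plan.splitlines()):
        # ...smaller, cheaper "worker" models execute each step...
        step_results.append(call_model("worker", f"Complete this step: {step}"))
    # ...and a final pass coordinates the partial results into one answer.
    return call_model("planner", "Combine these results:\n" + "\n".join(step_results))
```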
The release of OpenAI’s o1 introduced a new avenue for increasing model performance. Its head-turning improvement over prior state-of-the-art performance on highly technical math and coding benchmarks initiated an arms race in so-called “reasoning models.” Their enhanced performance on tasks requiring logical decision-making figures to play an important role in the development of agentic AI. But as is often the case with AI technology, the initial frenzy over raw performance has more recently given way to a search for the most practical implementation.
The intuition behind reasoning models stems from research demonstrating that scaling up test-time compute (used to generate an output) could enhance model performance as much as scaling up train-time compute (used to train a model). This insight manifested in techniques to fine-tune models in ways that incentivize the generation of longer, more complex “thought processes” prior to a final output—a school of techniques broadly called inference scaling.
But inference scaling also means increased inference costs and latency. Users must pay (and wait) for all the tokens the model generates while “thinking” before its final response, and those thinking tokens eat into the available context window. Some use cases justify that extra time and compute, but in many other scenarios it’s a waste of resources. However, constantly switching between a reasoning model and a “standard” model on a task-by-task, prompt-by-prompt basis is impractical.
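A quick, hypothetical cost comparison shows why this matters; the token counts and per-token price below are invented purely for illustration, and real figures vary widely by provider and model.

```python
# Hypothetical cost comparison of a "thinking" versus a standard response.
# All numbers below are made up for illustration only.
price_per_million_output_tokens = 10.00   # USD, hypothetical

answer_tokens = 300
thinking_tokens = 2_000                   # reasoning trace generated before the answer

standard_cost = answer_tokens / 1e6 * price_per_million_output_tokens
reasoning_cost = (answer_tokens + thinking_tokens) / 1e6 * price_per_million_output_tokens

print(f"standard:  ${standard_cost:.4f}")   # $0.0030
print(f"reasoning: ${reasoning_cost:.4f}")  # $0.0230, nearly 8x more for the same visible answer
```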
For now, the solution is “hybrid reasoning models.” In February, IBM Granite 3.2 became the first LLM to offer a toggleable “thinking” mode, allowing users to leverage reasoning when they need it and prioritize efficiency when they don’t.3 Anthropic’s Claude 3.7 Sonnet followed suit later that month, giving API users fine-grained control over how long the model “thinks.”4 Google introduced a similar control, a configurable thinking budget, for Gemini 2.5 Flash.5 Alibaba’s Qwen3, like IBM Granite, allows thinking to be turned on or off.
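As a hedged sketch of what toggling looks like in code, the snippet below switches thinking on or off through the chat template, following the enable_thinking convention documented for Qwen3. Other model families expose similar but differently named controls, so consult the relevant model card.

```python
# Sketch: toggling a hybrid reasoning model's "thinking" mode with Hugging Face
# Transformers. The enable_thinking flag follows Qwen3's documented chat-template
# convention; other hybrid reasoning models use their own parameter names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9?"}]

# Reasoning on: the model emits a chain of thought before its final answer.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# Reasoning off: same model and prompt, but no thinking tokens (cheaper, faster).
fast_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

inputs = tokenizer(thinking_prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```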
Ongoing research aims to further increase our understanding of what’s actually happening while reasoning models “think,” and the extent to which extended chain-of-thought (CoT) reasoning traces actually contribute to results. A paper released in April suggests that for some tasks, reasoning models can be effective without outputting thoughts. Meanwhile, Anthropic research from earlier that month asserted that the CoT results shown to the user might not actually reflect what the model is really “thinking.”
AI development has always relied heavily on open source knowledge repositories, such as Wikipedia and GitHub. Their importance will only increase moving forward, especially after high-profile revelations that major AI developers have been training models on pirated book torrents—revelations that will presumably discourage continued use of those illicit sources and push developers further toward openly available data. For the organizations running invaluable open source resources, the situation is already causing serious strain.
While a bevy of lawsuits have raised awareness of the harm that data harvesting—whether legal, illegal or ambiguous—does to intellectual property holders, less attention has been paid to how AI systems’ hunger for data strains the knowledge repositories themselves. As the Wikimedia Foundation articulated in an April announcement on bot traffic, “[their] content is free, [their] infrastructure is not.” Wikimedia in particular has experienced a potentially unsustainable onslaught of web traffic from scraping bots collecting data to train generative AI models. Since January 2024, bandwidth used for downloading Wikimedia’s multimedia content has grown by 50%.
The increase in traffic volume is troubling in itself, but it’s the nature of that traffic that puts disproportionate pressure on finite resources. Human browsing behavior is predictable: our traffic clusters on popular pages and follows logical patterns, allowing for automation and caching strategies that allocate bandwidth efficiently. Bots, by contrast, crawl obscure pages indiscriminately, which often forces datacenters to serve their requests directly. This is not only costly and inefficient under ordinary circumstances, but potentially disastrous when infrastructure needs to respond to actual real-world usage spikes.
As Ars Technica reports, this problem is widespread and exacerbated by what many view as deliberately predatory behavior on the part of bot crawlers and the companies operating them. Several, such as Perplexity, have been accused of surreptitiously circumventing robots.txt and even bypassing paywalls to scrape data. When websites try to rate-limit bot access, the bots are switched to different IP addresses; when their identifying user-agent strings are blocked directly, they switch to alternate strings. One open source infrastructure manager, who found that almost 25% of his network’s traffic came from ChatGPT bots, described it as “literally a DDoS on the entire internet.”
In response, many projects are actively pursuing defensive measures. One open source project, Anubis, forces bots to solve computation puzzles before gaining access. Another, Nepenthes, sends AI crawlers down an “infinite maze.” Cloudflare, a prominent web infrastructure provider, recently launched a feature they call “AI Labyrinth,” which uses a similar (albeit less aggressive) approach. Wikimedia is marshaling a new initiative, WE5: Responsible Use of Infrastructure, aimed at a structural solution.
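To give a sense of how proof-of-work defenses like Anubis operate, here is a minimal sketch (not Anubis’s actual implementation): the server issues a challenge that is cheap for a single visitor’s browser to solve but costly for a crawler requesting thousands of pages.

```python
# Minimal proof-of-work sketch in the spirit of Anubis-style crawler defenses.
# Parameters and structure are illustrative, not Anubis's real implementation.
import hashlib
import secrets

DIFFICULTY = 4  # required number of leading hex zeros; higher = more client CPU

def issue_challenge() -> str:
    return secrets.token_hex(16)

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce that satisfies the difficulty target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash confirms the work was done."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

challenge = issue_challenge()
nonce = solve(challenge)         # tolerable for one visitor, expensive at crawler scale
assert verify(challenge, nonce)  # verification stays cheap for the server
```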
Whether commercial AI developers and open knowledge repositories can collaboratively develop a mutually suitable protocol will have a tremendous impact not only on the future of AI, but on the future of the internet itself.
Though the concept behind mixture of experts (MoE) models dates back to 1991, it didn’t enter mainstream natural language processing (NLP) or generative AI until Mistral AI’s release of its Mixtral model in late 2023.6 Though the model and its architecture received a great deal of attention—and OpenAI’s GPT-4 was rumored (albeit never confirmed) to be an MoE upon release—it largely didn’t motivate the industry to stray from its focus on conventional “dense” LLMs.
That focus seems to have changed in the aftermath of DeepSeek-R1. DeepSeek-R1, and the DeepSeek-V3 base model it was fine-tuned from, demonstrated conclusively that MoE models were perfectly capable of delivering cutting edge performance to complement their already-proven computational efficiency.
That reinvigorated interest in sparse MoE models is evident in the current wave of next-generation models—including (but not limited to) Meta Llama 4, Alibaba’s Qwen3 and IBM Granite 4.0—using the architecture. It’s also possible that some leading closed models from the likes of OpenAI, Anthropic or Google are MoEs, though such information about closed model architecture is rarely disclosed.
As impressive capacity and performance become increasingly commodified in the coming years, the inference speed and efficiency offered by sparse models are likely to become a higher priority.
The future is always hard to predict. The breakneck pace of improvement in prior generations of AI models had many expecting the models released in 2025 to take meaningful steps toward artificial general intelligence (AGI). While the latest models from OpenAI, Meta and the other most-funded players in the AI space are no doubt impressive, they’re certainly short of revolutionary.
On the practical implementation side, progress has been uneven. Many business leaders who were bullish on their organization’s AI adoption outlook at the end of 2023 spent 2024 realizing that their organization’s IT infrastructure wasn’t ready to scale AI yet.
A common refrain among AI analysts is that AI will take over mundane, repetitive tasks and free up time for humans to focus on big-picture, creative thinking. But thus far, adoption data doesn’t necessarily bear that out. A study conducted by the IBM Institute for Business Value (IBV) found that the opposite was true, at least in the retail industry’s content supply chain: 88% of retailers reported using gen AI for “creative ideation/concepting” and 74% reported using it for “content creation and editing.” Meanwhile, most of the mundane work is still human territory: only 23% of retailers are using gen AI to generate content variations by channel and only 10% are using it to generate variations by geography.
All in all, it’s not that organizations aren’t actively pursuing AI adoption—a new IBV report shows that they definitely are, particularly with regard to AI agents—but rather that it’s not happening at a straightforward, linear pace. The transition from experimentation to formal operationalization is rarely smooth.
By the back half of 2025 (and on through the beginning of next year), pieces will be in place for meaningful disruption of some aspects of the status quo in place since the earliest days of the ongoing generative AI era.
On a fundamental level, there’s no perfect benchmark (or set of benchmarks) for AI performance. Any benchmark is subject to Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” Nevertheless, it benefits model development—and the business leaders tasked with choosing specific AI solutions and models—to have standardized, transparently-administered measures of performance to facilitate apples-to-apples comparisons.
The first “standard” set of benchmarks that the industry coalesced around were those used by the Open LLM Leaderboard on Hugging Face. When its benchmarks became saturated—in other words, when most models were achieving such similarly high evaluation scores that it was hard to differentiate them—the leaderboard adopted new, significantly more challenging evaluations in June 2024. Once again, both open source and closed models coalesced around evaluating performance using the “V2” leaderboard’s evaluation benchmarks. But in March 2025, Hugging Face retired the Open LLM Leaderboard altogether.
The retirement of the leaderboard and movement away from the standard set of benchmarks it championed has caused, and was caused by, a diversification of how we use models and evaluate their performance.
There has been some momentum behind using more qualitative model comparison methods, such as the popular Chatbot Arena, over quantitative evaluations. But those, too, are imperfect. A recent paper published by an array of respected academic and open source researchers alleged several problematic Chatbot Arena practices that heavily favor the largest model providers. That paper followed allegations that Meta gamed Chatbot Arena during the release of Llama 4.
The reality is that there is no best benchmark. The best practice is probably for organizations to develop their own benchmarks that best reflect performance on the tasks they care about. A business wouldn’t hire an employee based solely on an IQ test and it shouldn’t choose a model based only on standardized tests either.
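As a minimal sketch of what such an in-house benchmark might look like, consider a small set of domain-specific test cases with simple pass/fail checks. The tasks and the generate function below are hypothetical placeholders for an organization’s real workloads and candidate models.

```python
# Minimal sketch of an organization-specific benchmark: a few domain tasks with
# simple reference checks. The tasks are hypothetical, and generate() stands in
# for whichever candidate model or API is being evaluated.
from typing import Callable

TEST_CASES = [  # (prompt, pass/fail check on the model's output)
    ("Summarize our returns policy in one sentence: ...", lambda out: len(out.split()) <= 40),
    ("Extract the invoice total from: 'Total due: $1,204.50'", lambda out: "1,204.50" in out),
]

def evaluate(generate: Callable[[str], str]) -> float:
    """Return the fraction of domain-specific test cases the model passes."""
    passed = sum(1 for prompt, check in TEST_CASES if check(generate(prompt)))
    return passed / len(TEST_CASES)

# Usage: score = evaluate(my_model_fn), then compare scores across candidate models.
```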
First introduced in 2017, transformer models are largely responsible for the era of generative AI, and they continue to be the backbone of everything from image generation to time series models to LLMs. Though transformers are certainly not going anywhere anytime soon, they’re about to have some company.
Transformers have a crucial weakness: their computational needs scale quadratically with context length. In other words, each time your context length doubles, self-attention doesn’t just use double the resources—it uses quadruple the resources. This “quadratic bottleneck” inherently limits the speed and efficiency of conventional LLMs, especially on longer sequences or when incorporating information from earlier in an extended exchange. Ongoing optimization of the transformer architecture continues to yield stronger frontier models, but they’re becoming extremely expensive.
Mamba, first introduced in 2023, is a different type of model architecture altogether—a state space model, specifically—and it’s poised to give transformers their first serious competition in the world of LLMs. The architecture has proven capable of matching transformers on most language modeling tasks (with the exception of in-context learning tasks like few-shot prompting), and its computational needs scale linearly with context length. Put simply, the way Mamba handles context is inherently more efficient: a transformer’s self-attention mechanism must look at every token and repeatedly decide which ones to pay attention to, while Mamba’s selectivity mechanism retains only the information it determines to be important.
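A simple, illustrative calculation makes the difference in scaling behavior concrete; the absolute numbers are meaningless, and only the ratios matter.

```python
# Illustrative comparison of how compute grows with context length: self-attention
# scales with the square of sequence length, while a Mamba-style selective state
# space scan scales linearly. Schematic counts only; constants are ignored.
for n in (1_000, 2_000, 4_000, 8_000):
    attention_ops = n ** 2   # transformer self-attention (per layer, schematically)
    ssm_ops = n              # state space model scan
    print(f"context {n:>5}: attention ~{attention_ops:>12,}  vs  SSM ~{ssm_ops:>6,}")
# Each doubling of context quadruples the attention term but only doubles the SSM term.
```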
When it comes to transformers or Mamba, the future of AI is probably not an “either/or” situation: in fact, research suggests that a hybrid of the two is better than either on its own. Several Mamba or hybrid Mamba/transformer models have been released in the past year. Most have been academic, research-only models, with notable exceptions including Mistral AI’s Codestral Mamba and AI21 Labs’ hybrid Jamba series. More recently, the upcoming IBM Granite 4.0 series will use a hybrid of transformer and Mamba-2 architectures.
Most importantly, the reduced hardware requirements of Mamba and hybrid models will significantly lower costs, which in turn will help continue to democratize AI access.
The advent of multimodal AI models marked the expansion of LLMs beyond text, but the next frontier of AI development aims to bring those multimodal abilities into the physical world.
This emerging field largely falls under the heading of “Embodied AI.” Venture capital firms are increasingly pouring funding into startups pursuing advanced, generative AI-driven humanoid robotics, such as Skild AI, Physical Intelligence, and 1X Technologies.
Another stream of research is focusing on “world models” that aim to model real-world interactions directly and holistically, rather than indirectly and discretely through the mediums of language, image and video data. World Labs, a startup headed by Stanford’s Fei-Fei Li—famed for, among other things, the ImageNet dataset that helped pave a path for modern computer vision—raised USD 230 million at the end of last year.
Some labs in this space are conducting experiments in “virtual worlds,” like video games: Google DeepMind’s Genie 2, for example, is “a foundation world model capable of generating an endless variety of action-controllable, playable 3D environments.” The video game industry might, naturally, be among the first direct beneficiaries of world models’ economic potential.
Many (but not all) leading AI experts, including Yann LeCun, Meta’s chief AI scientist and one of the three “godfathers of deep learning,”7 believe that world models, not LLMs, are the true path to AGI. In public comments, LeCun often alludes to Moravec’s paradox: the counterintuitive observation that, in AI, complex reasoning tasks are relatively straightforward while the simple sensorimotor and perception tasks a child can do easily are not.8
Along these lines, some interesting research endeavors are aiming to teach AI to understand concepts, rather than just words, by embodying that AI in a robot and teaching it the way we teach things to infants.
The long-term promise of AI agents is that they’ll use AI to carry out complex, context-specific tasks autonomously with little to no human intervention. To be able to personalize its decision-making to the specific, contextually intricate needs of a given workplace or situation—the way a competent employee or assistant would—an AI agent needs to learn on the job. In other words, it must retain a robust history of every AI-generated interaction and how it went.
But gathering and retaining a permanent memory of every interaction may be at odds with core notions of digital privacy in AI, especially when working with closed models deployed in the cloud (as opposed to open source models deployed locally).
For instance, in April, OpenAI announced that ChatGPT will now automatically remember every conversation you have with it, in furtherance of OpenAI’s goal to develop “AI systems that get to know you over your life.” But notably, the feature was not made available in the EU, UK, Switzerland, Norway, Iceland or Liechtenstein—presumably because it runs afoul of their existing privacy laws and AI regulations.9
It remains to be seen if the concept of a model not only saving all of its personalized interactions with you, but also using them for the further training and optimization of the model, is fundamentally compatible with core GDPR concepts like the “right to be forgotten.”
Indeed, the future of AI, and especially of AI agents, will be increasingly personal—to an extent that might result in AI’s impact transcending technological or economic considerations and crossing over into psychological territory.
Late in 2024, Microsoft AI CEO Mustafa Suleyman penned a blog post declaring his company’s goal of “creating an AI companion for everyone.” In a recent podcast interview, Meta CEO Mark Zuckerberg proposed “AI friends” as a solution to the nation’s loneliness epidemic.10 An expanding array of startups are rolling out AI coworkers.
There’s an inherent danger to this, derived primarily from humanity’s historical predisposition to become emotionally attached to even the earliest, most rudimentary chatbots. With millions of people interacting with personalized chatbots every day, the risks of emotional attachment to AI coworkers will be complex, consequential and hard to avoid.
As we proceed through a pivotal year in artificial intelligence, understanding and adapting to emerging trends is essential to maximizing potential, minimizing risk and responsibly scaling generative AI adoption.
NOTE: All links are outside IBM.com.
1. "GPT-4 architecture, datasets, costs and more leaked," The Decoder, 11 July 2023
2. IBM Granite 3.3 2B model card, Hugging Face, 16 April 2025
3. "Bringing reasoning to Granite," IBM, 7 February 2025
4. "Claude 3.7 Sonnet and Claude Code," Anthropic, 24 February 2025
5. "Gemini Thinking," Google
6. "Adaptive Mixtures of Local Experts," Neural Computation, 1 March 1991
7. "Turing Award 2018: Novel Prize of computing given to 'godfathers of AI'," The Verge, 27 March 2019
8. @YLeCun on X (formerly Twitter), via XCancel, 20 February 2024
9. "ChatGPT will now remember your old conversations," The Verge, 11 April 2025
10. "Meta CEO Mark Zuckerberg Envisions a Future Where Your Friends Are AI Chatbots—But Not Everyone Is Convinced," Entrepreneur, 8 May 2025