2025-04-25
Benchmarks, inference costs, innovation: how’s AI reshaping our society? This year, Stanford’s 2025 AI Index Report added new areas of coverage to reflect AI’s growing role in every facet of our lives.
IBM Think unpacked some key trends in the report with Vanessa Parli, Director of Research Programs at Stanford’s Institute for Human-Centered Artificial Intelligence, and Ash Minhas, a Technical Content Manager at IBM.
Benchmarks are a hot topic if there ever was one. They have become a central point of debate now that AI capabilities are advancing so quickly that they consistently outpace the tools used to measure them.
“Every year, we look at how these algorithms are performing across benchmarks, and every year it seems like they're beating those benchmarks,” says Vanessa Parli, one of the report authors, in an interview with IBM Think. “Similarly, this year, that is happening even with the newer benchmarks.”
The report noted that in 2023, researchers introduced new benchmarks (MMMU, GPQA and SWE-bench) to test the limits of advanced AI systems. Just a year later, performance had sharply increased: scores rose by 18.8, 48.9 and 67.3 percentage points on MMMU, GPQA and SWE-bench, respectively.
This has fueled debate within the research community about the true meaning, and value, of an LLM benchmark. Parli poses several critical questions: “Are we measuring the right thing? Are those benchmarks compromised? And how should the scientific community evaluate models?”
Looking ahead, Ash Minhas also questions what the future of benchmarking will look like. “Where is that going to stop?” he asks in an interview with IBM Think. “Is the Turing Test going to have to constantly be a moving goal post? Is Humanity’s Last Exam really the last exam?”
Meanwhile, experts warn of the risk of overfitting, in which an AI model learns to perform exceptionally well on specific benchmark tests but may fail to generalize to new, unseen data in real-world applications. “Are we just training the model to pass the benchmark?” Minhas adds. “MMMU is a good benchmark, but is it because the model knows how to respond to the benchmark?”
Minhas also warns that the excitement and momentum of progress could be overshadowing concerns about ethics, fairness and bias.
With last year’s Nobel Prizes in Physics and Chemistry awarded to researchers working on artificial neural networks and on protein design and structure prediction, it is hard to overlook AI’s growing role in science and medicine. The report notes that the number of FDA-approved, AI-enabled medical devices has surged: 223 were approved in 2023, compared with just six in 2015.
“This area of AI enhancing scientific discovery can have a lot of impact on our society,” says Parli.
According to Minhas, this growth shows the rapid pace of innovation, but also raises questions: “Do we have the right experts and the right skills to be able to test these new devices and products?”
AI was a key force behind major investment in 2024. The number of newly funded generative AI startups nearly tripled, and after years of slow uptake, business adoption of AI accelerated significantly, the report found.
AI has moved from the margins to become a central driver of business value. Total corporate investment in AI hit USD 252.3 billion in 2024, with private investment jumping 44.5% and mergers and acquisitions rising 12.1% compared to the previous year. This has fueled a flourishing startup ecosystem in the US, where private AI investment reached USD 109.1 billion in 2024.
AI is also becoming a major presence in the workplace, and many anticipate that agentic AI will reshape enterprise workflows.
However, businesses move more slowly than the technology does. “The technology is leaping forward, but the people and processes take time to change,” Minhas says.
AI’s return on investment (ROI) is still up for debate, he suggests. “There isn’t a good understanding of the economic benefit yet,” Minhas says. “No one agrees on what the ROI is, and no one really knows.”
The report also highlighted that countries around the world are ramping up their investments in AI infrastructure, and the release of powerful models from China shows that US leadership cannot be taken for granted.
“I don't think that we can take for granted that the US is always going to be at the top of these charts, and we need to continue to think about these components of AI: compute, talent, data,” Parli says. “We should continue to invest if we want to maintain the innovation leadership that we have had in the past, and make sure that we have the right ingredients to make that happen.”
Still, the report highlights another interesting, if seemingly contradictory, trend: countries with the highest investment in AI, such as the US, express more skepticism about AI products and services than countries with smaller AI budgets.
According to figures shared in the report, 80% of people surveyed in Indonesia believe AI products are more beneficial than harmful, compared to only 39% in the US.
“In many countries, AI allows access to certain resources, [like] healthcare, for example, and I think we’ll probably be more optimistic about AI and cultural differences,” explains Parli. “There are also cultural differences around questions like privacy, security and data privacy.”
Finally, AI will become more present in the physical world.
The report found that from 2013 to 2023, the number of industrial robots installed globally each year roughly tripled, reaching 541,000 in 2023.
“With some of the AI tools, where you can talk to the robot in natural language, you can use motion,” says Parli. “You can work much more closely with the robots, [and] it will be easier to collaborate with them. I see healthcare as an area where robotics will go further.”
