New Gemini model boosts Google’s standing in high-stakes AI tests

Published 20 November 2025


Staff Writer

IBM


Google’s Gemini 3 launched this week with impressive gains on some of the field’s hardest reasoning evaluations, a shift IBM researchers say reflects a real advance in Google’s frontier-model capabilities.

Gemini 3 introduces a set of feature upgrades that Google describes as a step up in practical capability. According to the company's announcement, the model now handles text, images, audio and video in a single context window; adds new agentic-coding tools that let developers generate working applications from prompts; and expands its reach across Google Search, the Gemini app and enterprise platforms such as Vertex AI.

Benchmark boost

Google also cites benchmark gains that it says reflect improvements in reasoning and tool use. The company highlighted gains on ARC-AGI, stronger performance in terminal-based code execution and better results on developer-oriented tasks that require planning multiple steps and running tools.

Google is positioning Gemini 3 as the centerpiece of a broader ecosystem built around agentic tooling and cross-application coordination. Central to that effort is Antigravity, an integrated development environment designed to let the model plan tasks, call tools, operate across terminals and browsers, and distribute work among multiple agents.


First impressions show both promise and caveats

Early testers noted the sizable gains Google reported for Gemini 3 on several high-difficulty evaluations, including Humanity's Last Exam, GPQA Diamond and ARC-AGI-2, as well as improvements in how the model interprets text, images, audio and video together. They also pointed to new coding and agentic tools that can generate working applications with less prompting than earlier versions. Even with those advances, IBM Senior Research Scientist Marina Danilevsky said on a recent episode of the Mixture of Experts podcast that Gemini 3 "is still hallucinating, and it still really likes to give answers rather than say that it does not know the answers."

Other researchers emphasized the importance of Google’s ecosystem strategy. IBM Chief Architect of AI Open Innovation Gabe Goodhart said on the podcast that “a really great model is not that differentiated anymore.” He argued that the competitive edge now lies in the surrounding tools rather than model size alone. He pointed to Antigravity as an example, calling it “something you cannot get anywhere else,” with the ability to launch “a fleet of delegate worker agents” that can run tasks in parallel.

Hands-on testing made the contrast between promise and caveats clearer. Merve Unuvar, Director of Agentic Middleware and Applications Research in AI at IBM, said on the podcast that she asked Gemini 3 to build a personal workout dashboard. The model spun up a working Streamlit interface in under two minutes and delivered a clean set of recommendations. But when she asked for more tailored guidance, it produced advice that ignored information it already had, telling her to "eat high-nutrition food after the workout to 'grow,'" even though it already knew her age.

Goodhart said the real test for Gemini 3 will come from how well it handles complex, multi-agent workflows, not just from benchmarks.

“If the model can actually hold up to that level of independence and parallel analysis,” he said, “it could be a real breakthrough.”


