Date: 2025-11-20

Early testers noted that Google reported sizable gains for Gemini 3 on several high-difficulty evaluations, including Humanity’s Last Exam, GPQA Diamond and ARC-AGI-2, and highlighted improvements in how the model interprets text, images, audio and video together. They also pointed to new coding and agentic tools that can generate working applications with less prompting than earlier versions. Even with those advances, IBM Senior Research Scientist Marina Danilevsky said on a recent episode of the Mixture of Experts podcast that Gemini 3 “is still hallucinating, and it still really likes to give answers rather than say that it does not know the answers.”

Other researchers emphasized the importance of Google’s ecosystem strategy. IBM Chief Architect of AI Open Innovation Gabe Goodhart said on the podcast that “a really great model is not that differentiated anymore.” He argued that the competitive edge now lies in the surrounding tools rather than model size alone. He pointed to Antigravity as an example, calling it “something you cannot get anywhere else,” with the ability to launch “a fleet of delegate worker agents” that can run tasks in parallel.

Hands-on testing made the contrast clearer. Merve Unuvar, Director of Agentic Middleware and Applications Research in AI at IBM, said on the podcast that she asked Gemini 3 to build a personal workout dashboard. The model spun up a working Streamlit interface in under two minutes and delivered a clean set of recommendations. But when she asked for more tailored guidance, it produced advice that ignored information it already had, telling her to “eat high-nutrition food after the workout to ‘grow,’” despite knowing her age.

Goodhart said the real test for Gemini 3 will be how well it handles complex, multi-agent workflows, not just how it scores on benchmarks.

“If the model can actually hold up to that level of independence and parallel analysis,” he said, “it could be a real breakthrough.”