
A new Oumi study, reported by The New York Times, found Google’s AI Overviews inaccurate 9% of the time, which at Google’s scale translates to roughly 14 million wrong answers per hour. Over half of the accurate responses also cited sources that don’t fully support their claims. Google called the study “seriously flawed.”
Oumi analyzed 4,326 searches answered by Gemini 2 in October and by Gemini 3 in February, finding that Gemini 2 achieved 85% accuracy while Gemini 3 improved to 91%. Taken in isolation, these are defensible numbers for a generative AI system.
The challenge is volume. At Google’s reported rate of more than 5 trillion searches per year, the math produces a troubling picture (a back-of-envelope sketch follows the list):
· ~14 million inaccurate AI responses generated every hour
· ~230,000 incorrect answers delivered every minute
· ~4,000 errors produced every second, on average
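Those figures can be reproduced with simple arithmetic. One caveat: the article does not say what share of searches actually trigger an AI Overview, so the coverage fraction below is back-solved from the stated ~14 million-per-hour figure and is a hypothetical, not a study finding.

```python
# Back-of-envelope check of the per-hour/minute/second error figures.
# SEARCHES_PER_YEAR and ERROR_RATE come from the article; OVERVIEW_COVERAGE
# is a hypothetical assumption back-solved from the ~14M/hour figure.

SEARCHES_PER_YEAR = 5e12   # Google's reported 5 trillion+ searches per year
ERROR_RATE = 0.09          # Oumi's measured inaccuracy rate for Gemini 3
OVERVIEW_COVERAGE = 0.27   # assumed share of searches showing an AI Overview

HOURS_PER_YEAR = 365 * 24  # 8,760

errors_per_hour = SEARCHES_PER_YEAR * OVERVIEW_COVERAGE * ERROR_RATE / HOURS_PER_YEAR
print(f"errors/hour:   {errors_per_hour:,.0f}")        # ~13.9 million
print(f"errors/minute: {errors_per_hour / 60:,.0f}")   # ~231,000
print(f"errors/second: {errors_per_hour / 3600:,.0f}") # ~3,900
```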
The scale argument reframes the entire accuracy debate: even a small error rate, when applied to a system used by billions of people, becomes a large-scale misinformation problem in absolute terms.
Beyond the raw accuracy figures, Oumi identified a separate and arguably more concerning issue: “grounding” — whether the sources cited in AI Overviews actually support the claims being made. The findings reveal that Gemini 3, despite being more accurate than its predecessor, is significantly worse at providing genuinely supportive citations.
Under Gemini 2, 37% of correct answers were ungrounded. That figure rose to 56% under Gemini 3, meaning the majority of accurate responses linked to sources that don’t fully back up the information provided. This creates a verification problem: users who click through to “confirm” an answer may find that the source says something different or only partially supports the claim.
The sourcing analysis across 5,380 cited references also raised platform concerns. Facebook ranked as the second-most-cited source overall, while Reddit placed fourth. Both are social media platforms where user-generated, unverified content is prevalent, and placement at the top of an AI-synthesized search result lends that content unearned authority. Facebook was cited in 5% of accurate responses and 7% of inaccurate ones, suggesting a pattern worth monitoring.
Google pushed back on the study’s conclusions. Spokesperson Ned Adriance questioned the fundamental design of the analysis: Oumi evaluated Google’s AI accuracy using an AI model of its own, introducing a methodological circularity. If Oumi’s model can also make mistakes, its judgments about Google’s errors may themselves be unreliable.
“This study has serious holes,” Adriance said. “It doesn’t reflect what people are actually searching on Google.”
Google also released its own comparative data. The company stated that standalone Gemini 3 — operating without the additional context provided by AI Overviews — was inaccurate 28% of the time, suggesting that the AI Overviews system provides meaningful accuracy improvements over raw model output. The company maintains its standard disclaimer at the bottom of all AI Overviews: “AI can make mistakes, so double-check responses.”
Google AI Overviews are AI-generated summaries that appear at the top of Google Search results, synthesizing answers to user queries and citing supporting web sources. Powered by Google’s Gemini models, the feature was broadly introduced in 2024 and now appears across billions of searches globally. Unlike standard search results, AI Overviews generate text rather than simply listing links.
An AI Overview is considered “ungrounded” when the websites it cites do not actually verify or fully support the information presented in the summary. This is problematic because users who try to check a claim by clicking the cited source may find that the source contradicts, partially supports, or is entirely unrelated to the AI’s statement — undermining the system’s role as a reliable information tool and making independent verification harder.
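To make the definition concrete, here is a minimal sketch of what a grounding check could look like. Everything in it is illustrative: entails() is a hypothetical stand-in for a real support judgment (for example, an NLI model’s entailment score), and the keyword match inside it is a crude placeholder, not the study’s methodology.

```python
# Illustrative sketch of a grounding check: an answer counts as grounded only
# if every claim it makes is supported by at least one cited source.

def entails(source_text: str, claim: str) -> bool:
    # Hypothetical stand-in: a real implementation might score entailment
    # with an NLI model. Here, a crude substring match acts as a placeholder.
    return claim.lower() in source_text.lower()

def is_grounded(claims: list[str], cited_sources: list[str]) -> bool:
    return all(
        any(entails(source, claim) for source in cited_sources)
        for claim in claims
    )

# An accurate claim can still be ungrounded if the cited page never states it.
claims = ["The Eiffel Tower is 330 meters tall."]
sources = ["The Eiffel Tower was completed in 1889 in Paris."]
print(is_grounded(claims, sources))  # False: correct answer, unsupported citation
```

This mirrors the study’s distinction: accuracy asks whether a claim is true, while grounding asks whether the cited source actually supports it, so an answer can pass the first check and still fail the second.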
Google itself acknowledges the limitation with its built-in disclaimer that AI can make mistakes. For low-stakes queries, AI Overviews may provide a useful starting point. For health, legal, financial, or factual decisions, users should independently verify information through authoritative, primary sources rather than relying solely on AI-synthesized summaries. Checking the cited sources directly — rather than accepting the AI’s characterization of them — is advisable.