AI·4 min read·Updated Jun 6, 2026·Fact-check: reviewed

Google AI Overviews Face Scrutiny Over Basic Spelling and Letter

Despite advanced reasoning capabilities, Google's flagship search AI struggles with simple word mechanics due to its underlying transformer architecture.

By Alex Rivera·May 28, 2026·Updated June 6, 2026

AI reporter

Reports on model launches, frontier labs, developer platforms, and AI policy with an emphasis on claims verification and rollout context.

Editorial responsibility: Lead reviewer for AI coverage, launch claims, and policy context

Source context

Primary source: TechCrunch AI. Full source links and update notes are below.

Fast summary

Google's AI Overviews feature has recently produced errors claiming 'Google' has two 'p's and 'poop' has one 'r', even though neither word contains the letter in question.
Google acknowledged that counting letters within words is a known challenge for large language models (LLMs) and stated it is working on a fix.
Experts explain that these errors occur because AI models process text as 'tokens' or numerical encodings rather than reading individual letters.

A graphic illustration representing Google's AI interface with spelling errors.

What happened

Google's AI Overviews are drawing scrutiny for failing at tasks that look trivial to humans, such as counting letters in a word or spelling a simple name correctly. The errors have become a useful public example of a broader truth about large language models: their impressive fluency does not mean they process language the way people do. An LLM can summarize a document, answer a nuanced question, or explain a complex idea, yet still stumble on whether "Google" contains a certain letter or how many times a character appears in a word.

That contradiction feels absurd to users, but it is not random. It reflects a core architectural mismatch between human intuitions about text and the way transformer-based systems represent language internally.

What's new in this update

Google has acknowledged that letter counting and word-level character tasks remain a known weakness for large language models and says it is working on improvements. That acknowledgment matters because AI Overviews are no longer niche experiments. They are central to Google's effort to rebuild search around generative answers. When a feature this prominent makes elementary-looking mistakes, the issue is no longer academic. It becomes a product-trust problem for a system positioned as the first response layer for billions of searches.

The latest examples, including bizarre counting and spelling failures, illustrate how these weaknesses can surface even when the model appears otherwise coherent. Users do not experience them as subtle architectural limitations. They experience them as obvious stupidity in a tool that is supposed to feel advanced.

Key details

The core explanation lies in tokenization and representation. Large language models do not "read" words character by character in the way a child learning spelling would. They break text into tokens, which can be words, fragments, or other statistical units, and then reason over those encoded pieces. That method is incredibly effective for many language tasks, but it is poorly aligned with exact letter-by-letter operations.

Several implications follow from that design:

A model can appear semantically strong while being weak at exact character manipulation.
Spelling, counting, and letter-position tasks are not automatically easy just because language generation is strong.
Product users often misjudge model competence because fluent output hides structural blind spots.
Fixing these failures may require targeted architectural or tool-based patches rather than scaling alone.

This is why the "strawberry test" (asking a model how many times the letter "r" appears in "strawberry") and similar examples have persisted in AI culture. They reveal that the hardest problems for models are not always the ones humans consider difficult.
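The gap between tokens and letters can be shown with a toy sketch. The vocabulary, the token IDs, and the greedy longest-match splitting below are all invented for illustration; they are not taken from Google's systems or from any real tokenizer, which would likely split words differently. The point is only that a model receives a short list of numbers, while ordinary code can count characters directly.

```python
# Toy illustration only: this vocabulary and these token IDs are invented
# for the example and do not come from any real model's tokenizer.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(word, vocab):
    """Greedy longest-match split of `word` into pieces found in `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a piece matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces


def to_ids(pieces, vocab):
    """Map pieces to integer IDs; unknown pieces get the placeholder -1."""
    return [vocab.get(piece, -1) for piece in pieces]


word = "strawberry"
pieces = toy_tokenize(word, TOY_VOCAB)
ids = to_ids(pieces, TOY_VOCAB)

print(pieces)           # the sub-word pieces
print(ids)              # what a model would receive: numbers, not letters
print(word.count("r"))  # the exact character count, trivial for ordinary code
```

In this sketch the three "r" characters are spread across two pieces, and nothing in the ID sequence spells out which letters each piece contains.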

Background and context

Google's AI search push has already been criticized over hallucinations, odd citations, and confidently wrong advice. Spelling and counting errors deepen that criticism because they make the limitations feel almost mocking: how can a model reason about complex topics yet fail to count letters in a single word? The answer is that LLM competence is uneven and often brittle in places humans do not expect.

Researchers have long known that transformer systems optimize for next-token prediction, not symbolic precision. That means tasks requiring exact string manipulation, deterministic counting, or strict stepwise symbolic control may remain weak unless supported by separate mechanisms, external tools, or specialized training.

What to watch next

The next thing to watch is whether Google and other model builders address these problems through lightweight patches, hybrid systems, or more substantial architectural changes. Search products may increasingly need verification layers for exact operations rather than trusting generative models to handle them natively.
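One way to picture such a verification layer is a small deterministic check that runs on generated text before it is shown. The sketch below is hypothetical: the claim format, the pattern, and the function name `check_letter_claim` are assumptions made for illustration, not a description of how Google or any other vendor handles the problem.

```python
import re

# Hypothetical claim format assumed for this sketch:
#   "<word>" has <n> "<letter>"
CLAIM_PATTERN = re.compile(
    r'"(?P<word>[A-Za-z]+)" has (?P<count>\d+) "(?P<letter>[A-Za-z])"'
)


def check_letter_claim(sentence):
    """Compare a generated letter-count claim with an exact count.

    Returns None when the sentence contains no claim in the assumed format.
    """
    match = CLAIM_PATTERN.search(sentence)
    if match is None:
        return None
    word = match.group("word")
    letter = match.group("letter")
    claimed = int(match.group("count"))
    # Deterministic, case-insensitive count done by ordinary string code.
    actual = word.lower().count(letter.lower())
    return {
        "word": word,
        "letter": letter,
        "claimed": claimed,
        "actual": actual,
        "ok": claimed == actual,
    }


result = check_letter_claim('"strawberry" has 2 "r"')
print(result["ok"], result["actual"])
```

A layer like this would only catch claims it can parse, so it complements rather than replaces fixes inside the model.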

It will also be worth watching whether user tolerance declines as AI features move from novelty to infrastructure. Consumers may forgive occasional weirdness in a chatbot, but they will be less forgiving if the same pattern keeps appearing in a search engine positioned as authoritative.

Why this matters

This matters because Google's AI Overviews, and the broader public perception of generative AI, turn on a simple but revealing question: can these systems be trusted with basic tasks? The spelling and counting failures show that linguistic fluency is not the same as symbolic reliability. As AI becomes more embedded in search and everyday interfaces, that difference will matter more, not less.

Reader context

This story belongs to Northstar Herald's Generative AI and Machine Learning coverage, with related entities including Google, Search, LLM, Tokenization. The report is based on TechCrunch AI source material.

Why it matters

The inability of advanced AI to perform basic spelling tasks reveals a fundamental disconnect between how machines process data and how humans understand language.

Sources and methodology

https://techcrunch.com/2026/05/27/why-googles-ai-cant-spell-google-or-anything-else/
