Ask ChatGPT how many r's appear in the word "strawberry" and there is a reasonable chance it will confidently tell you two. The correct answer is three. This failure — trivial, almost comic — is not a bug to be patched. It is a window into the fundamental architecture of the most powerful AI systems ever built, and understanding why it happens is more useful than any amount of hype about artificial general intelligence.
The error persists because large language models do not see words the way you do. They see tokens: chunks of text that might be whole words, fragments of words, or punctuation marks, depending on the vocabulary the model's tokenizer learned from its training data. The word "strawberry" might arrive as "straw" + "berry" or some other decomposition entirely. The model never receives the individual letters as discrete objects to count. It is, in effect, being asked to count things it cannot directly see.
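To make the idea concrete, here is a minimal sketch of tokenization in Python. The vocabulary, the ID numbers, and the greedy longest-match rule are all assumptions chosen for illustration; real tokenizers learn far larger vocabularies and may split "strawberry" differently.

```python
# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands of
# entries from data; these pieces and ID numbers are invented for illustration.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def tokenize(text, vocab):
    """Split text into vocabulary pieces using greedy longest-match."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No piece, not even a single character, matched.
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return pieces


pieces = tokenize("strawberry", TOY_VOCAB)
token_ids = [TOY_VOCAB[p] for p in pieces]
print(pieces)     # ['straw', 'berry']
print(token_ids)  # [101, 102]
```

In this sketch the model's input is the list [101, 102]. Nothing in those two numbers says how many r's the original word contained.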
The prediction machine
What these systems actually do is predict the next token in a sequence, over and over, once for every token of the response. A long answer can take thousands of these prediction steps. When you ask a question, the model is not retrieving an answer from a database or executing a program. It is generating the most statistically plausible continuation of the text so far, based on patterns absorbed from its training corpus. This is why it can write sonnets and summarize legal documents and explain quantum mechanics in the voice of a pirate: all of these are patterns it has seen variations of before.
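The prediction loop itself can be sketched in a few lines of Python. Everything here is a simplifying assumption: the probability table is invented, it looks only at the previous token, and it always picks the single most likely continuation. A real model computes its probabilities with a neural network over the whole context and usually samples rather than always taking the top choice.

```python
# Hypothetical table: for each token, the probability of each possible next
# token. The numbers are invented for illustration.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 1.0},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def generate(probs, max_tokens=10):
    """Repeatedly append the most probable next token until <end>."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = probs[tokens[-1]]
        # Greedy choice: take the highest-probability continuation.
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(generate(NEXT_TOKEN_PROBS))  # ['the', 'cat', 'sat']
```

Nothing in the loop looks anything up or checks a fact. Each step only asks which token is most likely to come next.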
But counting letters is not a pattern-completion task. It requires iterating through a string and incrementing a counter, which is trivially easy in any programming language and genuinely difficult for a system that handles text as a stream of tokens and probability distributions. The model can often get counting questions right, but only because it has memorized similar examples or because the answer happens to be the most probable next token. It is not actually counting.
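For contrast, here is what actually counting looks like, written out as an explicit loop in Python. The function name is just an illustrative choice.

```python
def count_letter(word, letter):
    """Walk through the word one character at a time and count the matches."""
    count = 0
    for ch in word:
        if ch == letter:
            count += 1
    return count


print(count_letter("strawberry", "r"))  # 3
```

Python's built-in "strawberry".count("r") gives the same answer. Either way, the program has direct access to every character, which is exactly what the language model lacks.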
Why the illusion is so convincing
The unsettling part is how articulate the wrong answers are. The model does not say "I cannot count letters." It says "The word strawberry contains two r's" with the same confident tone it uses for everything else. This is because confidence calibration is not built into the architecture. Nothing in the generated text reliably distinguishes "I am highly certain" from "I am guessing." Every output emerges from the same statistical process, whether the model is reciting the periodic table or hallucinating a Supreme Court case that never existed.
This explains why so many professionals have been burned by AI-generated content. Lawyers have submitted briefs citing fabricated precedents. Journalists have published AI-written articles containing invented quotes. The failure mode is not that the AI is stupid — it is that the AI is incapable of knowing what it does not know, because it has no mechanism for knowing anything at all in the way humans mean that word.
Our take
The letter-counting problem is not a gotcha or a reason to dismiss these tools. It is a diagnostic. Anyone using AI effectively needs to understand that they are working with a sophisticated pattern-completion engine, not a reasoning entity. The technology is genuinely transformative for tasks that involve synthesis, summarization, and creative recombination of existing knowledge. It is genuinely unreliable for tasks that require verification, arithmetic, or factual precision. The companies building these systems know this, which is why they are racing to bolt on calculators, search engines, and code interpreters. But the core architecture remains unchanged: a prediction machine that sounds like it understands everything and understands nothing at all.




