The most sophisticated language models in existence struggle to count the letter 'r' in the word 'strawberry.' This is not a bug awaiting a patch. It is a window into what these systems fundamentally are and are not.

When users discover this limitation, the reaction is often bewilderment. A machine that can summarize legal briefs, write functional code, and discuss Kantian ethics cannot perform a task a five-year-old masters without effort. The disconnect feels absurd until you understand how these models actually process language.

The tokenization problem

Large language models do not read text character by character. They consume it in chunks called tokens, typically word fragments chosen by how frequently they occur in the tokenizer's training data. The word 'strawberry' might be split into 'straw' and 'berry,' or 'str,' 'aw,' and 'berry,' depending on the tokenizer. Each token reaches the model as a numeric ID, so the model never sees individual letters as discrete units the way a human does when counting.
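
A toy sketch of the idea, assuming a tiny made-up vocabulary and a greedy longest-match splitter. Real tokenizers learn their vocabularies from data and differ in detail, so the names `TOY_VOCAB` and `toy_tokenize` and the ID values here are purely illustrative. The sketch shows that what reaches the model is a short list of integer IDs, while the letter count is only trivially available from the raw string.

```python
# Hypothetical vocabulary for illustration only; real vocabularies hold
# tens of thousands of learned entries with different IDs.
TOY_VOCAB = {"straw": 101, "berry": 102, "str": 103, "aw": 104}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split of text into (piece, id) pairs."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                pieces.append((piece, vocab[piece]))
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i:]!r}")
    return pieces


tokens = toy_tokenize("strawberry")
token_ids = [token_id for _, token_id in tokens]

print(tokens)                   # [('straw', 101), ('berry', 102)]
print(token_ids)                # [101, 102]  <- all the model receives
print("strawberry".count("r"))  # 3, trivial on the raw characters
```

Nothing in the list `[101, 102]` says how many r's the underlying pieces contain. A model can only answer correctly if it has learned, indirectly, which letters each token tends to stand for.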

This is not an oversight. Tokenization is what makes these systems computationally tractable. Processing text character by character would make every sequence several times longer and far more expensive to process, and at a fixed compute budget it would likely hurt performance on the tasks these models excel at: predicting coherent, contextually appropriate sequences of language. The architecture optimizes for fluency, not enumeration.

The result is a system that has absorbed statistical patterns across trillions of words but has no reliable mechanism for the kind of sequential, symbol-by-symbol processing that counting requires. It can approximate. It can sometimes get lucky. But it cannot count with certainty because counting was never what it was designed to do.

What fluency masks

The counting limitation is a useful proxy for a broader class of tasks where language models struggle: anything requiring precise, verifiable, step-by-step reasoning over discrete elements. Arithmetic, logical deduction with many variables, calendar calculations, and certain spatial reasoning problems all share this characteristic.

The fluency of these systems obscures these gaps. A model that produces grammatically perfect, contextually sensible prose creates an overwhelming impression of intelligence. Users naturally assume that a system articulate enough to explain quantum entanglement can handle basic math. The assumption is wrong, but the mistake is understandable.

This is why experienced users develop intuitions about what to trust. Summaries, translations, brainstorming, stylistic transformations, and explanations of well-documented concepts tend to be reliable. Precise calculations, date arithmetic, and specific factual claims should be checked against an external source.
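
For calculations and date arithmetic, the external check is often a few lines of ordinary code. A minimal sketch using only Python's standard library; the dates and figures are arbitrary examples chosen for illustration, not taken from any model output.

```python
from datetime import date, timedelta
from fractions import Fraction

# Date arithmetic: what date falls 100 days after 1 January 2024?
start = date(2024, 1, 1)
later = start + timedelta(days=100)
print(later.isoformat())  # 2024-04-10
print(later.weekday())    # 2, i.e. Wednesday (Monday is 0)

# Exact arithmetic: Fraction avoids floating-point rounding.
total = Fraction(1, 3) + Fraction(1, 6)
print(total)              # 1/2
```

If a model's answer disagrees with a check like this, trust the check.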

The workarounds and their limits

Model developers have introduced various workarounds. Chain-of-thought prompting encourages models to reason step by step. Tool use allows models to call external calculators or code interpreters. Some newer models are trained to work through extended reasoning before they answer. These approaches help, sometimes dramatically.

But they do not change the fundamental architecture. A language model with access to a calculator is still a language model that cannot count, augmented by a tool that can. The distinction matters because it clarifies where the intelligence resides. The model excels at knowing when to use the calculator and how to interpret its output. The actual computation happens elsewhere.
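
That division of labor can be sketched in a few lines. Everything here is hypothetical: `fake_model` stands in for a language model that emits a structured tool request, and `count_letter` is an illustrative tool, not part of any real vendor's API. The point is only that the count comes from ordinary code, while the model's role is to pick the tool and phrase the result.

```python
def count_letter(word: str, letter: str) -> int:
    """Deterministic tool: count occurrences of a letter, ignoring case."""
    return word.lower().count(letter.lower())


# Registry of tools the (hypothetical) model is allowed to request.
TOOLS = {"count_letter": count_letter}


def fake_model(question: str) -> dict:
    """Stand-in for a model: returns a hard-coded tool request.

    A real model would generate this request from the question text.
    """
    return {"tool": "count_letter", "args": {"word": "strawberry", "letter": "r"}}


def answer(question: str) -> str:
    request = fake_model(question)
    args = request["args"]
    # The actual computation happens here, outside the model.
    result = TOOLS[request["tool"]](**args)
    return f"The letter '{args['letter']}' appears {result} times in '{args['word']}'."


print(answer("How many r's are in strawberry?"))
# The letter 'r' appears 3 times in 'strawberry'.
```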

Our take

The inability to count letters is not an embarrassing flaw to be hidden. It is a clarifying feature. Understanding why language models fail at counting teaches you more about what they actually are than any marketing material ever will. These are pattern-completion engines of unprecedented sophistication, not general intelligences. The sooner users internalize this, the more effectively they can deploy these tools, and the less likely they are to be disappointed when the machine that writes poetry cannot tell them how many r's are in 'strawberry.'