Ask a large language model to count the r's in 'strawberry' and there is a good chance it will get the answer wrong. This is not a bug to be patched but a window into the alien way these systems perceive text, a perception so different from human reading that the word 'reading' barely applies.
The failure is instructive. When you type 'strawberry,' the model does not see s-t-r-a-w-b-e-r-r-y. It sees something closer to 'straw' and 'berry' fused into abstract numerical tokens, or perhaps 'str' and 'awberry,' depending on the tokenizer's training. The individual letters have already vanished, dissolved into statistical patterns optimized for predicting the next chunk of text. Asking the model to count letters is like asking someone to count the brushstrokes in a photograph — the underlying granularity has been compressed away.
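A toy example makes the point concrete. The sketch below is not how any production tokenizer works: it uses a tiny invented vocabulary and a greedy longest-match rule as a stand-in for the learned, byte-pair-encoding-style procedures real models use. The names TOY_VOCAB and tokenize, and every ID in it, are hypothetical.

```python
import string

# Hypothetical vocabulary, invented for illustration. Real tokenizers learn
# tens of thousands of pieces from data; these pieces and IDs are made up.
TOY_VOCAB = {letter: idx for idx, letter in enumerate(string.ascii_lowercase)}
TOY_VOCAB.update({"straw": 101, "berry": 102})


def tokenize(text, vocab=TOY_VOCAB):
    """Split text into token IDs by greedy longest match (a simplification)."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                ids.append(vocab[piece])
                i = end
                break
        else:
            raise ValueError(f"no vocabulary piece covers {text[i]!r}")
    return ids


# The model receives only these integers, not the letters behind them.
print(tokenize("strawberry"))  # [101, 102]
```

Nothing in the list [101, 102] says how many r's were in the original string. That information lives in the vocabulary table, which the model never consults the way a reader consults a page.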
The tokenization bargain
Tokenization is the preprocessing step that makes modern language models computationally feasible. Rather than process text character by character, which would make sequences several times longer for any meaningful document, models chunk language into tokens: sometimes whole words, sometimes word fragments, sometimes punctuation marks. As a rule of thumb, GPT-4's tokenizer yields roughly 0.75 words per token for English text, or about 1.3 tokens per word, meaning a 1,000-word essay becomes approximately 1,300 tokens.
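The cost of working character by character can be put in rough numbers. This back-of-the-envelope sketch assumes about four characters per token, a commonly quoted rule of thumb for English rather than a measured property of any particular tokenizer, and it assumes standard self-attention, which compares every position with every other position. The function name and its default are hypothetical.

```python
def estimate_lengths(text, chars_per_token=4.0):
    """Compare character-level and token-level sequence lengths.

    chars_per_token is an assumed average, not a tokenizer measurement.
    """
    n_chars = len(text)
    n_tokens = max(1, round(n_chars / chars_per_token))
    return {
        "char_positions": n_chars,
        "token_positions": n_tokens,
        # Under the quadratic-attention assumption, this is how many times
        # more pairwise comparisons a character-level sequence would need.
        "pairwise_cost_ratio": (n_chars / n_tokens) ** 2,
    }


essay = "x" * 6000  # stand-in for a document of 6,000 characters
print(estimate_lengths(essay))
```

Under these assumptions a four-fold difference in sequence length becomes a sixteen-fold difference in pairwise comparisons, which is the trade tokenization is buying.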
This compression is elegant and brutal. It lets models maintain context across thousands of words, enabling the coherent long-form writing that makes them useful. But it severs the connection between the model's internal representation and the orthographic reality of spelling. The model has learned that 'strawberry' relates to red fruit, summer, and desserts — but it has no reliable access to the fact that the string contains three r's.
What the model actually knows
This is not ignorance in the human sense. The model has processed billions of sentences containing 'strawberry' and has developed rich associations with the word. It knows strawberries are fragile, that they pair well with cream, that they grow on low plants rather than trees. In many ways, its knowledge of strawberries exceeds that of most humans.
But this knowledge is semantic, not orthographic. The model understands meaning without understanding spelling in the way a literate human does. It is something like a scholar who has read every book through a translator and can discourse brilliantly on their contents but cannot recognize a single word in the original language.
Recent models have improved at letter-counting through various workarounds: chain-of-thought prompting, explicit spelling-out steps, or fine-tuning on character-level tasks. But these are workarounds layered on top of the architecture, not changes to it. The underlying representation remains token-based.
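The spelling-out workaround can be sketched in a few lines. The idea is to separate the letters before asking the question, so that each letter tends to arrive as its own token, or to do the counting in ordinary code instead of inside the model. The function names and the prompt wording here are hypothetical illustrations, not any vendor's recommended prompt.

```python
def spell_out(word):
    """Separate a word into individual letters, as in s-t-r-a-w-..."""
    return "-".join(word)


def build_spelling_prompt(word, letter):
    """Assemble a hypothetical prompt that shows the letters explicitly."""
    return (
        f"The word spelled letter by letter is: {spell_out(word)}.\n"
        f"Go through the letters one at a time and count each '{letter}'."
    )


def count_letter(word, letter):
    """Count occurrences in plain code, with no model involved."""
    return sum(1 for ch in word.lower() if ch == letter.lower())


print(build_spelling_prompt("strawberry", "r"))
print(count_letter("strawberry", "r"))  # 3
```

Either route works by changing what reaches the model or by moving the task outside it. Neither gives the model a view of the letters inside an ordinary token.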
Why this matters beyond party tricks
The strawberry problem is often presented as an amusing limitation, a gotcha for overconfident AI enthusiasts. But it illuminates something deeper about the gap between statistical language modeling and human cognition.
Humans learn language through embodied experience — pointing at objects, hearing sounds, feeling the shape of letters with crayons. Our orthographic knowledge is grounded in sensorimotor experience. Language models learn language through disembodied pattern-matching across text corpora. Their 'knowledge' is purely relational: words defined by their proximity to other words, meaning emerging from statistical co-occurrence.
This works astonishingly well for many tasks. But it means these systems have a fundamentally different relationship to language than humans do. They are not reading in any sense we would recognize. They are performing extraordinarily sophisticated pattern completion on tokenized sequences.
Our take
The letter-counting failure is neither a trivial bug nor a fatal flaw — it is a design signature. Language models were built to predict text, not to see it, and they accomplish that goal with remarkable success. Understanding this distinction matters as these systems become infrastructure for writing, research, and decision-making. They are powerful tools with alien limitations, and the strawberry test is a useful reminder that fluent output does not imply human-like comprehension. The machine writes beautifully about strawberries while remaining genuinely blind to how the word is spelled.




