The most capable artificial intelligence systems ever built struggle with a task any five-year-old can perform: counting. Ask a state-of-the-art language model how many times the letter 'r' appears in 'strawberry' and it may well confidently answer two. The correct answer is three. This is not a glitch to be patched in the next release. It is a window into what these systems fundamentally are, and what they are not.

The strawberry problem, as it has become known in AI circles, reveals something profound about the gap between human cognition and machine intelligence. We assume that any system smart enough to draft legal briefs, debug code, and explain quantum mechanics must surely be able to count letters. This assumption betrays our deep misunderstanding of how large language models actually process information.

The tokenization trap

Language models do not see text the way humans do. Before any word reaches the neural network, it passes through a tokenizer — a preprocessing system that chops text into chunks the model can digest. These chunks, called tokens, rarely align with our intuitive sense of letters or even words. The word 'strawberry' might become three tokens: 'straw', 'ber', and 'ry'. The model never encounters the individual letters at all.
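To make the splitting concrete, here is a minimal sketch in Python. It assumes a hypothetical three-entry vocabulary and a simple greedy longest-match rule; the names TOY_VOCAB and toy_tokenize and the ID numbers are invented for illustration, and production tokenizers learn far larger vocabularies that may split 'strawberry' differently.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
# Real tokenizers learn tens of thousands of entries from data.
TOY_VOCAB = {"straw": 101, "ber": 102, "ry": 103}


def toy_tokenize(word, vocab):
    """Split `word` into vocabulary pieces using greedy longest-match."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, shrinking until one is in the vocabulary.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {start}")
    return pieces


pieces = toy_tokenize("strawberry", TOY_VOCAB)
token_ids = [TOY_VOCAB[p] for p in pieces]

print(pieces)     # ['straw', 'ber', 'ry']
print(token_ids)  # [101, 102, 103]

# Counting is trivial when the characters are visible...
print("strawberry".count("r"))  # 3
# ...but the model receives only token_ids, in which no 'r' appears anywhere.
```

The point of the sketch is the last line: the three integers carry no direct trace of the spelling, so any knowledge of the letters inside each token has to be learned indirectly.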

This design choice was not arbitrary. Tokenization dramatically reduces the computational cost of processing text. A model that processed text character by character would need sequences roughly five times longer to handle the same input. In the transformer architectures these models use, the cost of self-attention grows with the square of sequence length, so a fivefold increase in length means roughly twenty-five times the attention compute. The engineers who built these systems made a reasonable trade-off: sacrifice character-level precision for the ability to process vast amounts of language efficiently.
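A back-of-the-envelope calculation shows the scale of that trade-off. The sketch below assumes the article's rough figure of five characters per token and a standard self-attention layer, whose pairwise comparisons grow with the square of sequence length; the prompt length and the function name relative_attention_cost are hypothetical, and real systems have other costs that scale differently.

```python
def relative_attention_cost(length_ratio):
    """Self-attention compares every position with every other, so cost scales as n**2."""
    return length_ratio ** 2


chars_per_token = 5  # the article's rough figure, not a measured value
tokens = 1_000       # hypothetical prompt length, in tokens
chars = tokens * chars_per_token

length_ratio = chars // tokens
print(length_ratio)                           # 5
print(relative_attention_cost(length_ratio))  # 25
```

Under these assumptions, a character-level model would pay about twenty-five times the attention cost for the same text: a steep polynomial penalty, though not an exponential one.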

The consequence is that language models are, in a meaningful sense, illiterate. They manipulate tokens the way a chess engine manipulates board positions — with extraordinary sophistication but without the underlying perception humans take for granted. When asked to count letters, the model must essentially guess based on statistical patterns in its training data about how letter-counting questions are typically answered.

What this reveals about intelligence

The strawberry problem illuminates a broader truth about artificial intelligence: competence in one domain does not imply competence in another, even when humans would consider the second task simpler. A language model's ability to discuss Wittgenstein does not mean it can reliably perform arithmetic. Its capacity to generate syntactically perfect French does not mean it understands that 'chat' has four letters.

This phenomenon — superhuman performance in some areas coexisting with subhuman performance in others — distinguishes current AI systems from human cognition. Human intelligence is general in ways we rarely appreciate until we encounter systems that are spectacularly narrow. A child learning to read simultaneously learns to count letters, recognize patterns, and understand that words are made of smaller units. These capabilities emerge together from a unified cognitive architecture.

Language models have no such unity. They are statistical engines optimized for one objective: predicting the next token. Everything they appear to know — history, science, logic, emotion — emerges as a byproduct of this singular goal. When that goal happens to align with a task, the results can seem magical. When it does not, the failures can seem inexplicable.

Our take

The strawberry problem is not an embarrassing limitation to be minimized in marketing materials. It is a crucial piece of information for anyone trying to understand what AI systems actually are. These tools are not nascent general intelligences temporarily stumbling over trivia. They are a genuinely new kind of cognitive technology with a genuinely alien architecture. The sooner we stop projecting human-like understanding onto them, the sooner we can use them wisely — and the sooner we can stop being surprised when they confidently tell us that strawberry contains two r's.