Ask ChatGPT to write a sonnet about quantum mechanics and it will produce something passable, perhaps even elegant. Ask it how many times the letter 'e' appears in 'entrepreneur' and it will often give you the wrong answer, with complete confidence. This isn't a quirk awaiting a patch. It's a consequence of the architecture itself, and understanding why illuminates what these systems are and, more importantly, what they are not.

The failure is so consistent it has become a parlor trick among AI researchers: prompt any major language model to count letters or syllables, or to do arithmetic on large numbers, and watch it stumble. The same system that can explain the Byzantine Empire's fall or generate working code cannot reliably tell you that 'strawberry' contains three r's. The gap between linguistic sophistication and numerical incompetence seems absurd until you understand what's happening underneath.

The tokenization trap

Large language models don't see text the way humans do. Before any processing begins, your input gets chopped into tokens—chunks that might be whole words, fragments, or individual characters depending on frequency patterns in the training data. The word 'entrepreneur' might become three or four tokens, none of which preserve the letter-by-letter structure a human would use to count. The model never 'sees' the individual letters; it sees abstract numerical representations of common text patterns.

This is like asking someone to count the bricks in a wall while showing them only a blurry photograph taken from across the street. The information isn't absent; it's encoded in a form that makes the task needlessly difficult. To count letters, the model must reconstruct character-level information from token-level representations, and that reconstruction is error-prone.
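To make the tokenization point concrete, here is a minimal sketch in Python. It is not how any production tokenizer works: real systems typically learn their vocabularies with algorithms such as byte-pair encoding, whereas this toy uses a greedy longest-match split over a made-up vocabulary. The names `TOY_VOCAB`, `toy_tokenize`, and `toy_encode`, along with every token ID, are assumptions invented for illustration.

```python
# Toy vocabulary: these pieces and IDs are invented for illustration only.
TOY_VOCAB = {"entre": 101, "pre": 102, "neur": 103, "straw": 104, "berry": 105}
MAX_PIECE_LEN = max(len(piece) for piece in TOY_VOCAB)


def toy_tokenize(text):
    """Greedy longest-match split; unknown characters become one-character pieces."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking down to one character.
        for length in range(min(MAX_PIECE_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in TOY_VOCAB or length == 1:
                pieces.append(candidate)
                i += length
                break
    return pieces


def toy_encode(text):
    """Map pieces to integer IDs; fallback characters get 1000 + their code point."""
    return [TOY_VOCAB[p] if p in TOY_VOCAB else 1000 + ord(p)
            for p in toy_tokenize(text)]


word = "entrepreneur"
print(toy_tokenize(word))       # ['entre', 'pre', 'neur']
print(toy_encode(word))         # [101, 102, 103]  <- roughly what the model receives
print(word.count("e"))          # 4  <- trivial when you can see the characters
print("strawberry".count("r"))  # 3
```

Nothing in the list `[101, 102, 103]` says how many times 'e' occurs. Ordinary code that operates on characters answers the question in a single call, while a model working from the IDs has to have learned, indirectly, what letters each ID stands for.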

Pattern matching is not reasoning

The deeper issue is that language models don't compute in any traditional sense. They predict. Given a sequence of tokens, they calculate probability distributions for what comes next, drawing on statistical patterns absorbed from billions of text examples. When you ask for 7,849 multiplied by 3,267, the model isn't performing multiplication. It's pattern-matching against similar-looking arithmetic problems it encountered during training and generating tokens that seem statistically appropriate.
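The difference between predicting and computing can be sketched in a few lines of Python. The scores below are made up; a real model would produce scores over its entire vocabulary from learned weights. The names `softmax` and `made_up_logits` are illustrative assumptions, not part of any library's API.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over candidate tokens."""
    highest = max(logits.values())
    exps = {tok: math.exp(score - highest) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


# Hypothetical scores for the first token after "7,849 multiplied by 3,267 is".
# These numbers are invented; they only illustrate the shape of the process.
made_up_logits = {"25": 2.1, "26": 1.9, "24": 0.7, "the": -1.0}

probs = softmax(made_up_logits)
predicted = max(probs, key=probs.get)
print(predicted)      # 25  <- the most probable next token, not a computed result

# Actual multiplication, by contrast, is deterministic and exact.
print(7849 * 3267)    # 25642683
```

In this sketch the model's top choice happens to begin the correct answer, but only because the invented scores favor it. Every later digit would be another draw from another distribution, with no carry rule enforcing that the digits add up to a correct product.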

For simple, common calculations, this works surprisingly well—the training data contains enough examples that the pattern is reliable. But edge cases, unusual number combinations, or multi-step operations expose the illusion. The model is essentially a very sophisticated autocomplete that has learned to mime mathematical reasoning without implementing it.
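A deliberately crude caricature shows why common cases succeed and unusual ones fail. The Python sketch below "answers" multiplication problems by recalling the most similar problem it has memorized. Real language models generalize far better than a lookup table, so treat this as an assumption-laden illustration of the failure pattern, not a description of how any model works. `SEEN` and `pattern_match_multiply` are hypothetical names.

```python
# Pretend "training data": every product of two single-digit numbers.
SEEN = {(a, b): a * b for a in range(10) for b in range(10)}


def pattern_match_multiply(a, b):
    """Return the memorized product of the most similar problem seen before."""
    nearest = min(SEEN, key=lambda p: abs(p[0] - a) + abs(p[1] - b))
    return SEEN[nearest]


print(pattern_match_multiply(7, 8))        # 56  <- common case, looks like competence
print(pattern_match_multiply(7849, 3267))  # 81  <- closest memory is 9 x 9
print(7849 * 3267)                         # 25642683  <- what computing gives you
```

Inside the memorized range the lookup is indistinguishable from real multiplication. Outside it, the answer is still delivered without hesitation, and it is wrong.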

What this means for everything else

The arithmetic failures are diagnostic of a broader truth: language models are brilliant at interpolation and unreliable at extrapolation. They excel when a task resembles their training distribution and struggle when it doesn't. They can synthesize information, adopt personas, and generate plausible text on almost any subject because those tasks reward exactly the statistical pattern-matching they were built for.

But tasks requiring precise, rule-based operations—counting, formal logic, multi-step planning with hard constraints—reveal the absence of an underlying reasoning engine. The implications ripple outward: a model that can't count letters also can't be fully trusted to verify legal citations, check financial calculations, or catch logical contradictions in complex arguments.

Our take

None of this makes language models useless—quite the opposite. Understanding their actual capabilities allows for intelligent deployment. They are extraordinary tools for drafting, brainstorming, translation, and synthesis. They are poor tools for verification, precision arithmetic, and anything requiring guaranteed accuracy. The counting problem isn't an embarrassment to be hidden; it's a gift, the clearest possible demonstration that these systems, however impressive, are not minds. They are mirrors that reflect our language back at us, polished to an uncanny shine, but mirrors nonetheless.