Ask a modern AI assistant to write a sonnet about quantum mechanics and it will produce something passable in seconds. Ask it how many times the letter 'r' appears in the word 'strawberry' and there is a reasonable chance it will get it wrong. This is not a bug to be patched; it is a window into the soul of the technology.

The disconnect baffles casual users who assume that a system capable of discussing Wittgenstein must surely handle kindergarten arithmetic. But the assumption rests on a category error. Large language models do not think in the way humans use that word. They predict, and prediction operates on entirely different rails than counting, reasoning, or understanding.

Tokens are not letters

The architecture begins with tokenization, the process of chopping text into digestible chunks before the model ever sees it. These chunks are not individual characters but subword units optimized for compression. The word 'strawberry' might arrive as two or three tokens, none of which cleanly maps to its constituent letters. When you ask the model to count 'r's, you are asking it to perform character-level analysis on data it never ingested at that resolution. It is like asking someone to recall the thread count of a shirt they only saw from across the room.

This design choice is deliberate. Character-level models exist but scale poorly; subword tokenization lets systems train on vastly more text with the same compute budget. The trade-off is that fine-grained symbolic manipulation becomes an afterthought, handled only to the extent that such tasks appeared in the training corpus and the model memorized approximate patterns.

Prediction versus procedure

More fundamentally, language models are next-token predictors. Given a sequence, they estimate the probability distribution over what comes next, then sample from it. This is staggeringly effective for generating coherent prose because human language is, at bottom, a highly structured probabilistic phenomenon. Syntax, idiom, and even argument flow leave statistical fingerprints that a sufficiently large model can learn to mimic.

Counting, by contrast, is procedural. It requires maintaining a running tally, iterating through discrete items, and returning a precise result. Nothing in the transformer architecture explicitly supports loops or counters. When a model appears to count, it is either retrieving a memorized answer or pattern-matching to similar examples it saw during training. For novel or slightly tricky inputs, the illusion shatters.

The hype-reality gap

This matters because public discourse oscillates between two equally misleading poles: that AI is about to become sentient, and that it is merely a parlor trick. Neither captures the truth. Language models are extraordinarily powerful pattern engines. They compress and remix human knowledge in ways that genuinely augment research, writing, and coding. But they lack the discrete symbolic machinery that underpins mathematics, formal logic, and reliable fact retrieval.

Recognizing this distinction is not pessimism; it is the precondition for using the technology wisely. A calculator and a poet serve different purposes. We would not ask a poet to balance a ledger, nor expect a calculator to craft a eulogy. Language models are closer to the poet—brilliant at texture, unreliable at precision.

Our take

The counting problem is humbling precisely because it is trivial. It reminds us that fluency is not intelligence, and that the most impressive-sounding output can rest on foundations that crumble under the gentlest scrutiny. That is not a reason to dismiss AI; it is a reason to engage with it honestly. The technology is transformative, but transformation requires knowing what you are transforming and what it cannot do. Every user who understands why 'strawberry' trips up a billion-parameter model is better equipped to harness—and to distrust—the next generation of these systems.