Ask a large language model to write a sonnet and it will produce something passable, perhaps even moving. Ask it to count the syllables in that sonnet and it will confidently give you the wrong number. This is not a bug awaiting a patch. It is a window into the architecture of the most consequential technology of our time.

The systems we casually call artificial intelligence do not think in the way that word implies. They predict. Specifically, they predict the next token—a chunk of text that might be a word, part of a word, or a punctuation mark—based on statistical patterns learned from vast quantities of human writing. When you ask ChatGPT or Claude to add 47 and 89, it is not performing arithmetic. It is pattern-matching against millions of similar-looking math problems it encountered during training, then generating tokens that look like a plausible answer.

The tokenization problem

This is why language models struggle with tasks that seem trivially easy to humans. Counting letters in a word requires the model to understand how that word was broken into tokens during processing—information it does not reliably have access to. The word "strawberry" might be tokenized as "straw" and "berry," or as "str," "aw," and "berry," depending on the system. When asked how many R's appear in the word, the model must reconstruct something it never directly perceived.

The same principle explains why these systems hallucinate with such conviction. A language model has no mechanism for distinguishing between a fact it learned from reliable sources and a plausible-sounding pattern it absorbed from fiction, speculation, or error. It generates the next most likely token regardless of truth value. The confidence is not arrogance; it is the only mode the system has.

What the limits illuminate

Understanding this architecture clarifies both the genuine capabilities and the persistent limitations of current AI. These systems excel at tasks that benefit from vast pattern recognition across human knowledge: synthesizing information, translating between languages, generating code that follows established conventions, explaining complex topics in accessible terms. They struggle with anything requiring precise tracking of discrete elements, genuine logical reasoning across many steps, or distinguishing reality from plausibility.

The implications extend beyond party tricks about counting letters. When a language model assists with legal research, it may confidently cite cases that do not exist—not because it is lying, but because the pattern of a legal citation followed by supporting text is deeply embedded in its training. When it helps debug code, it may suggest fixes that look syntactically correct but miss the actual logic error. The failure mode is always the same: generating what should come next rather than verifying what is true.

Our take

The companies building these systems have strong incentives to paper over these limitations with guardrails, external tools, and careful prompting. Some of this works remarkably well—connecting a language model to a calculator solves the arithmetic problem entirely. But the underlying architecture remains unchanged, and users who mistake fluent output for reliable reasoning will continue to be burned. The most useful mental model is not artificial intelligence but artificial fluency: a system that has read everything and understood nothing, yet somehow remains extraordinarily useful precisely because of what it absorbed along the way.