Your AI can write a sonnet but cannot reliably count the letters in 'strawberry'. That limitation reveals a great deal.

Ask a large language model to write a haiku about autumn leaves, and it will produce something serviceable, perhaps even lovely. Ask it how many times the letter 'r' appears in the word 'strawberry', and there is a reasonable chance it will confidently answer 'two' when the correct count is three. This is not a bug awaiting a patch. It is a window into what these systems actually are.

The disconnect bewilders people because it inverts our intuitions about intelligence. Counting letters feels trivially easy — a task any child can perform with patience. Composing poetry feels hard, the province of education and creativity. Yet language models excel at the supposedly difficult task while stumbling on the supposedly simple one. Understanding why requires abandoning the assumption that these systems process language the way humans do.

Tokens are not letters

Large language models do not see text as sequences of individual characters. They see tokens: chunks of text that might be whole words, word fragments, or punctuation marks. Depending on the model's tokenizer, the word 'strawberry' might arrive as a single token or be split into pieces such as 'straw' and 'berry'. Each token reaches the model as an opaque identifier, so the letters inside it are never directly visible. To count them, the model must draw on whatever it has learned about how each token is spelled, for instance by first spelling the word out when prompted to.
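To make the idea concrete, here is a toy tokenizer. It is a minimal sketch, not how any production model tokenizes. The vocabulary `TOY_VOCAB`, its integer IDs, and the greedy longest-match rule are all hypothetical choices made for illustration. Real tokenizers, such as byte-pair encoding, learn their vocabularies from data.

```python
# Hypothetical vocabulary: two multi-letter pieces plus single-letter fallbacks.
# The integer IDs are invented for this example.
TOY_VOCAB = {"straw": 1001, "berry": 1002,
             "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8}

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenization: at each position, take the longest
    vocabulary entry that matches the text there and emit its integer ID."""
    ids = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return ids

ids = tokenize("strawberry", TOY_VOCAB)
print(ids)                          # [1001, 1002]
print(len("strawberry"), len(ids))  # 10 2
```

What the model receives is `[1001, 1002]`. Neither number says anything about how many 'r's it contains. Answering the question requires separately knowing that 1001 spells s-t-r-a-w and 1002 spells b-e-r-r-y. The second line also shows why tokens are attractive: ten characters collapse into two units.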

This tokenization is not a design flaw but an engineering necessity. Processing text character by character would make every input several times longer, and the computational cost of these models grows with the length of the sequence they process. Tokens allow models to handle meaningful linguistic units efficiently. The trade-off is that character-level operations become indirect, requiring the model to reason about something it does not directly perceive. It is like asking someone to count the threads in a fabric they can only see from across the room.

Statistical intuition versus symbolic manipulation

The deeper issue is that language models are fundamentally prediction engines, not symbolic processors. They have learned, through exposure to vast quantities of text, the statistical patterns of human language — which words tend to follow other words, which phrases sound natural, which arguments typically accompany which conclusions. When they write poetry, they are drawing on these learned patterns to generate text that resembles the poetry in their training data.
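The principle of predicting from observed continuations can be illustrated with a drastically simplified stand-in: a bigram model that counts which word follows which in a tiny corpus. This is an assumption-laden toy. Real language models use neural networks over tokens and long contexts, not word-pair counts. The corpus and the function names here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    follows: dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows: dict[str, Counter], word: str) -> Optional[str]:
    """Return the continuation seen most often in training, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

corpus = "autumn leaves fall . autumn leaves drift . autumn rain falls ."
model = train_bigrams(corpus)
print(predict_next(model, "autumn"))  # leaves
print(predict_next(model, "rain"))    # falls
```

The model produces "leaves" after "autumn" only because that pairing was the most frequent one it observed. Nothing in it represents what autumn or leaves are, which is the sense in which fluent output can come from pattern statistics alone.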

Counting, by contrast, is a symbolic operation requiring exact manipulation of discrete elements. It demands the kind of step-by-step logical procedure that traditional computer programs execute trivially but that neural networks must approximate through learned patterns. The model might have encountered many instances of people discussing letter counts in text, but it has not internalized a reliable algorithm for performing the count itself.
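For contrast, the symbolic procedure a conventional program follows is short and exact. This sketch inspects each character once and increments a counter on every match. The helper name `count_letter` is hypothetical; `str.count` is Python's built-in equivalent.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` by inspecting every character in turn."""
    count = 0
    for ch in word:       # one deterministic step per character
        if ch == letter:
            count += 1
    return count

print(count_letter("strawberry", "r"))  # 3
print("strawberry".count("r"))          # 3 (built-in, same result)
```

The loop never estimates or recalls anything. It gives the same correct answer for any word, including ones it has never seen, which is exactly the guarantee a learned pattern cannot offer.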

What fluency actually means

This distinction matters because it clarifies what fluency in a language model actually represents. When a model produces grammatically correct, contextually appropriate text, it is not demonstrating understanding in the human sense. It is demonstrating that it has learned the surface patterns of language well enough to generate plausible continuations. The appearance of comprehension emerges from sophisticated pattern matching, not from the kind of grounded reasoning humans perform.

This does not make language models useless — far from it. Pattern matching at this scale produces genuinely valuable capabilities. But it does mean that the boundaries of those capabilities follow a logic that differs from human intelligence. Tasks requiring precise symbolic manipulation, reliable factual recall, or genuine causal reasoning remain areas of persistent weakness, regardless of how impressively the model handles open-ended generation.

Our take

The letter-counting failure is not an embarrassing glitch but an honest signal. It reminds us that we are dealing with a fundamentally new kind of tool, one whose strengths and weaknesses do not map onto human cognition in predictable ways. The sooner we internalize that these systems are brilliant pattern-completion engines rather than nascent general intelligences, the better we will deploy them: for drafting and brainstorming, where their fluency shines, and not for tasks demanding the kind of precise, verifiable reasoning that remains, for now, distinctly human territory.