Ask GPT-4 or Claude to write a haiku about autumn, and you will likely receive something lovely. Ask it to verify that the haiku has the correct 5-7-5 syllable structure, and you will often receive confident nonsense. The model might count "beautiful" as two syllables or "fire" as two, then declare victory. This is not a bug to be patched. It is a window into the alien nature of machine intelligence.
The failure is so consistent, so reproducible, that it has become a parlour trick among AI researchers. But beneath the amusement lies a genuinely important insight: large language models do not process language the way humans do. They do not hear words. They do not sound them out. They have never experienced the physical act of speaking, the way a syllable requires a pulse of breath, a movement of the jaw. They learned language as pure pattern, divorced from embodiment.
The tokenisation problem
The root cause is architectural. Before a language model sees any text, that text is broken into tokens: chunks that might be whole words, fragments, or individual characters, depending on frequency in the training data. The word "understanding" might become two tokens; "AI" might be one. The model receives these tokens as numeric IDs, so it rarely sees letters as discrete units, let alone syllables. It predicts the next token based on statistical patterns learned from billions of tokens, not phonetic rules.
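A toy sketch can make the mismatch concrete. The vocabulary below is hypothetical and tiny, and the greedy longest-match rule is a simplification standing in for the learned schemes that production tokenisers use, so the splits shown are illustrative rather than what any real model produces.

```python
# Hypothetical toy vocabulary; real tokenisers learn tens of thousands of
# entries from data, and their actual splits will differ.
TOY_VOCAB = {"under", "standing", "beauti", "ful", "AI", "fire"}


def tokenise(word, vocab=TOY_VOCAB):
    """Greedy longest-match split; unmatched characters become single tokens."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one is in the vocab.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to one character.
            tokens.append(word[i])
            i += 1
    return tokens


def to_ids(tokens, vocab=TOY_VOCAB):
    """Map tokens to integer IDs, which are all the model actually receives."""
    index = {tok: n for n, tok in enumerate(sorted(vocab))}
    return [index.get(tok, -1) for tok in tokens]


print(tokenise("beautiful"))               # two chunks for a three-syllable word
print(to_ids(tokenise("understanding")))   # a short list of integers
```

In this sketch "beautiful" arrives as two chunks, neither of which lines up with its three syllables, and what reaches the model is a list of integers with no spelling attached.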
This is why counting tasks that seem trivial to humans — syllables, letters, words in a sentence — often trip up even frontier models. The model has no native representation of these units. When forced to count, it must infer from context, essentially guessing based on what similar-looking counting exercises produced in its training data. Sometimes it guesses right. Often it does not.
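For contrast, a program with direct access to the characters can count these units mechanically. The sketch below is a minimal illustration, not a phonetic model: `estimate_syllables` is a hypothetical helper that counts runs of vowels and discounts a likely silent final "e", and it will miscount irregular words ("poem", for instance, comes out as one).

```python
import re


def estimate_syllables(word):
    """Rough estimate: count runs of vowels, discounting a final silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing 'e' is usually silent ("fire"), but not in "-le" ("syllable").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def count_units(text):
    """Count letters, words, and estimated syllables in a piece of text."""
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "letters": sum(ch.isalpha() for ch in text),
        "words": len(words),
        "syllables": sum(estimate_syllables(w) for w in words),
    }


print(estimate_syllables("beautiful"))
print(count_units("autumn fire"))
```

Even this crude rule has something the model lacks: it operates on letters, the unit in which the counting task is actually defined.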
What this tells us about intelligence
The syllable problem is a microcosm of a larger truth: language models are not general reasoners that happen to use language. They are language-pattern engines that sometimes approximate reasoning. They excel at tasks where the answer is encoded in patterns — summarisation, translation, style transfer, code generation. They struggle when the task requires stepping outside the text to engage with the world the text describes.
Humans count syllables by subvocalising, by feeling the rhythm. We bring our bodies to the task. A model brings only statistics. This is not a criticism; it is a clarification. The technology is genuinely transformative for pattern-dense work. But expecting it to think like a human is a category error, and the syllable test is a cheap, reliable way to remember that.
Our take
The inability to count syllables is not evidence that AI is stupid. It is evidence that AI is strange: its intelligence, while real and useful, operates on principles fundamentally different from ours. The companies racing to deploy these systems would do well to remember this. So would the rest of us. The haiku test is humbling, but it is also clarifying: we have built something powerful, alien, and not yet fully understood.




