The most sophisticated language models ever built can draft legal contracts, explain quantum mechanics, and compose sonnets in the style of Shakespeare. Ask them to multiply 47 by 83, and they might confidently tell you 3,801. The actual answer is 3,901.

This is not a bug that engineers will eventually patch. It is a window into the nature of these systems — and into the nature of intelligence itself.

The tokenization problem

Language models do not see numbers the way humans do. They see tokens: chunks of text that the tokenizer, built from frequency statistics over a large corpus, treats as units. Depending on the tokenizer, the number 3,901 might arrive as "3," and "901", as "390" and "1" if written without the comma, or as a single unit. The model has no inherent concept of place value, no understanding that the "3" in "3,901" represents three thousand while the "9" represents nine hundred.
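To make the place-value point concrete, here is a toy sketch. It assumes a hypothetical tokenizer that cuts every run of digits into chunks of at most three digits, scanning left to right. Real tokenizers differ, and the name `chunk_digits` is ours. The point is that chunk boundaries come from string position, not from what each digit is worth.

```python
import re


def chunk_digits(text: str, max_len: int = 3) -> list[str]:
    """Toy tokenizer (hypothetical): split digit runs into left-to-right chunks
    of at most max_len digits; every non-digit character becomes its own token."""
    tokens = []
    for match in re.finditer(r"\d+|\D", text):
        piece = match.group()
        if piece.isdigit():
            # Chunk from the left, ignoring thousands/hundreds grouping entirely.
            tokens.extend(piece[i:i + max_len] for i in range(0, len(piece), max_len))
        else:
            tokens.append(piece)
    return tokens


print(chunk_digits("3,901"))  # ['3', ',', '901']
print(chunk_digits("3901"))   # ['390', '1']
```

In the second case the thousands, hundreds, and tens digits share one token, and nothing inside the token "390" marks which digit carries which weight.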

When you learned arithmetic in primary school, you learned algorithms: carry the one, move to the next column, track the decimal point. These are explicit procedures operating on symbols with defined positional meaning. Language models learned something entirely different. They learned that when humans write "47 × 83 =", certain sequences of digits tend to follow. They are pattern-matching, not calculating.
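For contrast, the primary-school procedure can be written out explicitly. This sketch (the function name is ours) multiplies two non-negative integers given as digit strings, column by column, carrying as it goes. Every digit's meaning is fixed by its position, which is exactly the structure a token stream does not guarantee.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication of two non-negative integers written as digit strings."""
    # result[k] holds the digit for the 10**k column (least significant first).
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10   # keep one digit in this column
            carry = total // 10          # carry the rest to the next column
        result[i + len(b)] += carry
    # Convert back to a string, dropping leading zeros.
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


print(long_multiply("47", "83"))  # 3901
```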

This works surprisingly well for simple operations that appeared frequently in training data. Two plus two reliably yields four. But as numbers grow larger and combinations become rarer in the training corpus, the model increasingly relies on interpolation — educated guessing based on superficially similar examples it has seen.

What this reveals about understanding

The arithmetic failure is philosophically interesting because it suggests these systems possess something like knowledge without comprehension. A language model can explain the distributive property of multiplication in perfect prose. It can walk you through the steps of long division. It can sometimes even catch errors in your work. What it cannot reliably do is actually perform these operations.
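For concreteness, applying the distributive property to the opening example shows where the correct answer comes from, and that the confident 3,801 is off by exactly one hundred:

```latex
\begin{aligned}
47 \times 83 &= 47 \times (80 + 3) \\
             &= 47 \times 80 + 47 \times 3 \\
             &= 3760 + 141 \\
             &= 3901
\end{aligned}
```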

This bifurcation — fluent explanation without reliable execution — challenges our intuitions about intelligence. We tend to assume that if someone can teach a concept clearly, they must understand it deeply. Language models demonstrate that articulation and understanding can be decoupled in ways we had not previously imagined.

The systems are, in a sense, very sophisticated parrots that have heard so many mathematicians talk that they can reproduce the cadence and vocabulary of mathematical reasoning without possessing the underlying computational machinery. They have learned the music of mathematics without learning to play the instrument.

The workarounds and their limits

Engineers have developed clever patches. Many deployed systems now route arithmetic queries to external calculators, returning precise answers while the language model handles the conversational wrapper. Some researchers have experimented with chain-of-thought prompting, which asks models to show their work step by step and improves accuracy on moderately complex problems.
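The calculator-routing pattern can be sketched in a few lines. Everything here is an assumption for illustration. `route_query` and `call_language_model` are hypothetical names, and the model call is a stub. Real systems typically let the model itself decide when to invoke a tool, rather than matching a regular expression. The arithmetic is evaluated exactly by walking Python's `ast` and permitting only numeric literals and the four basic operators.

```python
import ast
import operator
import re

# Only these binary operators are allowed in the exact evaluator.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str):
    """Exactly evaluate +, -, *, / over numeric literals; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


def call_language_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"[model reply to: {prompt!r}]"


# Hypothetical trigger: the whole query is an arithmetic expression.
_ARITHMETIC = re.compile(r"[\d\s+\-*/().×]+=?\s*")


def route_query(query: str) -> str:
    if _ARITHMETIC.fullmatch(query) and any(c.isdigit() for c in query):
        expr = query.replace("×", "*").rstrip("= ")
        return str(evaluate(expr))      # exact answer from the calculator
    return call_language_model(query)   # everything else goes to the model


print(route_query("47 × 83 ="))  # 3901
print(route_query("Explain long division."))
```

Restricting the evaluator to a whitelist of AST node types, rather than calling `eval`, keeps arbitrary code in a user's query from being executed.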

But these solutions underscore rather than resolve the underlying limitation. A system that needs a calculator to multiply is not a system that understands multiplication. The patches are prosthetics, not cures.

This matters beyond arithmetic. The same architectural features that prevent reliable calculation also limit performance on tasks requiring strict logical sequencing, precise temporal reasoning, or consistent tracking of multiple variables. These are exactly the capabilities that would be required for the most ambitious visions of artificial general intelligence.

Our take

The arithmetic problem is clarifying, not damning. Language models are extraordinarily useful precisely because they excel at what humans find difficult — synthesizing vast information, maintaining consistent tone across long documents, translating between domains and registers. That they fail at what humans find easy should not diminish their utility; it should calibrate our expectations. These are not nascent minds struggling toward consciousness. They are a genuinely new kind of tool, powerful in ways we are still discovering, limited in ways we are still mapping. The multiplication errors are not embarrassing failures. They are honest signals about what these systems actually are.