Large language models share a peculiar condition: they have learned about coffee almost entirely through descriptions of coffee. A model knows that coffee is bitter, that it pairs well with pastries, that it can be over-extracted or under-roasted. It can distinguish Ethiopian Yirgacheffe from Sumatran Mandheling in prose. But it has never felt the warmth of a mug, never experienced the jolt of that first morning sip, never recoiled from a burnt tongue. Whether this matters, whether genuine understanding can emerge from text alone, is a question at the heart of what may be AI's most consequential limitation.

This is not merely a philosophical puzzle. It determines what AI systems can reliably do and, more importantly, what they cannot.

The Library of Babel problem

Imagine a scholar who has read every book ever written about swimming but has never entered water. She can describe the biomechanics of the freestyle stroke, explain buoyancy in precise physical terms, and quote Olympic coaches on technique. Ask her to swim, and she drowns.

Language models occupy an extreme version of this position. They have processed trillions of words, more text than any human could read in a thousand lifetimes, yet their entire existence unfolds in a realm of symbols cut off from the physical referents those symbols describe. When a model generates the sentence "the lemon tastes sour," it is performing sophisticated pattern completion based on how words about lemons typically cluster in its training data. It has no sensory access to sourness itself.

This is the symbol grounding problem, as cognitive scientist Stevan Harnad named it in 1990. Harnad asked how symbols in a formal system could ever acquire meaning if they were defined only in terms of other symbols: dictionaries all the way down, never touching the world. Large language models are, in a sense, the most elaborate demonstration of this problem ever constructed.

What gets lost in translation

The practical consequences surface in unexpected places. Language models often struggle with spatial reasoning tasks that young children handle with ease: rotating objects mentally, predicting how liquids pour, understanding that a ball rolled under a couch is still there even when unseen. They can falter on questions about physical causation, such as what happens when you push a glass off a table, why ice melts faster in warm water, or how a bicycle stays upright. These are not failures of intelligence in any conventional sense. They look instead like failures of grounding.

More subtly, the absence of embodiment may explain certain persistent weaknesses in common-sense reasoning. Human cognition is saturated with metaphors drawn from bodily experience: we "grasp" ideas, "weigh" options, feel "warmth" toward friends. On the influential account of linguist George Lakoff and philosopher Mark Johnson, these are not decorative flourishes but cognitive scaffolding built from physical interaction with the world. A system that has never grasped anything may be processing language about grasping in a fundamentally different, and shallower, way.

The road not yet taken

Some researchers argue this limitation is temporary, solvable by training models on enough video, audio, and eventually robotic sensor data. Others suspect the problem runs deeper: that certain kinds of understanding are constitutively embodied and cannot be bootstrapped from any quantity of disembodied data. The honest answer is that nobody knows. The field is, in effect, running one of the largest experiments in the history of cognitive science, and the results are not yet in.

Our take

The enthusiasm for large language models is warranted; they represent a genuine breakthrough in machine capability. But the current discourse too often conflates fluent language production with comprehensive understanding. A system can be extraordinarily useful — as these models demonstrably are — while remaining profoundly limited in ways that matter. The coffee it describes so eloquently remains, for the model, an abstraction all the way down. Recognizing this gap is not pessimism about AI. It is the precondition for using these tools wisely and for knowing when to trust them.