The most sophisticated language models can explain quantum mechanics, draft legal briefs, and compose poetry in the style of Keats. What they cannot do is catch a ball, feel the weight of a coffee mug, or understand why stubbing your toe hurts. This is not a temporary limitation awaiting the next training run. It represents a fundamental gap between intelligence as we have built it and intelligence as we experience it.

Human cognition is not a disembodied process that happens to reside in a skull. It is shaped, moment by moment, by the fact that we are physical creatures navigating a physical world. When you read the word "heavy," your brain activates regions associated with lifting. When you think about the future, you literally lean forward, however imperceptibly. Meaning, for humans, is grounded in the body.

The symbol grounding problem returns

Philosophers have debated this territory for decades. John Searle's Chinese Room argument, proposed in 1980, suggested that manipulating symbols according to rules does not constitute understanding. The response from AI researchers was often dismissive — surely enough symbols, enough rules, enough data would bridge the gap. Large language models have tested that hypothesis at unprecedented scale. They have ingested more text than any human could read in a thousand lifetimes. Yet they remain, in a precise sense, ungrounded.

When a language model uses the word "cold," it has learned statistical associations: cold appears near winter, ice, shiver, uncomfortable. What it lacks is the phenomenological anchor — the memory of breath visible in frigid air, the ache of fingers losing feeling, the relief of stepping indoors. This is not sentimentality. Research in cognitive science suggests that such embodied associations are not decorative additions to meaning but constitutive of it.

Why robotics has not solved this

The obvious response is to give AI a body. Robotics companies have made remarkable progress: machines that walk, run, flip, and manipulate objects with increasing dexterity. But bolting a language model onto a robot does not automatically produce embodied understanding. The robot's sensors generate data; the model processes symbols. The integration remains shallow.

True sensorimotor grounding would require something more radical: learning from the ground up through physical interaction, the way an infant spends years building intuitive physics before uttering a first word. Current approaches largely skip this developmental process, treating physical capability as an engineering problem separate from cognitive architecture. The result is systems that can perform impressive physical feats while remaining, in their reasoning, as disembodied as ever.

What this means for AI's future

None of this implies that language models are useless — they are demonstrably powerful tools for tasks involving text. But it does suggest boundaries to what text-trained systems can achieve. Tasks requiring genuine physical intuition, real-time adaptation to novel environments, or understanding rooted in bodily experience may remain stubbornly difficult. The path to artificial general intelligence, if such a thing is possible, likely runs through the body, not around it.

Our take

The AI industry's focus on ever-larger language models has produced astonishing capabilities and equally astonishing blind spots. The embodiment problem is not a bug to be patched in version 5.0; it is a structural feature of how these systems were built. Acknowledging this is not pessimism — it is clarity. The most useful AI will come from understanding what these tools actually are, rather than projecting onto them capacities they do not possess. Intelligence without a body is intelligence of a very particular, and very limited, kind.