Ask a large language model to write a sonnet about quantum mechanics and it will oblige with something passable, perhaps even elegant. Ask it whether you can fit a basketball inside a coffee mug and it may hesitate, hedge, or confidently declare that yes, with the right angle, it should work. This is the paradox at the heart of contemporary artificial intelligence: systems that seem superhuman at complex linguistic tasks routinely stumble on problems that require nothing more than the intuitive physics and social reasoning that children acquire before kindergarten.

The gap is not a bug to be patched in the next release. It reflects a structural limitation in how these systems learn about the world.

The missing curriculum

Humans develop common sense through embodied experience. We learn that water flows downhill not from reading about gravity but from spilling cups, splashing in puddles, and watching rain collect in gutters. We understand that people generally dislike being insulted because we have felt embarrassment, observed anger, and navigated countless social micro-negotiations. Large language models, by contrast, learn almost entirely from text: billions of documents scraped from the internet, each one a flattened representation of knowledge that was itself produced by embodied humans.

This textual diet is extraordinarily rich in certain nutrients and almost entirely lacking in others. Models absorb vast statistical regularities about how words follow other words, which allows them to generate fluent prose and even pass professional licensing exams. But they never push a heavy box, never feel the resistance of a stuck drawer, never experience the social sting of a poorly timed joke. The result is a kind of savant intelligence: dazzling in narrow corridors, baffled by open fields.

Where the cracks appear

Researchers have documented these failures with almost comic regularity. Models confidently assert that you can drive from London to Tokyo, or that a person can hold their breath for an hour with practice. They struggle with spatial reasoning, temporal sequencing, and causal inference, the very scaffolding of everyday thought.

More troubling than the errors themselves is the confident tone in which they are delivered. Because models are trained to produce plausible-sounding text, they often lack the metacognitive alarm bells that tell humans when they are out of their depth. A human asked whether a giraffe can fit inside a standard elevator might pause, visualize, and express uncertainty. A language model may simply generate the most statistically likely continuation, which in many contexts is an affirmative sentence delivered with unearned authority.

Why it matters beyond trivia

These limitations have practical consequences as AI systems are deployed in high-stakes domains. A medical assistant that cannot reason about whether a patient's symptoms are physically plausible may hallucinate diagnoses. A legal research tool that cannot distinguish between a binding precedent and a hypothetical scenario may mislead attorneys. The fluency that makes these systems useful also makes their failures harder to detect—users assume that something so articulate must also be reliable.

The common-sense gap also complicates the path toward more general artificial intelligence. Some researchers argue that scaling—simply making models larger and training them on more data—will eventually close the gap. Others believe that text alone can never provide the grounding necessary for genuine understanding, and that future systems will need to interact with simulated or physical environments to acquire the intuitive knowledge that humans take for granted.

Our take

The AI industry's marketing has a habit of emphasizing what these systems can do while glossing over what they cannot. The result is a public discourse that oscillates between utopian hype and dystopian panic, neither of which captures the more mundane reality: we have built extraordinarily powerful pattern-matching engines that remain, in important respects, profoundly ignorant about the world they describe. Recognizing this is not a counsel of despair. It is a prerequisite for using these tools wisely—and for understanding just how much of human intelligence remains beyond the reach of silicon.