The gap between what large language models can do and what their promoters suggest they might do has become one of the defining tensions of the decade. On one side, demonstrations that would have seemed like science fiction a few years ago: systems that pass bar exams, debug code, and generate photorealistic video from text prompts. On the other, a quieter reality that practitioners know well but that rarely makes headlines: these same systems cannot reliably count the letters in a word, struggle to plan multi-step tasks, and possess no persistent memory of conversations held five minutes prior unless explicitly engineered to fake it.
This is not a criticism so much as a clarification. Understanding what AI cannot do is essential to understanding what it is.
The reasoning mirage
The most consequential limitation is also the most counterintuitive. Large language models appear to reason, producing outputs that look like logical chains, but they are fundamentally pattern-completion engines trained on statistical regularities in text. When a model solves a seemingly novel logic puzzle, it is often because structurally similar puzzles appeared in its training data. When it fails spectacularly on a trivially modified version of the same puzzle, the illusion breaks.
Researchers have documented this pattern repeatedly. Change the numbers in a well-known math problem, or introduce an irrelevant detail, and performance can drop sharply in ways no careful human reasoner would exhibit. The models are not thinking through problems; they are recognizing them. The distinction matters enormously for any application requiring genuine inference under novel conditions: medicine, law, engineering, anywhere the stakes are high and the edge cases are endless.
The memory wall
Current systems have no durable memory. Each conversation begins from zero unless developers implement elaborate retrieval systems to simulate continuity. This is not a minor inconvenience; it is a fundamental architectural constraint. A model cannot learn from its mistakes in deployment, cannot accumulate expertise over time, cannot develop the kind of contextual judgment that comes from experience. Every interaction is, from the model's perspective, its first.
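The workarounds (retrieval-augmented generation, fine-tuning, prompt engineering) are impressive feats of engineering that paper over the problem without solving it. Retrieval and prompting leave the model's weights untouched and simply feed it more text at query time; fine-tuning does change the weights, but only in offline batches run by developers, not as a consequence of the model's own interactions. They create the appearance of memory and learning while the deployed system remains frozen at the moment its last training run ended.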
The embodiment problem
Perhaps most fundamentally, these systems have no body, no sensory experience, no causal interaction with the physical world. They know about apples only through descriptions of apples, never through biting one. This matters more than it might seem. Human intelligence is grounded in embodied experience — our concepts of weight, texture, distance, and time emerge from physical interaction. AI systems operate in a purely linguistic space, manipulating symbols that refer to a world they have never touched.
This is why robotics remains so difficult despite advances in language models. Generating plausible text about making a sandwich is trivially easy; actually making a sandwich in an unfamiliar kitchen remains an unsolved problem.
Our take
None of this diminishes what has been achieved. The commercial applications are real, the productivity gains measurable, the creative possibilities genuinely novel. But the breathless talk of a straight-line trajectory toward artificial general intelligence, the kind that dominates conference stages and investor pitches, requires a suspension of disbelief about these limitations. Today's AI is a remarkable tool, with the emphasis on tool. Mistaking it for a nascent mind is not optimism; it is a category error with consequences.




