The artificial intelligence industry has perfected a peculiar rhetorical trick: announcing capabilities in the future tense while collecting revenue in the present. Large language models can generate fluent prose, summarize documents, and write serviceable code. They cannot reliably count the letters in a word, maintain consistent beliefs across a conversation, or tell you whether they are making something up. This is not a minor discrepancy. It is the central tension of the current AI moment, and understanding it matters more than any benchmark score.
The hallucination problem is architectural, not incidental
When a language model confidently cites a court case that does not exist, as happened in a widely reported legal filing, it is not malfunctioning. It is doing precisely what it was designed to do: predicting statistically plausible next tokens based on patterns in training data. The model has no mechanism for distinguishing between a real citation it encountered during training and a plausible-sounding citation it synthesized from fragments. This is not a bug awaiting a patch. It is a fundamental consequence of how these systems represent knowledge, which is to say, they do not represent knowledge at all. They represent the statistical shadow of knowledge, which is a different thing entirely.
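To make the point about synthesized citations concrete, here is a deliberately tiny sketch. It is not how a large language model works internally: it is a toy bigram table built from three invented citations (the names, volumes, and the "Fict." reporters are all made up for illustration). It shows only the narrower idea that a system which records which token tends to follow which has no built-in marker separating sequences it saw from sequences it merely recombined.

```python
from collections import defaultdict

# Hypothetical "training data": three invented citations, split on whitespace.
TRAINING = [
    "Avery v. Birch 101 Fict.Rep. 11",
    "Cole v. Birch 202 Fict.Rep. 22",
    "Avery v. Dunn 202 Fict.2d 11",
]


def build_bigrams(corpus):
    """Map each token to the set of tokens that followed it in the corpus."""
    follows = defaultdict(set)
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev].add(nxt)
    return follows


def generate_all(follows, max_len=8):
    """Enumerate every sequence the bigram table treats as plausible."""
    results = []
    stack = [["<s>"]]
    while stack:
        seq = stack.pop()
        for nxt in follows[seq[-1]]:
            if nxt == "</s>":
                results.append(" ".join(seq[1:]))
            elif len(seq) < max_len:
                stack.append(seq + [nxt])
    return sorted(results)


follows = build_bigrams(TRAINING)
plausible = generate_all(follows)
invented = [c for c in plausible if c not in TRAINING]

print(len(plausible), "plausible citations;", len(invented), "never appeared in training")
for citation in invented[:3]:
    print("  fabricated:", citation)
```

From three training citations, the table licenses sixteen, thirteen of which never existed. Nothing in the table itself flags which are which; that information lives only in the original list, which the generator does not consult.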
Retrieval-augmented generation and similar techniques can reduce hallucination rates, but they cannot eliminate them without eliminating the generative capability that makes these models useful in the first place. Every deployment in high-stakes domains—medicine, law, finance—requires human verification, which means the productivity gains are real but bounded by the verification bottleneck.
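The verification bottleneck can be put in rough arithmetic. The sketch below assumes a simple model that the article does not itself state: each task takes some time to do by hand, some time for the model to draft, and some time for a person to check the draft. The function name and all the numbers are hypothetical.

```python
def speedup(manual_minutes, ai_minutes, verify_minutes):
    """Ratio of time per task without AI to time per task with AI plus human review.

    Assumes (for illustration only) that review happens after drafting
    and that the two steps do not overlap.
    """
    return manual_minutes / (ai_minutes + verify_minutes)


# Hypothetical task: 60 minutes by hand, 2 minutes to generate, 20 minutes to check.
current = speedup(manual_minutes=60, ai_minutes=2, verify_minutes=20)

# Even if generation became instantaneous, review time still caps the gain.
ceiling = speedup(manual_minutes=60, ai_minutes=0, verify_minutes=20)

print(f"speedup today: {current:.2f}x, ceiling with free generation: {ceiling:.2f}x")
```

Under these made-up numbers, faster models move the result from about 2.7x toward 3x and no further. Past that point, the only lever left is reducing how long verification takes.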
Reasoning remains shallow
Models can solve many problems that look like reasoning by pattern-matching against similar problems in their training data. They struggle conspicuously when problems require genuine multi-step inference, especially when the solution path was not well-represented in training. Ask a model to solve a logic puzzle presented in an unfamiliar form, and performance degrades rapidly. Ask it to solve a logically equivalent puzzle in familiar framing, and it often succeeds. This is interpolation masquerading as intelligence.
The practical implication is significant: current AI excels at tasks where the answer space is constrained and examples are abundant. It struggles where answers require synthesis across domains, where edge cases matter, and where the cost of confident wrongness is high. Automating the middle of a workflow is easier than automating the judgment calls at either end.
The agency gap
Despite breathless announcements about autonomous agents, today's AI systems cannot reliably execute multi-step plans in open-ended environments. They lose track of goals, fail to recover from errors, and cannot distinguish between tasks that require clarification and tasks they should proceed with. The gap between a chatbot that answers questions and an agent that accomplishes objectives remains vast—not because the engineering is incomplete, but because robust agency may require architectural innovations that have not yet occurred.
Our take
The honest case for AI is compelling enough without embellishment. These systems are genuinely useful for drafting, summarizing, brainstorming, and coding assistance. They are not dependable, absent human verification, for tasks requiring factual precision, novel reasoning, or autonomous judgment. The industry's insistence on conflating current capabilities with speculative futures does everyone a disservice: investors who misallocate capital, workers who fear obsolescence prematurely, and researchers whose real achievements get lost in the noise. The technology deserves neither worship nor panic. It deserves the same skeptical respect we extend to any powerful tool: appreciation for what it does, clarity about what it does not, and patience while the gap between them slowly narrows.




