When Google's Gemini image generator recently produced a promotional graphic spelling the company's own name as "Gooogle," the internet responded with predictable mockery. But the joke obscures something more troubling: after years of rapid progress and hundreds of billions in investment, the most sophisticated AI systems on Earth still cannot be counted on to spell a word as short as "cat" correctly inside an image.

The spelling problem is not a bug awaiting a patch. It is a structural feature of how diffusion models understand—or rather, fail to understand—language. These systems do not "see" text as humans do. They process images as statistical patterns of pixels, learning that certain visual arrangements correlate with certain prompts. The letter "G" is not a symbol with meaning; it is a cluster of curves that sometimes appears near other clusters.

Why more training data hasn't helped

The intuitive assumption is that feeding models more examples of correctly spelled text would solve the problem. It hasn't. Google, OpenAI, and Midjourney have all thrown massive datasets at the issue and seen only marginal improvements. The reason is architectural: diffusion models generate images holistically, not sequentially. They do not "write" text letter by letter; they refine the whole canvas at once, denoising every region in parallel over many steps, and nothing in that process checks that the text-shaped regions resolve into the intended characters.

This is why you get "Starbvcks" and "Coca-Cota"—the model knows roughly what the logo should look like but lacks the discrete symbolic reasoning to ensure each character is correct. It is pattern-matching without comprehension, and no amount of scale has bridged that gap.

The commercial problem nobody discusses

For consumer entertainment, misspelled AI art is a minor annoyance. For commercial applications, it is a dealbreaker. Marketing teams cannot use AI-generated imagery if every product name requires manual correction. Advertising agencies cannot automate asset creation if brand consistency is impossible. The entire promise of AI-generated commercial content—faster, cheaper, infinitely scalable—collapses when a human must still verify every word.
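To make the verification burden concrete, here is a minimal sketch of the kind of check a team might run before a human ever looks at an asset. It assumes the words in the image have already been extracted by some OCR step, which is not shown; the function name, the threshold, and the sample words are illustrative assumptions, not any vendor's actual pipeline.

```python
from difflib import SequenceMatcher


def find_spelling_issues(expected_terms, extracted_words, threshold=0.7):
    """Flag extracted words that resemble an expected term but do not match it.

    `extracted_words` is assumed to come from an OCR pass over the generated
    image; that step is outside this sketch. `threshold` is an arbitrary
    similarity cutoff chosen for illustration.
    """
    expected = set(expected_terms)
    issues = []
    for word in extracted_words:
        if word in expected:
            continue  # spelled exactly as intended
        for term in expected_terms:
            similarity = SequenceMatcher(None, word.lower(), term.lower()).ratio()
            if similarity >= threshold:
                # Close to a brand term but not identical: a likely misspelling.
                issues.append((word, term))
                break
    return issues


if __name__ == "__main__":
    brand_terms = ["Starbucks", "Coca-Cola", "Google"]
    words_in_image = ["Starbvcks", "Coffee", "Gooogle"]  # hypothetical OCR output
    for found, intended in find_spelling_issues(brand_terms, words_in_image):
        print(f"'{found}' looks like a misspelling of '{intended}'")
```

A check like this can only flag suspects. It cannot repair the image, so every flagged asset still goes back for regeneration or manual retouching, which is exactly the cost the automation was supposed to remove.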

This explains why Adobe, Canva, and other design-tool companies have quietly pivoted toward hybrid approaches: AI generates the image, but text is overlaid using traditional rendering. It works, but it concedes that generative AI has hit a wall.
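The hybrid idea is simple enough to sketch in a few lines. The example below is an assumption-laden illustration, not how Adobe or Canva actually implement it: it wraps a generated image (referenced by a hypothetical file name) in an SVG document and places the caption as a real text element, so the spelling is exact by construction because a conventional renderer draws the glyphs. The function name, layout values, and font settings are all made up for the example.

```python
import xml.etree.ElementTree as ET


def compose_with_text(image_href, width, height, caption):
    """Build an SVG that overlays exact caption text on a generated image.

    The image itself is only referenced, not read; the plain `href`
    attribute assumes an SVG 2 capable renderer.
    """
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    # Background layer: the AI-generated picture, with no text baked in.
    ET.SubElement(svg, "image", {
        "href": image_href,
        "width": str(width),
        "height": str(height),
    })
    # Text layer: drawn by the SVG renderer, character for character.
    text = ET.SubElement(svg, "text", {
        "x": str(width // 2),
        "y": str(height - 40),
        "text-anchor": "middle",
        "font-family": "sans-serif",
        "font-size": "48",
        "fill": "white",
    })
    text.text = caption
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # "generated.png" is a placeholder for whatever the image model produced.
    print(compose_with_text("generated.png", 1024, 1024, "Google"))
```

The design choice is the whole point: the model is never asked to spell anything, so there is nothing for it to get wrong.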

Our take

The spelling problem is a useful reminder that AI progress is uneven. These models are extraordinary at tasks humans find difficult (generating coherent faces, simulating lighting, imagining novel compositions) and terrible at tasks humans find trivial. That asymmetry is not temporary. It reflects a fundamental mismatch between how neural networks process information and how symbolic reasoning works. Until someone solves that problem, Google's AI will keep misspelling Google. The memes write themselves, even if the models cannot.