Every time you ask an AI chatbot a question, you are not conversing with a mind. You are triggering an extraordinarily sophisticated prediction machine — one that has read more text than any human ever could and learned, through brute statistical force, which words tend to follow which other words. The result feels like intelligence. It is not. Understanding this distinction is essential to using these tools wisely and regulating them sensibly.

The core operation is almost comically simple to describe: given a sequence of words, predict the next one. Repeat until you have a paragraph. The complexity lies entirely in how that prediction is made — and the scale at which it happens.
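To make the predict-and-repeat loop concrete, here is a deliberately tiny sketch. It is not how a real LLM works internally: it simply counts which word follows which in a one-sentence corpus and always picks the most frequent follower. The corpus and the names `train_bigram_counts` and `generate` are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count, for every word, which words follow it in the text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=8):
    """Repeatedly append the most frequent follower of the last word."""
    output = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_counts(corpus)
print(generate(model, "sat", max_words=4))  # prints "sat on the cat"
```

The output, "sat on the cat", never appears in the corpus. The loop has produced a fluent-looking phrase purely from word-to-word statistics, which is the basic move described above, shrunk to toy size.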

Tokens, weights, and the transformer trick

Large language models do not read words the way humans do. They break text into "tokens": fragments that might be whole words, pieces of words, or even individual characters, depending on how often each fragment appears in the training text. The word "understanding" might be one token; "defenestration" might be three. A model like GPT-4 passes these tokens through dozens of layers of mathematical transformations, each layer applying numerical "weights" that encode patterns learned during training. The weights are adjusted only during training; when the model answers a question, they stay fixed.
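A rough sketch of the splitting step, assuming a hand-picked vocabulary and a simple longest-match rule. Production tokenisers learn their vocabularies from data and use more elaborate schemes, so the `VOCAB` set and the `tokenize` function here are hypothetical stand-ins.

```python
def tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining slice first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No piece matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens

# Hypothetical vocabulary: a common word is stored whole, a rare one is not.
VOCAB = {"understanding", "under", "stand", "ing", "de", "fen", "estration"}

print(tokenize("understanding", VOCAB))   # ['understanding']
print(tokenize("defenestration", VOCAB))  # ['de', 'fen', 'estration']
```

The common word survives as a single token because it is in the vocabulary whole; the rare one has to be assembled from three smaller pieces.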

The breakthrough architecture, introduced in a landmark 2017 paper titled "Attention Is All You Need," is called the transformer. Its key innovation is the "attention mechanism," which allows the model to weigh how much each token in a sequence should influence the prediction of the next. When generating a sentence about Paris, the model attends more heavily to tokens like "France" and "Eiffel" than to "the" or "a." This context-sensitivity is what makes modern outputs feel coherent across long passages.
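The weighting idea can be sketched in a few lines, assuming the scaled dot-product form of attention from the 2017 paper. The two-dimensional vectors below are made up by hand so that "France" and "Eiffel" point in roughly the same direction as the query; in a real model these vectors have hundreds or thousands of dimensions and are learned, not chosen.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product score of one query against every key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def attend(query, keys, values):
    """Blend the value vectors according to the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical vectors, chosen by hand for illustration only.
tokens = ["the", "France", "Eiffel", "a"]
keys = [[0.1, 0.0], [2.0, 1.5], [1.8, 2.0], [0.0, 0.1]]
values = keys  # reuse the keys as values to keep the toy small
query = [1.0, 1.0]  # stands in for the position about to mention Paris

weights = attention_weights(query, keys)
context = attend(query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token:>7}: {weight:.2f}")
```

With these invented numbers, almost all of the weight lands on "France" and "Eiffel", and very little on "the" or "a". The blended `context` vector is what the later layers would go on to use.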

Training: reading the internet, statistically

Before a model can predict anything, it must be trained on vast corpora: billions of documents scraped from books, websites, academic papers, and code repositories. During training, the model sees a passage with the next token hidden and guesses what it is. The gap between its guess and the real token becomes an error signal that propagates backward through the network, nudging the weights slightly. Repeat this process trillions of times, and patterns emerge: grammar, facts, style, even the cadence of human reasoning.
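The guess-and-nudge cycle can be shown in miniature. The sketch below assumes a "model" with just three adjustable numbers, one score per candidate word, and repeats a single training example twenty times. Real training pushes the error back through many layers and billions of weights; the three-word vocabulary, the learning rate, and the function names are all illustrative assumptions.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(logits, target_index, learning_rate=0.5):
    """One gradient-descent step on cross-entropy loss for one example."""
    probs = softmax(logits)
    loss = -math.log(probs[target_index])
    # Gradient of the loss with respect to each score:
    # predicted probability minus 1 for the correct word, minus 0 otherwise.
    for i in range(len(logits)):
        grad = probs[i] - (1.0 if i == target_index else 0.0)
        logits[i] -= learning_rate * grad
    return loss

vocab = ["mat", "moon", "sofa"]
logits = [0.0, 0.0, 0.0]     # the toy "weights": one score per candidate
target = vocab.index("mat")  # the hidden word in "the cat sat on the ___"

losses = [train_step(logits, target) for _ in range(20)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
print(f"probability of 'mat' after training: {softmax(logits)[target]:.2f}")
```

The model starts out guessing uniformly among the three words. Each step shrinks the error a little, and the probability assigned to the hidden word climbs accordingly.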

Critically, the model stores no explicit database of facts. It cannot look up that the Eiffel Tower is 330 metres tall. It has merely learned that, in contexts involving the Eiffel Tower and height, the tokens "330" and "metres" frequently appear together. This is why LLMs "hallucinate" — they generate statistically plausible text that may be factually false, because plausibility and truth are not the same thing.
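A toy illustration of that gap between plausibility and truth, assuming an imaginary corpus in which a different height figure happens to appear more often than "330". The counts are invented, not measured from any real dataset.

```python
from collections import Counter

# Hypothetical counts of the token that follows "The Eiffel Tower is"
# and precedes "metres tall" in an imaginary training corpus.
continuation_counts = Counter({"324": 70, "330": 25, "300": 5})

def next_token_probabilities(counts):
    """Convert raw counts into relative frequencies."""
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}

def most_plausible(counts):
    """Return the continuation seen most often, whether or not it is correct."""
    return max(counts, key=counts.get)

probs = next_token_probabilities(continuation_counts)
print(probs)
print(most_plausible(continuation_counts))  # prints "324"
```

Nothing in this procedure checks the answer against the world. If the most common figure in the text is not the right one, the most plausible continuation is simply wrong, and it is delivered just as fluently.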

What this means for users and policymakers

Recognising that LLMs are prediction engines, not reasoning agents, clarifies both their power and their limits. They excel at drafting, summarising, translating, and brainstorming — tasks where fluency matters more than factual precision. They falter when asked to verify claims, perform novel logical deductions, or know when they do not know something.

For regulators, the implication is that anthropomorphising these systems leads to misguided policy. An LLM cannot "intend" harm any more than a calculator can. The risks are real — misinformation, bias encoded in training data, job displacement — but they stem from how humans deploy the technology, not from emergent machine volition.

Our take

The tendency to describe LLMs as "thinking" or "understanding" is understandable; their outputs invite it. But indulging that metaphor obscures the engineering reality and inflates both fears and expectations. These are statistical mirrors, reflecting the patterns of human language back at us with eerie fidelity. That is genuinely impressive. It is also, at bottom, arithmetic — performed at a scale that makes it feel like sorcery. Knowing the trick does not diminish the show; it simply helps you know when to applaud and when to check the receipts.