Every conversation with an AI model takes place inside a box. The box has walls, and when you hit them, the model starts forgetting.

This is the context window: the maximum amount of text a large language model can process at one time, counting both what it reads and what it writes. It is measured in tokens (roughly three-quarters of an English word each), and it is perhaps the most consequential constraint in modern AI that most users never learn about. Understanding it changes how you think about what these systems can and cannot do.
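To make the three-quarters-of-a-word figure concrete, here is a minimal sketch that estimates token counts from a word count. The function names and the `reserved_for_reply` parameter are illustrative assumptions, and the ratio is only a rule of thumb for English prose; a real tokenizer will give different numbers, especially for code or other languages.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a whitespace word count.

    Assumes about 0.75 words per token; this is a heuristic,
    not the output of any real tokenizer.
    """
    word_count = len(text.split())
    return round(word_count / words_per_token)


def fits_in_window(text: str, window_tokens: int, reserved_for_reply: int = 0) -> bool:
    """Check whether the estimated prompt plus reserved reply space fits the window."""
    return estimate_tokens(text) + reserved_for_reply <= window_tokens


if __name__ == "__main__":
    sample = "word " * 750  # a 750-word stand-in for a real document
    print(estimate_tokens(sample))
    print(fits_in_window(sample, window_tokens=1000, reserved_for_reply=200))
```

Reserving room for the reply matters because the model's own output counts against the same window as your input.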

The architecture of temporary memory

When you chat with a model, it does not, by default, remember your previous sessions, and it does not learn from your corrections. Unless the product adds a separate memory feature, each conversation begins with a blank slate, and everything the model knows about your current exchange must fit inside that context window: your messages, its responses, any documents you've uploaded, and the hidden system instructions that shape its behavior.

Early transformer models had context windows of a few thousand tokens. Contemporary models have expanded this dramatically, with some claiming windows of hundreds of thousands of tokens or more. But bigger is not simply better. In a standard transformer, the cost of attention grows quadratically with context length, so doubling the input roughly quadruples that part of the computation. Models also tend to pay uneven attention across long inputs, a phenomenon researchers call the "lost in the middle" problem, where information buried in the center of a long document gets less weight than content at the beginning or end.
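The arithmetic behind that computational cost is easy to check. In standard full self-attention, every token is compared with every other token, so the number of comparisons grows with the square of the context length. The sketch below assumes that textbook setup only; production models often use optimizations that change the constants or the scaling, and the function names are illustrative.

```python
def attention_pairs(num_tokens: int) -> int:
    """Token-to-token comparisons in full self-attention (n * n)."""
    return num_tokens * num_tokens


def relative_attention_cost(short_tokens: int, long_tokens: int) -> float:
    """How many times more comparisons the longer context needs."""
    return attention_pairs(long_tokens) / attention_pairs(short_tokens)


if __name__ == "__main__":
    # A context 32 times longer needs 32 * 32 times as many comparisons.
    print(relative_attention_cost(4_000, 128_000))  # 1024.0
```

A window that is 32 times longer costs roughly a thousand times more in this part of the computation, which is why long contexts are expensive to serve.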

Why this matters for real use

The context window explains several puzzling AI behaviors. Ask a model to summarize a book and it may produce confident-sounding nonsense — not because it is lying, but because the book exceeded its window and it is working from fragments. Request consistency across a long document and watch the model contradict itself as earlier passages scroll out of its effective memory. Try to build a complex application through extended dialogue and notice the model gradually losing track of decisions you made together an hour ago.
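The forgetting in long dialogues often comes from how chat applications keep a conversation inside the window. One common approach, sketched here under assumptions, is to keep only the most recent messages that fit a token budget and silently drop the rest. The helper names and the word-based token estimate are hypothetical; real products may summarize old turns or use a proper tokenizer instead.

```python
import math


def rough_token_count(text: str) -> int:
    """Heuristic token count, assuming about 0.75 words per token."""
    return math.ceil(len(text.split()) / 0.75)


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined estimate fits max_tokens.

    Walks backward from the latest message and stops at the first one
    that would overflow the budget, so older messages are dropped.
    """
    kept: list[str] = []
    used = 0
    for message in reversed(messages):
        cost = rough_token_count(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["use PostgreSQL for storage", "name the table orders", "add an index"]
    print(trim_history(history, max_tokens=12))
```

With a small budget, the earliest decision (here, the choice of database) is the first thing to fall out, which matches the experience of a model losing track of what was agreed earlier.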

Professional users have developed workarounds: chunking long documents, strategic summarization, external memory systems that retrieve relevant passages on demand. But these are patches on a fundamental limitation. The context window is not a bug to be fixed; it is a core architectural feature of how transformers process information.
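Chunking, the first of those workarounds, can be sketched in a few lines. This version splits on whitespace and overlaps consecutive chunks so that a sentence cut at a boundary still appears whole in one of them. The function name, the default sizes, and the choice to count words rather than tokens are all assumptions for illustration, not a standard recipe.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(document, chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk can then be summarized or searched separately. The trade-off is that the model never sees the whole document at once, so connections that span distant chunks can still be missed.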

The gap between demo and deployment

Marketing materials showcase models analyzing entire codebases or legal contracts in one pass. The reality is more constrained. Effective context length — the portion of the window where the model maintains strong comprehension — is typically shorter than the advertised maximum. And cost scales with usage: processing a hundred-thousand-token context is not merely slower but substantially more expensive than a short exchange.

This creates a persistent gap between what AI appears capable of in controlled demonstrations and what it reliably delivers in production. The gap is not dishonesty; it is the difference between optimal conditions and the messy reality of varied inputs and edge cases.

Our take

The context window is the AI equivalent of working memory in humans — a hard limit on how much can be held in mind simultaneously. Knowing this single fact protects you from both excessive skepticism and naive optimism. These models are genuinely powerful within their constraints and genuinely limited beyond them. The companies building them are racing to expand those walls, but the walls remain. Anyone telling you otherwise is selling something.