When you return to a conversation with a chatbot after a few hours and it seems to have forgotten everything you discussed, you are not experiencing a bug. You are encountering one of the most consequential design constraints in modern artificial intelligence: the context window.

A context window is the amount of text a large language model can process at once — the entirety of what it can "see" when generating a response. Everything outside that window simply does not exist to the model. There is no hard drive where your previous conversations live, no database of facts about you accumulating over time. Each interaction begins, in a meaningful sense, from nothing.

The architecture of amnesia

The transformer architecture that powers models like GPT and Claude processes text through a mechanism called attention, which allows every token in the input (roughly, a word or fragment of a word) to relate to every other token. This is computationally expensive: the cost of the attention step scales quadratically with the length of the input. A context window of 100,000 tokens is not merely twice as demanding as one of 50,000; its attention computation requires roughly four times the work.
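To make that arithmetic concrete, here is a minimal sketch in Python. It assumes the simplest possible accounting, in which standard attention computes one score for every ordered pair of tokens. The function name attention_pairs is ours, chosen for illustration, and real systems carry other costs that this ignores.

```python
def attention_pairs(num_tokens: int) -> int:
    """Count pairwise attention scores under a simplified model:
    each token is compared with every token, so n tokens give n * n scores."""
    return num_tokens * num_tokens


small = attention_pairs(50_000)
large = attention_pairs(100_000)
ratio = large / small

print(f"50,000 tokens:  {small:,} pairwise scores")
print(f"100,000 tokens: {large:,} pairwise scores")
print(f"ratio: {ratio}")  # doubling the input quadruples the count
```

Under this accounting the jump from 50,000 to 100,000 tokens takes the count from 2.5 billion to 10 billion pairwise scores.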

This creates hard engineering tradeoffs. Larger windows mean slower responses and higher costs. The companies building these systems have expanded context windows dramatically over the past few years, from a few thousand tokens to hundreds of thousands. But even the largest windows are finite, and the model's ability to use information is not uniform across them.

Researchers have documented a phenomenon called "lost in the middle," where models reliably recall information placed at the beginning or end of their context window but struggle with material buried in between. The window is not a perfect memory even within its bounds.

The illusion of continuity

What makes this constraint particularly confusing for users is how successfully it is hidden. When you chat with an AI assistant, the interface often simulates continuity by quietly feeding previous messages back into each new prompt. The model is not remembering your earlier exchange; it is being shown a transcript of it, consuming context window space in the process.
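A rough sketch of that replay pattern, in Python, might look like the following. Everything here is a stand-in: hypothetical_model is a placeholder rather than any real API, and count_tokens approximates tokens by counting whitespace-separated words, which real tokenizers do not do.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def hypothetical_model(prompt: str) -> str:
    """Placeholder for a real model call; it only reports how much it was shown."""
    return f"(reply after reading {count_tokens(prompt)} tokens)"


def send(transcript: list[str], user_message: str) -> int:
    """Append the new message, re-send the ENTIRE transcript as the prompt,
    and return the size of that prompt in (approximate) tokens."""
    transcript.append(f"User: {user_message}")
    prompt = "\n".join(transcript)  # the model sees the whole history, every turn
    reply = hypothetical_model(prompt)
    transcript.append(f"Assistant: {reply}")
    return count_tokens(prompt)


transcript: list[str] = []
prompt_sizes = [
    send(transcript, message)
    for message in ["My name is Ada.", "I am planning a trip.", "What is my name?"]
]
print(prompt_sizes)  # each prompt is larger than the one before it
```

The model function holds no state between calls. The only reason the third question could be answered is that the first message is physically present in the third prompt.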

This is why conversations with AI assistants sometimes degrade in quality over extended sessions. As the transcript grows, older material gets truncated or summarized to fit within limits. The assistant that seemed so sharp an hour ago is now working with a compressed, lossy version of your interaction.
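One simple truncation policy, sketched below in Python, keeps the most recent messages that fit a token budget and drops the oldest first. This is an illustrative assumption, not a description of how any particular product behaves; the names fit_to_budget and count_tokens are ours, and the word-count tokenizer is deliberately crude.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget,
    dropping the oldest messages first."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "User: My name is Ada and I am allergic to peanuts.",
    "Assistant: Noted, Ada.",
    "User: Suggest a snack for my flight.",
    "Assistant: How about trail mix?",
]
visible = fit_to_budget(history, budget=15)
print(visible)
```

With a budget of 15 the opening message no longer fits, so the allergy it mentioned is invisible to the model on the next turn.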

Some systems attempt workarounds: storing summaries of past conversations, retrieving relevant snippets from databases, or maintaining explicit user profiles. These approaches help, but they are fundamentally different from how human memory works. They are search and retrieval, not retention.
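The retrieval idea can be sketched in a few lines of Python. This toy version ranks stored notes by how many words they share with the question; production systems typically use learned embeddings and a vector index instead, so treat the scoring here, and the names retrieve and words, as illustrative assumptions.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(notes: list[str], query: str, top_k: int = 1) -> list[str]:
    """Return the top_k stored notes sharing the most words with the query."""
    query_words = words(query)
    ranked = sorted(
        notes,
        key=lambda note: len(words(note) & query_words),
        reverse=True,
    )
    return ranked[:top_k]


notes = [
    "The user is allergic to peanuts.",
    "The user prefers window seats on flights.",
    "The user works as a structural engineer.",
]
question = "Which foods am I allergic to?"
found = retrieve(notes, question)

# The retrieved note is pasted into the prompt; the model still only "knows"
# what is written inside its window.
prompt = f"Relevant note: {found[0]}\nUser: {question}"
print(prompt)
```

Nothing is retained by the model itself. A separate store is searched, and whatever it returns is written into the window alongside the new question.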

Why this matters for how you use AI

Understanding context windows changes how you should interact with these systems. Providing relevant information at the start of a conversation matters more than assuming the model "knows" you. Restating key constraints when asking follow-up questions is not redundant; it is compensating for architectural reality.

It also explains why AI assistants excel at certain tasks and struggle with others. Analyzing a single document that fits within the window? Excellent. Maintaining coherent understanding across a months-long project with thousands of pages of accumulated context? Fundamentally limited by design.

The companies building these systems are racing to expand windows and improve retrieval mechanisms. But the core constraint, that attention is finite and expensive, is unlikely to disappear. It is baked into the mathematics of how these models relate each token to every other.

Our take

The context window is not a temporary limitation waiting to be engineered away; it is a defining characteristic of how current AI systems process information. Users who understand this will get better results and feel less frustrated. The chatbot that forgot your name is not being rude. It never knew your name in the first place — it only ever saw it written down, briefly, before the page was turned.