Drip · Agents & RAG · 10 min read

Agent Long-Term Memory

Your agent doesn’t forget because its context is full. It forgets because nobody wrote anything down. The gap between a chatbot and a colleague is a memory system — and in 2026 that system is finally its own design problem, separate from retrieval and separate from the prompt.

Brain Drip Editors · Context vs. persistence · July 2026

Long-term memory. Information an agent retains and reuses across turns, sessions, and days, as opposed to the short-lived context window, which is cleared every conversation.

The bottom line. There are three places to put agent memory: in the context window (bounded, cleared every session), in a retrieval store (durable, but similarity search has no sense of time), and in an explicit memory module (structured facts you can update and invalidate). The first fails on the second session; the second fails when a fact changes; the third — increasingly a small temporal knowledge graph that agents learn to write to — is where 2026 research and production have converged. The lab below runs the honest test: tell the agent six things, change one of them, and see which strategy still answers correctly fifteen sessions later.

§ 00 · THE GOLDFISH PROBLEM
A full context window is not a memory

Ask a demo agent to “remember that I’m allergic to peanuts,” and it will — for exactly as long as that sentence stays inside the context window. Close the tab, start a new session, and the agent is a stranger again. This is the single most common gap between something that demos well and something people actually rely on.

The confusion is understandable: a 400,000-token context window looks like a lot of memory. But a context window is working memory, not long-term memory — it is scoped to a single conversation and cleared when that conversation ends. Even within one long session it degrades: models use the middle of a long context far worse than the ends, so “just put everything in the prompt” quietly loses information long before it runs out of tokens.

The fix is to treat memory as its own subsystem: something that decides what to write down, where to keep it, and what to pull back when it’s relevant. That subsystem is what turns a stateless tool into something that accumulates a working relationship with you.

§ 01 · THREE PLACES TO PUT MEMORY
Context, retrieval, and an explicit store

Every memory design is some mix of three substrates. They are not rivals so much as layers — but each has a distinct failure mode, and knowing which one you’re leaning on tells you which failure to expect.

Fig 1 · The three substrates. The context window is bounded and per-session; a retrieval store is durable but time-blind; an explicit memory module keeps structured, updatable facts.

The context window. Keep the conversation in the prompt. Zero infrastructure, perfect fidelity — until the window fills and the oldest turns fall off the front. Bounded and per-session by definition.
A retrieval store. Embed past turns and facts into a vector index; fetch the most similar ones each turn. Durable and cheap, and the natural extension of RAG to conversation history. Its blind spot is time: nearest-neighbour search will happily return a fact that used to be true.
An explicit memory module. Store facts as structured records, increasingly a small temporal knowledge graph, that the agent can query, update, and invalidate. More machinery, but the only substrate that handles a fact changing. (Temporal knowledge graph: a knowledge graph whose edges carry validity intervals, so a new fact ('moved to Berlin') can invalidate an old one ('lives in Toronto') rather than sitting alongside it.)
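The retrieval store's blind spot can be reproduced in a few lines. This is a minimal sketch, assuming a toy bag-of-words cosine similarity as a stand-in for a real embedding model; the stored sentences and the `embed`, `cosine`, and `retrieve` helpers are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# (session, text) pairs appended to the index; nothing is ever invalidated.
index = [
    (1, "I live in Toronto"),
    (1, "I work as a backend engineer"),
    (8, "I live in Berlin"),
]

def retrieve(query: str, k: int = 2):
    """Return the k most similar stored turns as (score, session, text)."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), session, text) for session, text in index]
    return sorted(scored, key=lambda r: r[0], reverse=True)[:k]

hits = retrieve("where do I live")
for score, session, text in hits:
    print(f"{score:.2f}  session {session}: {text}")
```

Both city sentences score identically against the query, so the stale fact comes back alongside the current one. The session number is stored, but the similarity score never looks at it; picking the right answer needs an explicit notion of validity.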

§ 02 · THE SESSION-15 TEST
Tell it six things. Change one. Wait.

Here is a test that separates the strategies cleanly. In session 1, a user tells the agent six things about herself. In session 8, one of them changes — she moves from Toronto to Berlin. In session 15, you ask two questions: what do you remember about me? and where do I live? The first measures raw recall; the second measures whether the memory can handle an update, which is where naive designs fall over.

Lab · the session-15 test. Six facts, told once in session 1; one of them changes in session 8. Move the sessions forward and switch strategy.

Snapshot: strategy “No memory”, sessions elapsed 15 (the move to Berlin happened in session 8).

Name, Role, City, Project, Preference, Constraint: all forgotten
Facts recalled: 0 / 6 of what she told you
Context tokens / turn: 0.4K (fixed cost)
“Where do I live?” (the update test): forgotten

No memory: the agent is a goldfish. Everything from session 1 is gone by session 2: cheap, and useless for anything longitudinal. Figures are illustrative of the failure modes, not a benchmark.

Three things to feel in the lab. “No memory” fails immediately: it’s the demo default and it’s a goldfish. “Full transcript” fails slowly and expensively: perfect until it truncates, and the bill grows every session. “Retrieval” fails on the update: it keeps cost flat but serves the stale “Toronto” because similarity has no notion of which fact is current. Only the strategies that write structured facts and invalidate old ones pass both questions at a fixed cost.
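The “fails slowly and expensively” behaviour of the full-transcript strategy is simple arithmetic. The sketch below is illustrative only: the 400,000-token window is the figure used earlier in this article and the 0.4K base prompt comes from the lab, but `TOKENS_PER_SESSION` is an assumed average, and `full_transcript` is a hypothetical helper, not a measurement of any real system.

```python
WINDOW = 400_000             # context window size used as the example in this article
BASE_PROMPT = 400            # fixed prompt cost (the lab's 0.4K)
TOKENS_PER_SESSION = 30_000  # ASSUMPTION: average transcript size per session

def full_transcript(session: int) -> tuple[int, bool]:
    """Return (prompt tokens per turn, whether all of session 1 still fits).

    The transcript of every session so far is kept in the prompt; once the
    total exceeds the window, the oldest turns fall off the front.
    """
    wanted = BASE_PROMPT + session * TOKENS_PER_SESSION
    return min(wanted, WINDOW), wanted <= WINDOW

for s in (1, 8, 13, 14, 15):
    tokens, remembers_s1 = full_transcript(s)
    print(f"session {s:>2}: {tokens:>7,} tokens/turn, session 1 in window: {remembers_s1}")
```

Under these assumed numbers the per-turn cost climbs every session until it saturates at the window size, and session 1 starts falling out of the window at session 14, just before the session-15 questions are asked. Different transcript sizes move the cliff but not the shape.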

§ 03 · WRITING IT DOWN
Summaries, facts, and temporal graphs

The interesting engineering is on the write side — deciding what is worth keeping and in what shape. Production systems in 2026 tend to run three write paths at once, from cheapest to richest:

Rolling summaries. Periodically compress the conversation into a running summary that rides along in the prompt. Cheap and durable, but lossy: re-summarizing erodes the least-salient details first, as the lab’s “rolling summary” mode shows.
Extracted facts. Run a small model over each turn to pull out durable statements (“user is a backend engineer”) and store them as records. Retrieval over facts beats retrieval over raw turns because the unit is already distilled.
Temporal edges. Store facts as graph edges with validity intervals. When a new fact contradicts an old one, you invalidate rather than append: the move to Berlin closes the Toronto edge. This is the piece that passes the update test, and it is the direction that recent work, such as RaMem’s contextual reinstatement and temporal-graph memory frameworks, has pushed in.

The read side then becomes a small retrieval problem over a much cleaner corpus: instead of searching thousands of raw turns, you query a few hundred structured facts, filtered to the ones currently valid. Smaller, cleaner, and time-aware — which is exactly what the raw-transcript and naive-retrieval strategies lack.
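The temporal-edge write path and the validity-filtered read can be sketched together. This is a minimal in-memory illustration, not the design of any framework named here: the `Edge` and `TemporalMemory` names are hypothetical, sessions stand in for timestamps, and every relation is assumed to be single-valued (one city, one role).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    subject: str
    relation: str
    value: str
    valid_from: int                 # session in which the fact was stated
    valid_to: Optional[int] = None  # None means the fact is still valid

class TemporalMemory:
    def __init__(self) -> None:
        self.edges: list[Edge] = []

    def write(self, subject: str, relation: str, value: str, session: int) -> None:
        """Close any open edge for (subject, relation), then add the new one."""
        for edge in self.edges:
            if (edge.subject, edge.relation) == (subject, relation) and edge.valid_to is None:
                if edge.value == value:
                    return  # already known and still valid; nothing to do
                edge.valid_to = session  # invalidate rather than delete
        self.edges.append(Edge(subject, relation, value, session))

    def current(self, subject: str) -> dict[str, str]:
        """Read side: only edges whose validity interval is still open."""
        return {e.relation: e.value for e in self.edges
                if e.subject == subject and e.valid_to is None}

    def history(self, subject: str, relation: str) -> list[Edge]:
        """Every edge ever written for (subject, relation), oldest first."""
        return [e for e in self.edges if (e.subject, e.relation) == (subject, relation)]

memory = TemporalMemory()
memory.write("user", "lives_in", "Toronto", session=1)
memory.write("user", "role", "backend engineer", session=1)
memory.write("user", "lives_in", "Berlin", session=8)

print(memory.current("user"))
print(memory.history("user", "lives_in"))
```

The Toronto edge is kept with its interval closed at session 8, so the store can still answer “where did she live before?” while `current` returns only Berlin. A real system would also need multi-valued relations and a way to detect that two differently worded statements contradict each other, which this sketch sidesteps by using exact keys.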

§ 04 · LEARNING WHAT TO REMEMBER
Memory as a policy, not a dump

Writing everything down is its own failure: a memory store that keeps every utterance is just a slower, more expensive transcript. The 2026 shift is to treat what-to-remember as a decision the agent learns, not a fixed rule. Frameworks like MemAgent train the memory operations themselves — when to write, what to keep, what to overwrite — with reinforcement learning against long-horizon tasks, rather than hand-coding a heuristic.

You don’t need RL to benefit from the framing. Even a hand-written policy improves sharply once you make the four operations explicit: write (is this durable enough to keep?), select (which stored facts are relevant now?), compress (can these three facts become one?), and invalidate (does this new fact retire an old one?). Those are the same operations agentic context engineering applies within a single loop — long-term memory is the same discipline, extended across sessions and grounded in a durable store.
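One way to make the four operations explicit is to give each its own method, even when the rules behind them are crude. The sketch below is a hand-written stand-in, not a learned policy and not MemAgent's method: the `Fact` and `HandWrittenPolicy` names, the `EPHEMERAL_KEYS` rule, and the keyword-match selection are all assumptions chosen for brevity, and keys are treated as single-valued.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    key: str            # e.g. "city", "role"
    value: str
    active: bool = True

class HandWrittenPolicy:
    """Rule-based stand-ins for write, select, compress, and invalidate."""

    EPHEMERAL_KEYS = {"mood", "weather"}  # ASSUMPTION: keys judged not durable

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def write(self, key: str, value: str) -> bool:
        """Write: keep the fact only if it looks durable; report whether it was kept."""
        if key in self.EPHEMERAL_KEYS:
            return False
        self.invalidate(key)
        self.facts.append(Fact(key, value))
        return True

    def invalidate(self, key: str) -> None:
        """Invalidate: retire active facts that a new value for `key` supersedes."""
        for fact in self.facts:
            if fact.key == key and fact.active:
                fact.active = False

    def select(self, query: str, k: int = 3) -> list[Fact]:
        """Select: active facts whose key appears in the query, newest first."""
        words = set(query.lower().split())
        hits = [f for f in reversed(self.facts) if f.active and f.key in words]
        return hits[:k]

    def compress(self) -> str:
        """Compress: fold all active facts into one line for the prompt."""
        return "; ".join(f"{f.key}={f.value}" for f in self.facts if f.active)

policy = HandWrittenPolicy()
policy.write("city", "Toronto")
kept_mood = policy.write("mood", "tired")   # rejected as ephemeral
policy.write("role", "backend engineer")
policy.write("city", "Berlin")              # retires the Toronto fact

print(kept_mood)
print([f.value for f in policy.select("which city do I live in?")])
print(policy.compress())
```

Each method is a seam where a better decision can be swapped in later: a classifier for `write`, embedding search for `select`, a summarizing model for `compress`. The calling code does not need to change when that happens, which is the practical payoff of naming the operations.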

CHECK · A support agent remembers a customer's plan tier via nearest-neighbour retrieval over past chat turns. The customer upgraded from Free to Pro last week. This week they ask 'what plan am I on?' and the agent sometimes says Free. What's the root cause?

§ 05 · A REFERENCE MEMORY STACK
How the layers fit together


A production memory stack: a memory manager sits between the agent and three stores, running write on the way in and select on the way out.

The shape that keeps recurring: a thin memory manager between the agent and the stores. On the way in, it decides what to write and where. On the way out, it selects the handful of currently-valid facts worth spending context on. The agent itself stays simple; the intelligence is in the manager’s four operations and the temporal store behind them. Build that layer once and every session after the first starts warm.

§ · FURTHER READING
References & deeper sources

Shichun Liu et al. (2026). Memory in the Age of AI Agents: A Survey (paper list) · GitHub · Agent-Memory-Paper-List
RaMem authors (2026). RaMem: Contextual Reinstatement for Long-term Agentic Memory · arXiv:2606.22844
MemAgent authors (2025). MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent · OpenReview
Look Back to Reason Forward authors (2025). Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents · arXiv:2509.23040
Nelson F. Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts · arXiv:2307.03172 (TACL 2024)
Brain Drip Editors (2026). Blueprint: Give Your Agent Long-Term Memory · Brain Drip Blueprints

Original figures live in the linked sources — open the papers for the canonical visuals in their full context.