Two memory layers, not one
The fact store answers precise questions ("what's the user's city?"). But some things are fuzzy: the shape of a long conversation, an ongoing decision, the tone the user prefers in practice. That's what a rolling summary is for. Among the memory layers in the drip's lab, it's the cheapest: durable and small, but lossy, because every re-summarization drops some detail.
We run both. Facts for the sharp, updatable stuff; a summary for the ambient context.
Summarize a session
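```python
import sqlite3

import ollama

# CHAT_MODEL and the summaries(session, text) table are defined earlier in memory.py.
SUMMARIZE_PROMPT = """Summarize this session in 2-3 sentences.
Preserve any durable facts, decisions, and the user's stated preferences.
Be concise; drop small talk.
Session transcript:
{turns}"""


def summarize_session(con: sqlite3.Connection, session: int, turns: list[str]) -> str:
    """Summarize one session's turns with the chat model and store the result."""
    transcript = "\n".join(turns)
    r = ollama.chat(
        model=CHAT_MODEL,
        messages=[{"role": "user",
                   "content": SUMMARIZE_PROMPT.format(turns=transcript)}],
    )
    summary = r["message"]["content"].strip()
    # With session as the table's key, this overwrites instead of duplicating.
    con.execute(
        "INSERT OR REPLACE INTO summaries(session, text) VALUES(?, ?)",
        (session, summary),
    )
    con.commit()
    return summary


def recent_summaries(con: sqlite3.Connection, limit: int = 3) -> str:
    """Return the newest `limit` summaries, joined oldest-first."""
    rows = con.execute(
        "SELECT text FROM summaries ORDER BY session DESC LIMIT ?", (limit,)
    ).fetchall()
    # The query fetches newest-first; reverse so the context reads chronologically.
    return "\n".join(t for (t,) in reversed(rows))
```

INSERT OR REPLACE keys on session, so re-summarizing a session overwrites its summary rather than duplicating it. That relies on session being the summaries table's primary key (or carrying a UNIQUE constraint); without one, SQLite has no conflict to resolve and simply adds a second row.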
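The section doesn't show the summaries DDL, so this sketch assumes session INTEGER PRIMARY KEY and runs the same two queries against an in-memory SQLite database. Hand-written strings stand in for model output, so no Ollama server is needed; make_db and save_summary are hypothetical helpers, not part of memory.py.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory DB with an ASSUMED summaries schema (session is the key)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE summaries(session INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return con


def save_summary(con: sqlite3.Connection, session: int, text: str) -> None:
    """Hypothetical stand-in for summarize_session's storage step (no LLM call)."""
    con.execute(
        "INSERT OR REPLACE INTO summaries(session, text) VALUES(?, ?)",
        (session, text),
    )
    con.commit()


def recent_summaries(con: sqlite3.Connection, limit: int = 3) -> str:
    """Same query as the section: newest `limit` sessions, returned oldest-first."""
    rows = con.execute(
        "SELECT text FROM summaries ORDER BY session DESC LIMIT ?", (limit,)
    ).fetchall()
    return "\n".join(t for (t,) in reversed(rows))


if __name__ == "__main__":
    con = make_db()
    save_summary(con, 1, "s1 draft")
    save_summary(con, 1, "s1 final")  # same key: replaces the draft row
    for s in (2, 3, 4):
        save_summary(con, s, f"s{s}")
    print(con.execute("SELECT COUNT(*) FROM summaries").fetchone()[0])  # 4
    print(recent_summaries(con))  # s2, s3, s4, one per line
```

Note that ORDER BY session DESC treats the session number as a clock: it returns the latest sessions only if session ids increase over time.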
The erosion is real — and fine
Re-summarizing loses detail; that's the drip's "rolling summary decays" result. The mitigation isn't to fight it — it's to not rely on the summary for anything sharp. Anything that must stay exact (the city, a constraint) lives in the fact table, which doesn't erode. The summary only carries what's tolerable-if-fuzzy. Split the job and each layer plays to its strength.
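A toy simulation makes the split concrete. It is not the drip's experiment: lossy_summarize is a hypothetical stand-in for the model that keeps only the last few words under a fixed budget, one crude way to model a summary that is re-compressed every turn. The fact table is a plain dict here, also an assumption for illustration.

```python
def lossy_summarize(text: str, budget: int = 12) -> str:
    """Toy summarizer (hypothetical): keep only the last `budget` words."""
    return " ".join(text.split()[-budget:])


def roll(summary: str, turn: str, budget: int = 12) -> str:
    """Fold a new turn into the running summary, then re-compress it."""
    return lossy_summarize(f"{summary} {turn}".strip(), budget)


def simulate(turns: list[str]) -> tuple[str, dict[str, str], list[str]]:
    """Roll every turn into the summary; keep the city as an exact fact."""
    facts = {"city": "Lisbon"}  # stored once, never re-summarized
    summary, history = "", []
    for turn in turns:
        summary = roll(summary, turn)
        history.append(summary)
    return summary, facts, history


if __name__ == "__main__":
    turns = [
        "Maya lives in Lisbon.",
        "She leads the billing rewrite.",
        "She prefers terse answers.",
        "The rewrite ships in Q3.",
        "Tests run on every merge.",
    ]
    summary, facts, history = simulate(turns)
    for i, s in enumerate(history, 1):
        print(i, "Lisbon" in s, s)  # the city drops out of the summary at round 4
    print("fact table city:", facts["city"])  # still exact
```

By the fourth round the city has been squeezed out of the summary, while the fact table still returns it verbatim, which is why anything sharp belongs in the fact layer.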
Try it
```python
if __name__ == "__main__":
    con = connect()  # connection helper defined earlier in memory.py
    turns = [
        "User: I'm Maya, a backend engineer.",
        "Assistant: Nice to meet you, Maya.",
        "User: I'm leading the billing rewrite and I prefer terse answers.",
    ]
    print(summarize_session(con, session=1, turns=turns))
```

```
$ python memory.py
Maya is a backend engineer leading the billing rewrite. She prefers terse,
to-the-point answers.
```

Now we have both layers. Step 6 wires them around an actual agent turn.
Reference: Ollama chat API · Agent Long-Term Memory — the session-15 lab · Agentic Context Engineering (drip)