An interactive course on agent memory.
Twenty-eight pages. Six tracks. Every page has an interactive D3 demo you can play with. The same examples thread through every section, so you build one mental model — not a hundred fragments.
Foundations
What agent memory is, why it matters, and why most systems fail.
- 1.1 · 9 min · What is Agent Memory? Agent memory is structured, durable state that lets an LLM agent remember across turns, sessions, and tasks. A primer on episodic, semantic, and procedural memory for engineers.
- 1.2 · 18 min · Why Your Agent Forgets (and How to Fix It). Agents forget because most memory systems store everything indiscriminately, then drown the relevant signal. Here's the architecture of memory loss — and what to do about it.
- 1.3 · 19 min · The Cost of Junk Memories. Indiscriminate memory storage burns tokens, slows retrieval, and degrades answer quality. The math of why filtering before storage beats cleanup after.
- 1.4 · 9 min · Typed Memory: Beyond Flat Text. Flat text memories collapse facts, preferences, events, entities, and relations into one bucket — and lose temporal and relational signal. The case for typed schemas.
- 1.5 · 10 min · Memory vs RAG vs Long Context. Three approaches to giving LLMs more knowledge. They are not interchangeable — each wins on a different axis. A decision framework.
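As a rough sketch of what the typed schemas from 1.4 can look like in code — the type names and fields below are illustrative placeholders, not the course's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class MemoryType(Enum):
    FACT = "fact"              # stable world knowledge ("Alice works at Acme")
    PREFERENCE = "preference"  # user tastes; ages faster than facts
    EVENT = "event"            # time-bound occurrences; may expire
    ENTITY = "entity"          # people, places, systems
    RELATION = "relation"      # typed edges between entities

@dataclass
class Memory:
    text: str
    type: MemoryType
    entities: list[str] = field(default_factory=list)
    confidence: float = 0.5
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

m = Memory(text="Alice prefers dark mode",
           type=MemoryType.PREFERENCE,
           entities=["Alice"])
```

The point of the type field is that downstream stages — decay, retrieval filters, conflict checks — can branch on it, which a flat text blob makes impossible.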
Write Pipeline
How memories get filtered, extracted, classified, and stored.
- 2.1 · 22 min · The 7-Stage Write Pipeline. Pre-filter, extract, classify, resolve, dedupe, conflict-check, persist. The choreography that turns conversational turns into high-confidence memories.
- 2.2 · 18 min · Pre-Filter: Rejection Before Storage. Most conversational turns are junk. The cheapest filter — regex and length heuristics — rejects them before any LLM extraction runs.
- 2.3 · 18 min · LLM Extraction as Filtering, Not Recall. Extraction prompts that ask 'what is memorable here' beat prompts that ask 'extract every fact'. The framing flips the precision/recall trade-off.
- 2.4 · 21 min · Entity Resolution: From Pronouns to Identity. He, she, my boss, that thing we discussed — agents must turn references into stable identities. A four-stage cascade from grammar to LLM judge.
- 2.5 · 20 min · Three Tiers of Deduplication. Hash-equality, cosine-similarity, LLM-judge. Three thresholds that catch 99% of duplicates while paying LLM costs only when needed.
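The tiered dedup from 2.5 can be sketched as a short cascade. The thresholds (0.95 / 0.80) and the `llm_judge` callback are assumptions for illustration, not the course's tuned values:

```python
import hashlib
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def dedupe_decision(new_text, new_vec, old_text, old_vec,
                    llm_judge=None, hi=0.95, lo=0.80):
    # Tier 1: hash equality (free) catches verbatim repeats.
    if hashlib.sha256(new_text.encode()).digest() == \
       hashlib.sha256(old_text.encode()).digest():
        return "duplicate"
    # Tier 2: cosine similarity (cheap) resolves the clear cases.
    sim = _cosine(new_vec, old_vec)
    if sim >= hi:
        return "duplicate"
    if sim < lo:
        return "distinct"
    # Tier 3: LLM judge (expensive) runs only on the ambiguous middle band.
    if llm_judge is not None and llm_judge(new_text, old_text):
        return "duplicate"
    return "distinct"
```

The cost profile falls out of the ordering: most pairs exit at tier 1 or 2, so the LLM is invoked only for the narrow similarity band where embeddings can't decide.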
Read Pipeline
How memories get retrieved, fused, ranked, and assembled into context.
- 3.1 · 11 min · Five Retrievers Are Better Than One. Semantic, lexical, entity-graph, temporal, type-filter. No single retriever wins all queries — fusion does.
- 3.2 · 18 min · Reciprocal Rank Fusion, Explained. RRF combines ranked lists without needing to normalize scores. The formula, the intuition, and why it beats weighted score sums.
- 3.3 · 10 min · The Query Optimizer for Memory. Not every query needs every retriever. Routing by entity density, temporal precision, and lexical rarity halves p99 latency.
- 3.4 · 10 min · Entity Graphs for Multi-Hop Reasoning. Vectors find similar memories. Graphs find connected ones. Why agent memory needs both — and how to score multi-hop paths.
- 3.5 · 10 min · Context Aggregation: Token Budgeting. Six categories, one budget. Allocating tokens between facts, preferences, events, entities, summaries, and recent turns — and avoiding lost-in-the-middle.
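The RRF formula from 3.2 fits in a few lines: each document scores the sum of 1/(k + rank) across all the lists it appears in. A minimal sketch, using the conventional k = 60:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse([
    ["m3", "m1", "m7"],   # e.g. a semantic retriever's ranking
    ["m1", "m9", "m3"],   # e.g. a lexical (BM25) ranking
    ["m1", "m3", "m2"],   # e.g. an entity-graph ranking
])
# "m1" wins: it places in the top two of every list.
```

Because RRF consumes only ranks, never raw scores, heterogeneous retrievers fuse cleanly with no per-retriever score normalization — the property the lesson contrasts against weighted score sums.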
Math Foundations
Confidence, decay, fusion — the formulas behind quality memory.
- 4.1 · 14 min · The Confidence Formula. Source strength, repetition, extractor quality, type prior. A weighted blend that turns 'how much do we trust this memory' into a single number.
- 4.2 · 14 min · Why Logarithmic Repetition Boost. The 100th observation should not outweigh the 10th. Why log scaling beats linear for repetition signals — with the curve.
- 4.3 · 14 min · Freshness Decay Curves. Facts age slowly. Preferences age faster. Events expire. Type-specific exponential decay with retrieval-driven half-life resets.
- 4.4 · 14 min · BM25 for Memory, in Plain English. Term frequency saturation, inverse document frequency, length normalization. Why BM25 still belongs in your retrieval mix in 2026.
- 4.5 · 10 min · HNSW Tuning: m, ef, and Memory Cost. Three parameters control the speed/recall/memory triangle for vector indexes. The math you need to size an HNSW index correctly.
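The log-repetition boost (4.2) and type-specific exponential decay (4.3) compose naturally into a confidence score (4.1). A sketch under made-up constants — the half-lives, coefficient, and cap below are illustrative, not the course's tuned formula:

```python
import math

# Illustrative half-lives in days: facts age slowly, preferences faster,
# events fastest (these numbers are placeholders).
HALF_LIFE_DAYS = {"fact": 365.0, "preference": 90.0, "event": 14.0}

def repetition_boost(n_observations, coeff=0.05, cap=0.3):
    # Logarithmic: marginal gain shrinks, so the 100th observation
    # adds far less than the 10th.
    return min(cap, coeff * math.log1p(n_observations))

def freshness(age_days, mem_type):
    # Exponential decay with a type-specific half-life:
    # value halves every HALF_LIFE_DAYS[mem_type] days.
    return 0.5 ** (age_days / HALF_LIFE_DAYS[mem_type])

def confidence(base, n_observations, age_days, mem_type):
    return min(1.0, (base + repetition_boost(n_observations))
                    * freshness(age_days, mem_type))
```

The shape is what matters: repetition raises confidence sublinearly, decay lowers it multiplicatively, and the half-life constant is the single knob that distinguishes a fact from an event.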
Production
Hallucination defense, drift detection, scale, maintenance.
- 5.1 · 10 min · Three Layers of Hallucination Defense. Write-time grounding, store-time consistency, read-time faithfulness. Defense in depth for memories that must not lie.
- 5.2 · 11 min · Detecting Memory Drift. Concept drift, data drift, schema drift, vocabulary shift. The four kinds of drift in long-running memory — and how to detect them.
- 5.3 · 9 min · Concept Drift: Dual-Signal Detection. Centroid distance alone is noisy. Relation overlap alone is sparse. Together, they reliably separate entity evolution from semantic shift.
- 5.4 · 10 min · Scaling to 1B Memories: Index Tiers. SQLite-vec at 100K. pgvector HNSW at 100M. Sharded indexes at 1B+. The four-tier ladder and where each rung breaks.
- 5.5 · 8 min · The Background Worker: 7 Maintenance Jobs. Decay, consolidation, drift scan, consistency check, embedding refresh, garbage collect, snapshot. The maintenance loop that keeps memory healthy.
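The dual-signal rule from 5.3 can be sketched as a small decision function: centroid distance flags that something moved, and relation overlap decides whether it was the entity evolving or its meaning shifting. The thresholds here are invented placeholders, not the course's tuned values:

```python
def drift_signal(centroid_dist, relation_overlap,
                 dist_thresh=0.35, overlap_thresh=0.5):
    """Classify drift for one entity.

    centroid_dist:    distance between old and new embedding centroids.
    relation_overlap: fraction of the entity's graph relations preserved.
    """
    if centroid_dist <= dist_thresh:
        return "stable"            # embeddings barely moved
    if relation_overlap >= overlap_thresh:
        return "entity-evolution"  # moved, but its relationships survived
    return "semantic-drift"        # moved AND its relationships dissolved
```

This mirrors the blurb's claim: either signal alone misfires (centroid distance is noisy, relation overlap is sparse), but requiring both to agree before declaring semantic drift filters most false positives.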
Comparisons
How Recall compares with alternative memory and retrieval systems.
- 6.1 · 14 min · Mem0 vs Letta vs Zep vs Recall. An honest comparison of the four leading agent memory systems — what each is good at, where each falls short, and which one fits which use case.
- 6.2 · 10 min · Vector DB ≠ Memory: Why Pinecone Isn't Enough. A vector database is a substrate. Agent memory is a system on top of one. Why you need types, supersession, and temporal reasoning — not just embeddings.
- 6.3 · 12 min · LangChain Memory vs LangGraph State vs Recall. Conversational buffers, graph state, durable typed memory. Three different abstractions for three different jobs.