Arc Labs · Learn

An interactive course on agent memory.

Twenty-eight pages. Six tracks. Every page has an interactive D3 demo you can play with. The same examples thread through every section, so you build one mental model — not a hundred fragments.

Track 01

Foundations

What agent memory is, why it matters, and why most systems fail.

  1. 1.1 · What is Agent Memory? · 9 min
     Agent memory is structured, durable state that lets an LLM agent remember across turns, sessions, and tasks. A primer on episodic, semantic, and procedural memory for engineers.
  2. 1.2 · Why Your Agent Forgets (and How to Fix It) · 18 min
     Agents forget because most memory systems store everything indiscriminately, then drown the relevant signal. Here's the architecture of memory loss — and what to do about it.
  3. 1.3 · The Cost of Junk Memories · 19 min
     Indiscriminate memory storage burns tokens, slows retrieval, and degrades answer quality. The math of why filtering before storage beats cleanup after. A worked cost sketch follows this list.
  4. 1.4 · Typed Memory: Beyond Flat Text · 9 min
     Flat text memories collapse facts, preferences, events, entities, and relations into one bucket — and lose temporal and relational signal. The case for typed schemas. A record sketch follows this list.
  5. 1.5 · Memory vs RAG vs Long Context · 10 min
     Three approaches to giving LLMs more knowledge. They are not interchangeable — each wins on a different axis. A decision framework.
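
Page 1.3's argument is ultimately arithmetic, so here is a back-of-envelope version of it. Every number below (memory size, top-k, query volume, price) is an illustrative assumption, not a figure from the course; the point is that junk occupying retrieval slots is a tax paid on every single query.

```python
# Back-of-envelope cost of junk in the retrieval slot.
# All constants are illustrative assumptions, not course figures.

MEM_TOKENS = 60          # avg tokens per stored memory
K = 20                   # memories injected into each prompt
QUERIES_PER_DAY = 5_000
PRICE_PER_1K = 0.01      # $ per 1K prompt tokens (hypothetical)

def wasted_dollars_per_day(signal_ratio: float) -> float:
    """If only `signal_ratio` of stored memories are real signal and
    retrieval slots fill roughly in proportion, the remaining injected
    tokens are pure overhead, paid again on every query."""
    junk_tokens_per_query = K * MEM_TOKENS * (1 - signal_ratio)
    return junk_tokens_per_query * QUERIES_PER_DAY * PRICE_PER_1K / 1_000

print(f"30% signal: ${wasted_dollars_per_day(0.30):,.2f}/day")  # $42.00/day
print(f"90% signal: ${wasted_dollars_per_day(0.90):,.2f}/day")  # $6.00/day
```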
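
And to make page 1.4 concrete, a minimal sketch of a typed memory record. The field names and types are illustrative assumptions, not the course's actual schema; they show what flat text loses: the type tag, the resolved entities, and the temporal fields.

```python
# A minimal typed memory record. Field names are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class MemoryType(Enum):
    FACT = "fact"
    PREFERENCE = "preference"
    EVENT = "event"
    ENTITY = "entity"
    RELATION = "relation"

@dataclass
class Memory:
    type: MemoryType
    content: str                       # normalized statement, not raw chat text
    entities: list[str] = field(default_factory=list)  # resolved entity IDs (see 2.4)
    confidence: float = 0.5            # how this is computed is Track 04's subject
    observed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    valid_until: datetime | None = None  # events expire; facts usually don't
    supersedes: str | None = None        # ID of the memory this one replaces

m = Memory(MemoryType.PREFERENCE, "Prefers async standups over meetings",
           entities=["user:alice"], confidence=0.8)
```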
Track 02

Write Pipeline

How memories get filtered, extracted, classified, and stored.

  1. 2.1 · The 7-Stage Write Pipeline · 22 min
     Pre-filter, extract, classify, resolve, dedupe, conflict-check, persist. The choreography that turns conversational turns into high-confidence memories.
  2. 2.2 · Pre-Filter: Rejection Before Storage · 18 min
     Most conversational turns are junk. The cheapest filter — regex and length heuristics — rejects them before any LLM extraction runs. A minimal filter is sketched after this list.
  3. 2.3 · LLM Extraction as Filtering, Not Recall · 18 min
     Extraction prompts that ask 'what is memorable here' beat prompts that ask 'extract every fact'. The framing flips the precision/recall trade-off.
  4. 2.4 · Entity Resolution: From Pronouns to Identity · 21 min
     He, she, my boss, that thing we discussed — agents must turn references into stable identities. A four-stage cascade from grammar to LLM judge.
  5. 2.5 · Three Tiers of Deduplication · 20 min
     Hash-equality, cosine-similarity, LLM-judge. Three thresholds that catch 99% of duplicates while paying LLM costs only when needed. The cascade is sketched after this list.
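
A minimal pre-filter in the spirit of page 2.2: regex and length checks that reject obvious junk before any LLM call. The patterns and thresholds below are illustrative assumptions, not the course's actual heuristics.

```python
import re

# Turns that are almost never memorable: acknowledgements, greetings, fillers.
# Illustrative patterns only; a real filter would carry a much longer list.
JUNK = re.compile(
    r"^(ok(ay)?|thanks?( you| so much)?|got it|(that )?sounds good( to me)?|"
    r"yes|no|sure|hi|hello)[.! ]*$",
    re.IGNORECASE,
)

MIN_CHARS = 20    # too short to carry a fact
MAX_CHARS = 4000  # likely pasted logs or code; route elsewhere

def passes_prefilter(turn: str) -> bool:
    """Cheap rejection before any LLM extraction runs."""
    text = turn.strip()
    if not (MIN_CHARS <= len(text) <= MAX_CHARS):
        return False
    if JUNK.match(text):
        return False
    return True

assert not passes_prefilter("ok")                       # length check
assert not passes_prefilter("That sounds good to me!")  # pattern check
assert passes_prefilter("My manager is Priya and we ship the migration on June 3.")
```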
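
And a sketch of page 2.5's cascade. The two thresholds and the caller-supplied helpers `embed` and `llm_same_memory` are assumptions; the shape is the point: the free check runs first, the expensive one only for the ambiguous band in between.

```python
import hashlib

COSINE_DUP = 0.95       # above: duplicate, no LLM needed (illustrative)
COSINE_DISTINCT = 0.80  # below: distinct, no LLM needed (illustrative)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def is_duplicate(new: str, existing: str, embed, llm_same_memory) -> bool:
    """`existing` would come from a nearest-neighbor candidate lookup;
    `embed` maps text to a vector, `llm_same_memory` asks a model whether
    two statements record the same memory."""
    norm = lambda s: " ".join(s.lower().split())
    # Tier 1: hash equality on normalized text. Free; catches verbatim repeats.
    if (hashlib.sha256(norm(new).encode()).digest()
            == hashlib.sha256(norm(existing).encode()).digest()):
        return True
    # Tier 2: embedding cosine. Cheap; catches paraphrases.
    sim = cosine(embed(new), embed(existing))
    if sim >= COSINE_DUP:
        return True
    if sim < COSINE_DISTINCT:
        return False
    # Tier 3: LLM judge. Expensive; only the ambiguous band pays for it.
    return llm_same_memory(new, existing)
```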
Track 03

Read Pipeline

How memories get retrieved, fused, ranked, and assembled into context.

  1. 3.1 · Five Retrievers Are Better Than One · 11 min
     Semantic, lexical, entity-graph, temporal, type-filter. No single retriever wins all queries — fusion does.
  2. 3.2 · Reciprocal Rank Fusion, Explained · 18 min
     RRF combines ranked lists without needing to normalize scores. The formula, the intuition, and why it beats weighted score sums. A working implementation follows this list.
  3. 3.3 · The Query Optimizer for Memory · 10 min
     Not every query needs every retriever. Routing by entity density, temporal precision, and lexical rarity halves p99 latency.
  4. 3.4 · Entity Graphs for Multi-Hop Reasoning · 10 min
     Vectors find similar memories. Graphs find connected ones. Why agent memory needs both — and how to score multi-hop paths.
  5. 3.5 · Context Aggregation: Token Budgeting · 10 min
     Six categories, one budget. Allocating tokens between facts, preferences, events, entities, summaries, and recent turns — and avoiding lost-in-the-middle. A budgeting sketch follows this list.
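
Page 3.2's formula fits in a dozen lines. Each retriever contributes 1/(k + rank) per document and the contributions sum; k = 60 is the constant from the original RRF paper (Cormack et al., 2009). Because only ranks enter the formula, the retrievers' raw scores never need normalizing.

```python
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of memory IDs via Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

semantic = ["m7", "m2", "m9"]
lexical  = ["m2", "m5", "m7"]
print(rrf([semantic, lexical]))
# m2 and m7 win: each ranked well in both lists, which is exactly
# the agreement signal RRF rewards.
```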
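
And a sketch of page 3.5's assembly step. The six category weights are illustrative assumptions; the mechanism is a per-category token cap filled best-first, with lost-in-the-middle handled afterward by reordering.

```python
# Category weights are illustrative assumptions, not the course's values.
WEIGHTS = {
    "facts": 0.25, "preferences": 0.15, "events": 0.15,
    "entities": 0.10, "summaries": 0.15, "recent_turns": 0.20,
}

def assemble(candidates: dict[str, list[tuple[str, int]]], budget: int) -> list[str]:
    """candidates: category -> [(text, token_count)], sorted best-first.
    Greedy fill per category, capped at that category's share of the budget."""
    out: list[str] = []
    for cat, frac in WEIGHTS.items():
        cap, used = int(budget * frac), 0
        for text, ntok in candidates.get(cat, []):
            if used + ntok > cap:
                break
            out.append(text)
            used += ntok
    # Mitigate lost-in-the-middle: reorder so the strongest items sit at
    # the start and end of the prompt, where models attend most.
    return out
```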
Track 04

Math Foundations

Confidence, decay, fusion — the formulas behind quality memory.

  1. 4.1 · The Confidence Formula · 14 min
     Source strength, repetition, extractor quality, type prior. A weighted blend that turns 'how much do we trust this memory' into a single number. A sketch of the blend, including 4.2's log repetition term, follows this list.
  2. 4.2 · Why Logarithmic Repetition Boost · 14 min
     The 100th observation should not outweigh the 10th. Why log scaling beats linear for repetition signals — with the curve.
  3. 4.3 · Freshness Decay Curves · 14 min
     Facts age slowly. Preferences age faster. Events expire. Type-specific exponential decay with retrieval-driven half-life resets. A decay sketch follows this list.
  4. 4.4 · BM25 for Memory, in Plain English · 14 min
     Term frequency saturation, inverse document frequency, length normalization. Why BM25 still belongs in your retrieval mix in 2026. A plain-Python scorer follows this list.
  5. 4.5 · HNSW Tuning: m, ef, and Memory Cost · 10 min
     Three parameters control the speed/recall/memory triangle for vector indexes. The math you need to size an HNSW index correctly. The sizing arithmetic is sketched after this list.
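
A sketch of the kind of blend pages 4.1 and 4.2 describe. The weights, input ranges, and the saturation point of the repetition term are illustrative assumptions; the log term is what keeps the 100th observation from outweighing the 10th.

```python
import math

def confidence(source_strength: float,    # 0..1, e.g. user-stated > inferred
               extractor_quality: float,  # 0..1, per extraction model/prompt
               type_prior: float,         # 0..1, base rate for this memory type
               observations: int) -> float:
    # Log repetition (page 4.2): growth flattens as observations pile up.
    # Normalizing by log1p(100) saturates the term near 1.0; illustrative cap.
    repetition = min(1.0, math.log1p(observations) / math.log1p(100))
    w_src, w_ext, w_prior, w_rep = 0.35, 0.25, 0.15, 0.25  # illustrative, sum to 1
    return (w_src * source_strength + w_ext * extractor_quality
            + w_prior * type_prior + w_rep * repetition)

print(f"{confidence(0.9, 0.8, 0.6, 3):.3f}")    # 0.680: few observations
print(f"{confidence(0.9, 0.8, 0.6, 100):.3f}")  # 0.855: repetition saturated
```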
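
Page 4.3's curve in code. The per-type half-lives are illustrative assumptions; the mechanism is plain exponential decay, with a retrieval resetting the clock so memories that keep getting used stay fresh.

```python
import math
from datetime import datetime, timedelta, timezone

# Half-lives are illustrative: facts age slowly, events expire fast.
HALF_LIFE_DAYS = {"fact": 365.0, "preference": 90.0, "event": 14.0}

def freshness(mem_type: str, last_reinforced: datetime,
              now: datetime | None = None) -> float:
    """exp(-ln2 * age / half_life): 1.0 when fresh, 0.5 after one half-life."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_reinforced).total_seconds() / 86400
    return math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS[mem_type])

# A retrieval "reinforces" the memory: set last_reinforced = now.
t0 = datetime.now(timezone.utc) - timedelta(days=90)
print(f"{freshness('preference', t0):.2f}")  # ~0.50: one half-life gone
print(f"{freshness('fact', t0):.2f}")        # ~0.84: facts barely aged
```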
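
Page 4.4's scorer really is plain. Here it is with the common defaults k1 = 1.2 and b = 0.75, using the IDF variant that stays positive even for very common terms; a reference sketch, not a tuned production scorer.

```python
import math

def bm25(query: list[str], doc: list[str], docs: list[list[str]],
         k1: float = 1.2, b: float = 0.75) -> float:
    """Score one tokenized doc against a query; `docs` is the full corpus,
    needed for document frequencies and average length."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))  # rarer term, higher weight
        tf = doc.count(term)
        # k1 saturates term frequency; b normalizes against document length.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["red", "wine"], ["red", "red", "wine", "pairing", "guide"], ["beer"]]
print(f"{bm25(['red'], docs[1], docs):.3f}")  # ~0.519
```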
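
And the sizing arithmetic behind page 4.5. A common back-of-envelope estimate charges 4 bytes per float32 dimension plus roughly 8 bytes per layer-0 graph link (2·M links at 4 bytes each, where M is the graph degree parameter, the lowercase m of the page title). Real indexes add upper-layer links and metadata on top, so treat the result as a floor.

```python
def hnsw_bytes(n_vectors: int, dim: int, M: int) -> int:
    """Rough floor on resident memory for an HNSW index of float32 vectors.
    Approximation only: ignores upper-layer links and per-vector metadata."""
    per_vector = dim * 4 + M * 2 * 4   # vector data + layer-0 neighbor list
    return n_vectors * per_vector

gib = hnsw_bytes(100_000_000, 1024, M=16) / 2**30
print(f"100M x 1024-d, M=16: ~{gib:.0f} GiB")  # ~393 GiB
```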
Track 05

Production

Hallucination defense, drift detection, scale, maintenance.

  1. 5.1 · Three Layers of Hallucination Defense · 10 min
     Write-time grounding, store-time consistency, read-time faithfulness. Defense in depth for memories that must not lie.
  2. 5.2 · Detecting Memory Drift · 11 min
     Concept drift, data drift, schema drift, vocabulary shift. The four kinds of drift in long-running memory — and how to detect them.
  3. 5.3 · Concept Drift: Dual-Signal Detection · 9 min
     Centroid distance alone is noisy. Relation overlap alone is sparse. Together, they reliably separate entity evolution from semantic shift. A dual-signal sketch follows this list.
  4. 5.4 · Scaling to 1B Memories: Index Tiers · 10 min
     SQLite-vec at 100K. pgvector HNSW at 100M. Sharded indexes at 1B+. The four-tier ladder and where each rung breaks.
  5. 5.5 · The Background Worker: 7 Maintenance Jobs · 8 min
     Decay, consolidation, drift scan, consistency check, embedding refresh, garbage collect, snapshot. The maintenance loop that keeps memory healthy.
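
A sketch of page 5.3's dual-signal test. The thresholds, and the representation of an entity's relations as a set of identifiers, are illustrative assumptions; the design point is that neither signal is trusted alone.

```python
def centroid(vecs: list[list[float]]) -> list[float]:
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def semantic_shift(old_vecs, new_vecs, old_rels: set[str], new_rels: set[str],
                   dist_thresh: float = 0.25, overlap_thresh: float = 0.4) -> bool:
    """Centroid distance alone is noisy; relation overlap alone is sparse.
    Flag drift only when the centroid has moved AND the relations turned over;
    one signal without the other reads as normal entity evolution."""
    dist = 1.0 - cosine(centroid(old_vecs), centroid(new_vecs))
    overlap = jaccard(old_rels, new_rels)
    return dist > dist_thresh and overlap < overlap_thresh
```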
Track 06

Comparisons

Recall vs alternative memory and retrieval systems.

  1. 6.1 · Mem0 vs Letta vs Zep vs Recall · 14 min
     An honest comparison of the four leading agent memory systems — what each is good at, where each falls short, and which one fits which use case.
  2. 6.2 · Vector DB ≠ Memory: Why Pinecone Isn't Enough · 10 min
     A vector database is a substrate. Agent memory is a system on top of one. Why you need types, supersession, and temporal reasoning — not just embeddings.
  3. 6.3 · LangChain Memory vs LangGraph State vs Recall · 12 min
     Conversational buffers, graph state, durable typed memory. Three different abstractions for three different jobs.
