Agent Memory Glossary
51 terms used in agent memory engineering. Concepts, algorithms, comparisons, and memory types — the language of building agents that remember.
Core concepts
Foundational definitions
- Agent memory (What is agent memory? →)
- Durable, structured state — facts, preferences, events, entities, and relations — that an LLM agent persists outside its context window and retrieves on demand. The substrate that makes an agent feel continuous across sessions.
- Episodic memory
- Memory of specific timestamped events — 'user joined the platform team on 2026-05-09'. Append-only, decays fastest, supports temporal-anchored queries.
- Semantic memory
- Memory of stable predicates — 'user works at Volkswagen', 'Volkswagen is an automotive company'. Changes rarely, retrieved often, decays slowly.
- Procedural memory
- Memory of how to do things — recurring patterns, code-style preferences, workflow conventions. Distinct from facts (what is) and events (what happened).
- Fact (memory type)
- A stable predicate stored as a memory: subject + relation + object. 'User works at Volkswagen.' Type prior in confidence formula: 0.7. Default decay half-life: 180 days.
- Preference (memory type)
- A mutable user choice stored as a memory: 'user prefers dark mode.' Supersedes when contradicted. Default decay half-life: 90 days. Repetition-boost-friendly.
- Event (memory type)
- A timestamped occurrence stored as a memory: 'user joined platform team on 2026-05-09.' Append-only, never supersedes. Default decay half-life: 30 days.
- Entity (memory type)
- A stable identity referenced across memories — a person, organization, place, project. Sticky; once resolved, lasts. Default decay half-life: 365 days.
- Relation (memory type)
- A typed edge between entities: 'user reports-to Sarah'. Carries confidence and a temporal window. Foundation for multi-hop reasoning.
- Memory lifecycle (Lifecycle states →)
- Memories transition through four states: active → superseded → expired → forgotten. Each transition is rule-driven and audit-logged.
- Cold-start problem
- For a new user the memory store is empty, so retrieval returns nothing. Mitigations: cohort defaults, eager-extract from onboarding, treat absence as a first-class result.
Write pipeline
Filtering, extraction, classification
- Write pipeline (7-stage write pipeline →)
- The sequence of stages a candidate memory passes through before persistence. Cheap-to-expensive: pre-filter → extract → classify → resolve → dedupe → conflict-check → persist.
- Pre-filter (Pre-filter explained →)
- First stage of the write pipeline. Pattern + length rules drop greetings, acknowledgements, meta-talk, and code-only blocks. Free; rejects 60–70% of incoming turns.
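A minimal sketch of such a pre-filter, assuming a hypothetical pattern set and length floor (real deployments tune both per product):

```python
import re

# Assumed example rules; the actual pattern set is product-specific.
SKIP_PATTERNS = [
    re.compile(r"^(hi|hey|hello|thanks|thank you|ok|okay|got it)[.!]?$", re.I),
    re.compile(r"^```.*```$", re.S),  # code-only blocks
]
MIN_LENGTH = 12  # assumed floor; very short turns rarely carry memories

def passes_prefilter(turn: str) -> bool:
    """Stage 1 of the write pipeline: cheap rules, no model call."""
    text = turn.strip()
    if len(text) < MIN_LENGTH:
        return False
    return not any(p.match(text) for p in SKIP_PATTERNS)
```

Because this stage is pure pattern matching, it costs nothing per turn, which is what lets it sit in front of the LLM-driven extraction stage.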
- Extraction (Extraction as filtering →)
- LLM-driven generation of candidate memories from a turn. Framing matters: 'what is memorable here' produces dramatically better stores than 'extract every fact.'
- Entity resolution (Entity resolution →)
- Turning conversational references ('she', 'my boss', 'VW') into stable entity IDs. Four-stage cascade: pronoun rules → grammar parse → fuzzy match → LLM judge.
- Deduplication (Three tiers of dedup →)
- Three-tier dedup: hash equality → cosine similarity (0.85 / 0.92 thresholds) → LLM judge. A repeated observation increments the existing memory's count rather than being discarded.
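The tiering can be sketched as a cheap-to-expensive cascade. The thresholds are the 0.85 / 0.92 values above; the `llm_judge` callable is a hypothetical stand-in for the model call:

```python
import hashlib
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def is_duplicate(cand_text, cand_vec, old_text, old_vec, llm_judge=None):
    # Tier 1: exact hash equality (free).
    if (hashlib.sha256(cand_text.encode()).hexdigest()
            == hashlib.sha256(old_text.encode()).hexdigest()):
        return True
    # Tier 2: cosine similarity gating.
    sim = _cosine(cand_vec, old_vec)
    if sim >= 0.92:
        return True   # near-certain duplicate
    if sim < 0.85:
        return False  # clearly distinct
    # Tier 3: the 0.85-0.92 grey zone escalates to an LLM judge.
    return llm_judge(cand_text, old_text) if llm_judge else False
```

Only the grey zone pays for a model call, which keeps average per-candidate cost low.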
- Supersession
- When a new memory contradicts an existing one, the older is marked superseded (kept for audit) rather than overwritten. Distinct from deduplication.
- Conflict detection
- Write-pipeline stage that detects whether a candidate contradicts an existing memory. Triggers supersession on contradiction; does nothing on agreement.
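A simplified sketch of conflict detection plus supersession, assuming memories are stored as subject + relation + object triples (a contradiction here is "same subject and relation, different object"):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Memory:
    subject: str
    relation: str
    obj: str
    status: str = "active"

def conflict_check(candidate: Memory, store: List[Memory]) -> None:
    """On contradiction, mark the older memory superseded; on agreement
    (or no overlap), do nothing but persist the candidate."""
    for existing in store:
        if (existing.status == "active"
                and existing.subject == candidate.subject
                and existing.relation == candidate.relation
                and existing.obj != candidate.obj):
            existing.status = "superseded"  # kept for audit, never deleted
    store.append(candidate)
```

Note the superseded memory is retained, which is what distinguishes supersession from deduplication and from destructive overwrites.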
Retrieval
Search, fusion, ranking
- Read pipeline (Five retrievers →)
- The sequence from query to assembled context: parse → 5-retriever fan-out → RRF fusion → rerank → token-budgeted aggregation.
- Hybrid search
- Combining multiple retrieval methods (semantic + lexical + graph + temporal + type-filter) and fusing their results. No single retriever wins all queries.
- Reciprocal Rank Fusion (RRF) (RRF explained →)
- A score-free fusion method that combines ranked lists by summing 1/(k+rank) per item across retrievers. k=60 is standard. Avoids score-normalization bugs.
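The formula above fits in a few lines. A sketch with the standard k=60:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of item IDs: score(d) = sum of 1/(k + rank)
    over every list in which d appears (ranks are 1-based)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "a" is ranked first by both retrievers, so it beats items
# that appear high in only one list.
fused = rrf_fuse([["a", "b", "c"], ["a", "d", "b"]])
```

Because only ranks are used, retrievers with incomparable score scales (BM25 vs cosine similarity) fuse cleanly with no normalization step.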
- BM25 (BM25 in plain English →)
- Okapi BM25 — a lexical retrieval algorithm from the mid-1990s, built on 1970s probabilistic ranking theory, composing inverse document frequency, term-frequency saturation, and length normalization. Still wins on rare terms.
- Semantic search
- Retrieval by embedding similarity (typically cosine). Catches paraphrase; misses out-of-vocabulary terms and rare identifiers.
- Embedding
- A dense vector representation of text. Trained models map similar meanings to nearby points. Typical dims: 384–3072. Stored in a vector index for ANN lookup.
- Entity graph (Entity graphs →)
- Typed edges connecting entities: works-at, reports-to, lives-in. Enables multi-hop reasoning queries that no single memory's text contains.
- Query optimizer (Query optimizer →)
- Pre-retrieval planner. Extracts query features (entity density, temporal precision, lexical rarity) and selects a retriever plan. Halves p99 on simple queries.
- Reranking
- A second-stage scorer (typically a cross-encoder) that re-orders the top-K from fusion, demoting off-topic hits that scored high on raw similarity.
- Context aggregation (Token budgeting →)
- Assembling retrieved memories into a token-budgeted, structured prompt. Six categories share the budget: facts, preferences, events, entities, summary, recent turns.
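A greedy budget-split sketch over the six categories. The share values here are illustrative assumptions, not the system's actual allocation, and the whitespace token counter stands in for a real tokenizer:

```python
# Assumed example shares; the six categories come from the glossary entry.
BUDGET_SHARES = {
    "facts": 0.25, "preferences": 0.15, "events": 0.15,
    "entities": 0.10, "summary": 0.15, "recent_turns": 0.20,
}

def aggregate(retrieved, total_budget_tokens,
              count_tokens=lambda s: len(s.split())):
    """Each category gets its share of the budget; items are taken
    best-ranked first and dropped once their category's share is spent."""
    context = {}
    for category, share in BUDGET_SHARES.items():
        budget = int(total_budget_tokens * share)
        picked, used = [], 0
        for memory in retrieved.get(category, []):
            cost = count_tokens(memory)
            if used + cost > budget:
                break
            picked.append(memory)
            used += cost
        context[category] = picked
    return context
```

A fixed split like this is the simplest policy; production systems often let unused budget from one category spill into others.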
- Lost-in-the-middle
- Stanford's finding (Liu et al., 2024): LLM answer quality follows a U-curve over context position. Place high-priority content at start and end; not middle.
Math & algorithms
Formulas and techniques
- Confidence (memory) (Confidence formula →)
- Per-memory trust score in [0,1]. Weighted blend: 0.45·source + 0.20·repetition + 0.25·extractor + 0.10·type-prior. Drives ranking, conflict resolution, decay floors.
- Repetition boost (Why log scaling →)
- Logarithmic function of independent observation count: r(n) = 1 − 1/(1 + ln(1 + n)). Asymptotic to 1; the 100th observation does not outweigh the 10th.
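The two formulas above compose directly; a sketch, assuming each input signal is already normalized to [0, 1]:

```python
import math

def repetition_boost(n):
    """r(n) = 1 - 1/(1 + ln(1 + n)); 0 at n=0, asymptotic to 1."""
    return 1.0 - 1.0 / (1.0 + math.log(1.0 + n))

def confidence(source, n_observations, extractor, type_prior):
    """The weighted blend from the glossary entry:
    0.45*source + 0.20*repetition + 0.25*extractor + 0.10*type-prior."""
    return (0.45 * source
            + 0.20 * repetition_boost(n_observations)
            + 0.25 * extractor
            + 0.10 * type_prior)
```

The log scaling is what prevents a frequently repeated fact from drowning out everything else: each additional observation raises confidence by less than the previous one.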
- Freshness decay (Decay curves →)
- Exponential decay of memory recency: freshness(t) = 2^(-t/τ). Type-specific half-life τ. Access boost (logarithmic in retrieval count) counteracts decay for proven-useful memories.
- Access boost
- Multiplicative factor on freshness: 1 + ln(1 + access_count). Memories that prove their value at retrieval stay retrievable; unused ones decay.
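Decay and access boost combine multiplicatively; a sketch using the formulas from the two entries above:

```python
import math

def freshness(age_days, half_life_days, access_count=0):
    """2^(-t/tau) exponential decay, multiplied by the access
    boost 1 + ln(1 + access_count)."""
    decay = 2.0 ** (-age_days / half_life_days)
    return decay * (1.0 + math.log(1.0 + access_count))

# A 180-day-old fact (tau = 180 days) has decayed to half freshness,
# but frequent retrieval counteracts the decay.
half = freshness(180, 180)
boosted = freshness(180, 180, access_count=5)
```

With type-specific half-lives (events 30 days, facts 180, entities 365), the same age means very different freshness depending on memory type.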
- HNSW (HNSW tuning →)
- Hierarchical Navigable Small World graphs — the dominant ANN index for vectors. Three knobs: m (graph degree), ef_construction (build effort), ef_search (query effort).
- Cosine similarity
- Similarity metric between two vectors: cos(θ) = (a·b) / (‖a‖‖b‖). Range [-1, 1]; standard for embedding comparison. Threshold-based gating common in dedup.
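A direct transcription of the formula (assumes neither vector is all-zero):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0, orthogonal vectors 0.0,
# opposite directions -1.0; magnitude is ignored.
same_direction = cosine_similarity([1, 0], [2, 0])
```

In practice embeddings are compared with a vectorized library routine, but the threshold gating in dedup (0.85 / 0.92) is applied to exactly this value.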
- Inverse Document Frequency (IDF)
- BM25 component: ln((N − n + 0.5) / (n + 0.5) + 1). Rare terms get higher weight; common terms get lower. The discrimination signal in lexical retrieval.
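The formula as code, showing how rare terms dominate:

```python
import math

def idf(total_docs, docs_containing_term):
    """BM25 IDF: ln((N - n + 0.5) / (n + 0.5) + 1)."""
    n = docs_containing_term
    return math.log((total_docs - n + 0.5) / (n + 0.5) + 1.0)

# A term appearing in 1 of 1000 memories carries far more weight
# than one appearing in half of them.
rare = idf(1000, 1)
common = idf(1000, 500)
```

The +0.5 smoothing terms keep the expression finite for terms that appear in every document or in none, and the outer +1 keeps the value positive.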
Production
Operations, drift, scale
- Concept drift (Dual-signal drift →)
- When the meaning of an entity shifts over time (Twitter → X). Detection: dual-signal — centroid distance > 0.4 AND relation Jaccard < 0.5. Either alone is noisy.
- Data drift
- Distributional shift in stored memories — users start writing differently, or schema changes. Detection: MMD (Maximum Mean Discrepancy) over recent vs historical samples.
- MMD (Maximum Mean Discrepancy)
- Kernel-based two-sample test for distributional drift. MMD²(P,Q) = E[k(x,x')] − 2E[k(x,y)] + E[k(y,y')]. Used with RBF kernel + permutation test for significance.
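A sketch of the biased MMD² estimator with an RBF kernel, on scalar samples for brevity (real drift detection runs it on embedding vectors, plus a permutation test for significance):

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel on scalars: k(x, y) = exp(-gamma * (x - y)^2)."""
    return math.exp(-gamma * (x - y) ** 2)

def mmd_squared(xs, ys, kernel=rbf):
    """Biased estimate of MMD^2(P, Q) =
    E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]."""
    def mean_k(a, b):
        return sum(kernel(x, y) for x in a for y in b) / (len(a) * len(b))
    return mean_k(xs, xs) - 2.0 * mean_k(xs, ys) + mean_k(ys, ys)

# Samples from the same distribution give a value near 0;
# shifted samples give a clearly positive value.
same = mmd_squared([0.1, 0.2, 0.0], [0.15, 0.05, 0.1])
shifted = mmd_squared([0.1, 0.2, 0.0], [5.1, 5.2, 5.0])
```

The permutation test then asks how often a shuffled relabeling of the two samples produces an MMD² this large; if rarely, the drift is significant.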
- Hallucination defense (Three-layer defense →)
- Three-layer architecture: write-time grounding (verify against source span), store-time consistency (cross-memory contradiction scan), read-time faithfulness (rerank).
- Background worker (7 maintenance jobs →)
- Async maintenance loop running seven jobs on staggered cadences: decay (hourly), consolidation, drift scan, snapshot (daily), consistency, GC (weekly), embedding refresh (monthly).
- Junk rate (Cost of junk memories →)
- Fraction of stored memories unhelpful at retrieval time. Production audits of rolling-extraction systems have measured 90%+. Pre-store filtering is the cheapest fix.
- Index tier (Scaling tiers →)
- The four-rung scaling ladder for memory storage: SQLite-vec embedded (≤100K) → pgvector HNSW (≤10M) → sharded pgvector (≤100M) → specialized vector DB (1B+).
Comparisons
Adjacent technologies
- RAG (Retrieval-Augmented Generation)
- Pattern of retrieving from a static document corpus before generation. Distinct from agent memory: RAG is read-only; memory writes new state from interactions.
- Long context (Memory vs RAG vs LC →)
- Loading large amounts of material into the LLM's context window per request. Defers retrieval rather than solving it; hits Lost-in-the-Middle on large prompts.
- Vector database (Vector DB ≠ memory →)
- Storage substrate for embeddings with ANN indexing. A useful primitive but not a memory system on its own — types, supersession, decay, drift detection all live above.
- LangChain Memory
- Conversation-buffer abstractions (BufferMemory, WindowMemory, SummaryMemory). Operate within a single session; don't persist across sessions without external storage.
- LangGraph state
- Per-graph-execution typed state shared across nodes. Good for workflow state ('what task am I doing'). Different lifetime than agent memory ('what does this user prefer').
- Mem0
- Open-source agent memory framework (Python-first). Easiest to start with; lightest write-time filtering. As of 2026, the most widely-adopted memory framework by integrations.
- Letta
- Open-source agent memory framework featuring 'memory blocks' — typed editable regions the agent can manipulate explicitly. Programming-abstraction-first design.
- Zep
- Agent memory framework (Go) with first-class temporal indexing. Strong fit for time-anchored retrieval workloads ('what changed since last week').
Want the depth behind any of these?
Each term links to the deeper page in our Learn track. Twenty-eight pages with interactive demos.
Open the Learn hub →