Agent Memory Glossary
51 terms used in agent memory engineering. Concepts, algorithms, comparisons, and memory types — the language of building agents that remember.
Core concepts
Foundational definitions
- Agent memory (What is agent memory? →)
- Durable, structured state — facts, preferences, events, entities, and relations — that an LLM agent persists outside its context window and retrieves on demand. The substrate that makes an agent feel continuous across sessions.
- Episodic memory
- Memory of specific timestamped events — 'user joined the platform team on 2026-05-09'. Append-only, decays fastest, supports temporal-anchored queries.
- Semantic memory
- Memory of stable predicates — 'user works at Volkswagen', 'Volkswagen is an automotive company'. Changes rarely, retrieved often, decays slowly.
- Procedural memory
- Memory of how to do things — recurring patterns, code-style preferences, workflow conventions. Distinct from facts (what is) and events (what happened).
- Fact (memory type)
- A stable predicate stored as a memory: subject + relation + object. 'User works at Volkswagen.' Type prior in confidence formula: 0.7. Default decay half-life: 180 days.
- Preference (memory type)
- A mutable user choice stored as a memory: 'user prefers dark mode.' Supersedes when contradicted. Default decay half-life: 90 days. Repetition-boost-friendly.
- Event (memory type)
- A timestamped occurrence stored as a memory: 'user joined platform team on 2026-05-09.' Append-only, never supersedes. Default decay half-life: 30 days.
- Entity (memory type)
- A stable identity referenced across memories — a person, organization, place, project. Sticky; once resolved, lasts. Default decay half-life: 365 days.
- Relation (memory type)
- A typed edge between entities: 'user reports-to Sarah'. Carries confidence and a temporal window. Foundation for multi-hop reasoning.
- Memory lifecycle (Lifecycle states →)
- Memories transition through four states: active → superseded → expired → forgotten. Each transition is rule-driven and audit-logged.
- Cold-start problem
- For a new user the memory store is empty, so retrieval returns nothing. Mitigations: cohort defaults, eager-extract from onboarding, treat absence as a first-class result.
Write pipeline
Filtering, extraction, classification
- Write pipeline (7-stage write pipeline →)
- The sequence of stages a candidate memory passes through before persistence. Cheap-to-expensive: pre-filter → extract → classify → resolve → dedupe → conflict-check → persist.
- Pre-filter (Pre-filter explained →)
- First stage of the write pipeline. Pattern + length rules drop greetings, acknowledgements, meta-talk, and code-only blocks. Free; rejects 60–70% of incoming turns.
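A minimal sketch of such a pre-filter, assuming a hypothetical pattern set and length floor (real deployments tune both per product):

```python
import re

# Assumed example rules; the actual pattern set is product-specific.
SKIP_PATTERNS = [
    re.compile(r"^(hi|hey|hello|thanks|thank you|ok|okay|got it)[.!]?$", re.I),
    re.compile(r"^```.*```$", re.S),  # code-only blocks
]
MIN_LENGTH = 12  # assumed floor; very short turns rarely carry memories

def passes_prefilter(turn: str) -> bool:
    """Stage 1 of the write pipeline: cheap rules, no model call."""
    text = turn.strip()
    if len(text) < MIN_LENGTH:
        return False
    return not any(p.match(text) for p in SKIP_PATTERNS)
```

Because this stage is pure pattern matching, it costs nothing per turn, which is what lets it sit in front of the LLM-driven extraction stage.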
- Extraction (Extraction as filtering →)
- LLM-driven generation of candidate memories from a turn. Framing matters: 'what is memorable here' produces dramatically better stores than 'extract every fact.'
- Entity resolution (Entity resolution →)
- Turning conversational references ('she', 'my boss', 'VW') into stable entity IDs. Four-stage cascade: pronoun rules → grammar parse → fuzzy match → LLM judge.
- Deduplication (Three tiers of dedup →)
- Three-tier dedup: hash equality → cosine similarity (0.85 / 0.92 thresholds) → LLM judge. A repeated observation increments the existing memory's count rather than being discarded.
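The tiering can be sketched as a cheap-to-expensive cascade. The thresholds are the 0.85 / 0.92 values above; the `llm_judge` callable is a hypothetical stand-in for the model call:

```python
import hashlib
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def is_duplicate(cand_text, cand_vec, old_text, old_vec, llm_judge=None):
    # Tier 1: exact hash equality (free).
    if (hashlib.sha256(cand_text.encode()).hexdigest()
            == hashlib.sha256(old_text.encode()).hexdigest()):
        return True
    # Tier 2: cosine similarity gating.
    sim = _cosine(cand_vec, old_vec)
    if sim >= 0.92:
        return True   # near-certain duplicate
    if sim < 0.85:
        return False  # clearly distinct
    # Tier 3: the 0.85-0.92 grey zone escalates to an LLM judge.
    return llm_judge(cand_text, old_text) if llm_judge else False
```

Only the grey zone pays for a model call, which keeps average per-candidate cost low.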
- Supersession
- When a new memory contradicts an existing one, the older is marked superseded (kept for audit) rather than overwritten. Distinct from deduplication.
- Conflict detection
- Write-pipeline stage that detects whether a candidate contradicts an existing memory. Triggers supersession on contradiction; does nothing on agreement.
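A simplified sketch of conflict detection plus supersession, assuming memories are stored as subject + relation + object triples (a contradiction here is "same subject and relation, different object"):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Memory:
    subject: str
    relation: str
    obj: str
    status: str = "active"

def conflict_check(candidate: Memory, store: List[Memory]) -> None:
    """On contradiction, mark the older memory superseded; on agreement
    (or no overlap), do nothing but persist the candidate."""
    for existing in store:
        if (existing.status == "active"
                and existing.subject == candidate.subject
                and existing.relation == candidate.relation
                and existing.obj != candidate.obj):
            existing.status = "superseded"  # kept for audit, never deleted
    store.append(candidate)
```

Note the superseded memory is retained, which is what distinguishes supersession from deduplication and from destructive overwrites.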
Retrieval
Search, fusion, ranking
- Read pipeline (Five retrievers →)
- The sequence from query to assembled context: parse → 5-retriever fan-out → RRF fusion → rerank → token-budgeted aggregation.
- Hybrid search
- Combining multiple retrieval methods (semantic + lexical + graph + temporal + type-filter) and fusing their results. No single retriever wins all queries.
- Reciprocal Rank Fusion (RRF) (RRF explained →)
- A score-free fusion method that combines ranked lists by summing 1/(k+rank) per item across retrievers. k=60 is standard. Avoids score-normalization bugs.
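The formula above fits in a few lines. A sketch with the standard k=60:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of item IDs: score(d) = sum of 1/(k + rank)
    over every list in which d appears (ranks are 1-based)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "a" is ranked first by both retrievers, so it beats items
# that appear high in only one list.
fused = rrf_fuse([["a", "b", "c"], ["a", "d", "b"]])
```

Because only ranks are used, retrievers with incomparable score scales (BM25 vs cosine similarity) fuse cleanly with no normalization step.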
- BM25 (BM25 in plain English →)
- Okapi BM25 — a lexical retrieval algorithm from the mid-1990s, built on 1970s probabilistic ranking theory, composing inverse document frequency, term-frequency saturation, and length normalization. Still wins on rare terms.
- Semantic search
- Retrieval by embedding similarity (typically cosine). Catches paraphrase; misses out-of-vocabulary terms and rare identifiers.
- Embedding
- A dense vector representation of text. Trained models map similar meanings to nearby points. Typical dims: 384–3072. Stored in a vector index for ANN lookup.
- Entity graph (Entity graphs →)
- Typed edges connecting entities: works-at, reports-to, lives-in. Enables multi-hop reasoning queries that no single memory's text contains.
- Query optimizer (Query optimizer →)
- Pre-retrieval planner. Extracts query features (entity density, temporal precision, lexical rarity) and selects a retriever plan. Halves p99 on simple queries.
- Reranking
- A second-stage scorer (typically a cross-encoder) that re-orders the top-K from fusion, demoting off-topic hits that scored high on raw similarity.
- Context aggregation (Token budgeting →)
- Assembling retrieved memories into a token-budgeted, structured prompt. Six categories share the budget: facts, preferences, events, entities, summary, recent turns.
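A greedy budget-split sketch over the six categories. The share values here are illustrative assumptions, not the system's actual allocation, and the whitespace token counter stands in for a real tokenizer:

```python
# Assumed example shares; the six categories come from the glossary entry.
BUDGET_SHARES = {
    "facts": 0.25, "preferences": 0.15, "events": 0.15,
    "entities": 0.10, "summary": 0.15, "recent_turns": 0.20,
}

def aggregate(retrieved, total_budget_tokens,
              count_tokens=lambda s: len(s.split())):
    """Each category gets its share of the budget; items are taken
    best-ranked first and dropped once their category's share is spent."""
    context = {}
    for category, share in BUDGET_SHARES.items():
        budget = int(total_budget_tokens * share)
        picked, used = [], 0
        for memory in retrieved.get(category, []):
            cost = count_tokens(memory)
            if used + cost > budget:
                break
            picked.append(memory)
            used += cost
        context[category] = picked
    return context
```

A fixed split like this is the simplest policy; production systems often let unused budget from one category spill into others.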
- Lost-in-the-middle
- Stanford's finding (Liu et al., 2024): LLM answer quality follows a U-curve over context position. Place high-priority content at start and end; not middle.
Math & algorithms
Formulas and techniques
- Confidence (memory) (Confidence formula →)
- Per-memory trust score in [0,1]. Weighted blend: 0.45·source + 0.20·repetition + 0.25·extractor + 0.10·type-prior. Drives ranking, conflict resolution, decay floors.
- Repetition boost (Why log scaling →)
- Logarithmic function of independent observation count: r(n) = 1 − 1/(1 + ln(1 + n)). Asymptotic to 1; the 100th observation does not outweigh the 10th.
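The two formulas above compose directly; a sketch, assuming each input signal is already normalized to [0, 1]:

```python
import math

def repetition_boost(n):
    """r(n) = 1 - 1/(1 + ln(1 + n)); 0 at n=0, asymptotic to 1."""
    return 1.0 - 1.0 / (1.0 + math.log(1.0 + n))

def confidence(source, n_observations, extractor, type_prior):
    """The weighted blend from the glossary entry:
    0.45*source + 0.20*repetition + 0.25*extractor + 0.10*type-prior."""
    return (0.45 * source
            + 0.20 * repetition_boost(n_observations)
            + 0.25 * extractor
            + 0.10 * type_prior)
```

The log scaling is what prevents a frequently repeated fact from drowning out everything else: each additional observation raises confidence by less than the previous one.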
- Freshness decay (Decay curves →)
- Exponential decay of memory recency: freshness(t) = 2^(-t/τ). Type-specific half-life τ. Access boost (logarithmic in retrieval count) counteracts decay for proven-useful memories.
- Access boost
- Multiplicative factor on freshness: 1 + ln(1 + access_count). Memories that prove their value at retrieval stay retrievable; unused ones decay.
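Decay and access boost combine multiplicatively; a sketch using the formulas from the two entries above:

```python
import math

def freshness(age_days, half_life_days, access_count=0):
    """2^(-t/tau) exponential decay, multiplied by the access
    boost 1 + ln(1 + access_count)."""
    decay = 2.0 ** (-age_days / half_life_days)
    return decay * (1.0 + math.log(1.0 + access_count))

# A 180-day-old fact (tau = 180 days) has decayed to half freshness,
# but frequent retrieval counteracts the decay.
half = freshness(180, 180)
boosted = freshness(180, 180, access_count=5)
```

With type-specific half-lives (events 30 days, facts 180, entities 365), the same age means very different freshness depending on memory type.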
- HNSW (HNSW tuning →)
- Hierarchical Navigable Small World graphs — the dominant ANN index for vectors. Three knobs: m (graph degree), ef_construction (build effort), ef_search (query effort).
- Cosine similarity
- Similarity metric between two vectors: cos(θ) = (a·b) / (‖a‖‖b‖). Range [-1, 1]; standard for embedding comparison. Threshold-based gating common in dedup.
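A direct transcription of the formula (assumes neither vector is all-zero):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0, orthogonal vectors 0.0,
# opposite directions -1.0; magnitude is ignored.
same_direction = cosine_similarity([1, 0], [2, 0])
```

In practice embeddings are compared with a vectorized library routine, but the threshold gating in dedup (0.85 / 0.92) is applied to exactly this value.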
- Inverse Document Frequency (IDF)
- BM25 component: ln((N − n + 0.5) / (n + 0.5) + 1). Rare terms get higher weight; common terms get lower. The discrimination signal in lexical retrieval.
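The formula as code, showing how rare terms dominate:

```python
import math

def idf(total_docs, docs_containing_term):
    """BM25 IDF: ln((N - n + 0.5) / (n + 0.5) + 1)."""
    n = docs_containing_term
    return math.log((total_docs - n + 0.5) / (n + 0.5) + 1.0)

# A term appearing in 1 of 1000 memories carries far more weight
# than one appearing in half of them.
rare = idf(1000, 1)
common = idf(1000, 500)
```

The +0.5 smoothing terms keep the expression finite for terms that appear in every document or in none, and the outer +1 keeps the value positive.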
Production
Operations, drift, scale
- Concept drift (Dual-signal drift →)
- When the meaning of an entity shifts over time (Twitter → X). Detection: dual-signal — centroid distance > 0.4 AND relation Jaccard < 0.5. Either alone is noisy.
- Data drift
- Distributional shift in stored memories — users start writing differently, or schema changes. Detection: MMD (Maximum Mean Discrepancy) over recent vs historical samples.
- MMD (Maximum Mean Discrepancy)
- Kernel-based two-sample test for distributional drift. MMD²(P,Q) = E[k(x,x')] − 2E[k(x,y)] + E[k(y,y')]. Used with RBF kernel + permutation test for significance.
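A sketch of the biased MMD² estimator with an RBF kernel, on scalar samples for brevity (real drift detection runs it on embedding vectors, plus a permutation test for significance):

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel on scalars: k(x, y) = exp(-gamma * (x - y)^2)."""
    return math.exp(-gamma * (x - y) ** 2)

def mmd_squared(xs, ys, kernel=rbf):
    """Biased estimate of MMD^2(P, Q) =
    E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]."""
    def mean_k(a, b):
        return sum(kernel(x, y) for x in a for y in b) / (len(a) * len(b))
    return mean_k(xs, xs) - 2.0 * mean_k(xs, ys) + mean_k(ys, ys)

# Samples from the same distribution give a value near 0;
# shifted samples give a clearly positive value.
same = mmd_squared([0.1, 0.2, 0.0], [0.15, 0.05, 0.1])
shifted = mmd_squared([0.1, 0.2, 0.0], [5.1, 5.2, 5.0])
```

The permutation test then asks how often a shuffled relabeling of the two samples produces an MMD² this large; if rarely, the drift is significant.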
- Hallucination defense (Three-layer defense →)
- Three-layer architecture: write-time grounding (verify against source span), store-time consistency (cross-memory contradiction scan), read-time faithfulness (rerank).
- Background worker (7 maintenance jobs →)
- Async maintenance loop running seven jobs on staggered cadences: decay (hourly), consolidation, drift scan, snapshot (daily), consistency, GC (weekly), embedding refresh (monthly).
- Junk rate (Cost of junk memories →)
- Fraction of stored memories unhelpful at retrieval time. Production audits of rolling-extraction systems have measured 90%+. Pre-store filtering is the cheapest fix.
- Index tier (Scaling tiers →)
- The four-rung scaling ladder for memory storage: SQLite-vec embedded (≤100K) → pgvector HNSW (≤10M) → sharded pgvector (≤100M) → specialized vector DB (1B+).
Comparisons
Adjacent technologies
- RAG (Retrieval-Augmented Generation)
- Pattern of retrieving from a static document corpus before generation. Distinct from agent memory: RAG is read-only; memory writes new state from interactions.
- Long context (Memory vs RAG vs LC →)
- Loading large amounts of material into the LLM's context window per request. Defers retrieval rather than solving it; hits Lost-in-the-Middle on large prompts.
- Vector database (Vector DB ≠ memory →)
- Storage substrate for embeddings with ANN indexing. A useful primitive but not a memory system on its own — types, supersession, decay, drift detection all live above.
- LangChain Memory
- Conversation-buffer abstractions (BufferMemory, WindowMemory, SummaryMemory). Operate within a single session; don't persist across sessions without external storage.
- LangGraph state
- Per-graph-execution typed state shared across nodes. Good for workflow state ('what task am I doing'). Different lifetime than agent memory ('what does this user prefer').
- Mem0
- Open-source agent memory framework (Python-first). Easiest to start with; lightest write-time filtering. As of 2026, the most widely-adopted memory framework by integrations.
- Letta
- Open-source agent memory framework featuring 'memory blocks' — typed editable regions the agent can manipulate explicitly. Programming-abstraction-first design.
- Zep
- Agent memory framework (Go) with first-class temporal indexing. Strong fit for time-anchored retrieval workloads ('what changed since last week').
Want the depth behind any of these?
Each term links to the deeper page in our Learn track. Twenty-eight pages with interactive demos.
Open the Learn hub →