Reference

Agent Memory Glossary

49 terms used in agent memory engineering. Concepts, algorithms, comparisons, and memory types — the language of building agents that remember.

01

Core concepts

Foundational definitions

Agent memory
Durable, structured state — facts, preferences, events, entities, and relations — that an LLM agent persists outside its context window and retrieves on demand. The substrate that makes an agent feel continuous across sessions.
Episodic memory
Memory of specific timestamped events — 'user joined the platform team on 2026-05-09'. Append-only, decays fastest, supports time-anchored queries.
Semantic memory
Memory of stable predicates — 'user works at Volkswagen', 'Volkswagen is an automotive company'. Changes rarely, retrieved often, decays slowly.
Procedural memory
Memory of how to do things — recurring patterns, code-style preferences, workflow conventions. Distinct from facts (what is) and events (what happened).
Fact (memory type)
A stable predicate stored as a memory: subject + relation + object. 'User works at Volkswagen.' Type prior in confidence formula: 0.7. Default decay half-life: 180 days.
Preference (memory type)
A mutable user choice stored as a memory: 'user prefers dark mode.' Supersedes when contradicted. Default decay half-life: 90 days. Repetition-boost-friendly.
Event (memory type)
A timestamped occurrence stored as a memory: 'user joined platform team on 2026-05-09.' Append-only, never supersedes. Default decay half-life: 30 days.
Entity (memory type)
A stable identity referenced across memories — a person, organization, place, project. Sticky; once resolved, lasts. Default decay half-life: 365 days.
Relation (memory type)
A typed edge between entities: 'user reports-to Sarah'. Carries confidence and a temporal window. Foundation for multi-hop reasoning.
Memory lifecycle
Memories transition through four states: active → superseded → expired → forgotten. Each transition is rule-driven and audit-logged.
Cold-start problem
On a fresh user, the memory store is empty so retrieval returns nothing. Mitigations: cohort defaults, eager-extract from onboarding, treat absence as a first-class result.
02

Write pipeline

Filtering, extraction, classification

Write pipeline
The sequence of stages a candidate memory passes through before persistence. Cheap-to-expensive: pre-filter → extract → classify → resolve → dedupe → conflict-check → persist.
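A minimal sketch of the chain, assuming each stage is a callable that can reject a candidate by returning None; names and shapes are illustrative, not any particular framework's API:

```python
from typing import Callable, Optional

Stage = Callable[[dict], Optional[dict]]

def run_write_pipeline(candidate: dict, stages: list[Stage]) -> Optional[dict]:
    """Run a candidate memory through stages ordered cheap-to-expensive:
    pre_filter, extract, classify, resolve, dedupe, conflict_check, persist."""
    for stage in stages:
        candidate = stage(candidate)
        if candidate is None:
            return None  # rejected early; costlier downstream stages never run
    return candidate      # survived every stage; the last one persisted it
```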
Pre-filter
First stage of the write pipeline. Pattern + length rules drop greetings, acknowledgements, meta-talk, and code-only blocks. Free; rejects 60–70% of incoming turns.
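A toy version of such rules; the patterns and length cutoff below are invented for illustration, and production rule sets are larger and tuned:

```python
import re

GREETING = re.compile(r"^\s*(hi|hello|hey|thanks?|thank you|ok(ay)?|got it)\b[.!]*\s*$", re.I)
CODE_ONLY = re.compile(r"^\s*```")  # turn that is just a fenced code block

def pre_filter(turn: str, min_chars: int = 12) -> bool:
    """Return True if the turn should continue down the write pipeline."""
    text = turn.strip()
    if len(text) < min_chars:     # too short to carry anything memorable
        return False
    if GREETING.match(text):      # greetings, acknowledgements, meta-talk
        return False
    if CODE_ONLY.match(text):     # code-only blocks carry no user facts
        return False
    return True
```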
Extraction
LLM-driven generation of candidate memories from a turn. Framing matters: 'what is memorable here' produces dramatically better stores than 'extract every fact.'
Entity resolution
Turning conversational references ('she', 'my boss', 'VW') into stable entity IDs. Four-stage cascade: pronoun rules → grammar parse → fuzzy match → LLM judge.
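One way the cascade could look; the alias table, context shape, and 0.85 fuzzy threshold are assumptions for illustration, and the LLM-judge stage is only stubbed:

```python
from difflib import SequenceMatcher

def resolve_entity(mention: str, context: dict, aliases: dict[str, str]) -> str | None:
    """Cheapest check first; `aliases` maps surface forms ('VW', 'Volkswagen')
    to stable entity IDs, so abbreviations resolve by exact lookup."""
    m = mention.lower().strip()
    if m in {"she", "he", "they"}:                 # 1. pronoun rules
        return context.get("last_person")
    if m.startswith("my "):                        # 2. grammar parse (sketched)
        return context.get("relations", {}).get(m[3:])
    if m in aliases:                               # exact alias hit
        return aliases[m]
    for alias, entity_id in aliases.items():       # 3. fuzzy match on typos
        if SequenceMatcher(None, m, alias.lower()).ratio() >= 0.85:
            return entity_id
    return None                                    # 4. escalate to an LLM judge
```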
Deduplication
Three-tier dedup: hash equality → cosine similarity (0.85 / 0.92 thresholds) → LLM judge. A repeated memory increments its observation count rather than being discarded.
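A sketch of the cascade using the thresholds above; the store shape and the LLM-judge hook are illustrative stand-ins:

```python
import hashlib
import numpy as np

def dedupe(text: str, vec: np.ndarray, store: list[dict]) -> dict | None:
    """Return the existing duplicate (so its observation count can be
    incremented) or None if the candidate is genuinely new."""
    digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()
    for mem in store:                 # tier 1: normalized hash equality (free)
        if mem["hash"] == digest:
            return mem
    for mem in store:                 # tier 2: cosine similarity
        sim = float(vec @ mem["vec"] /
                    (np.linalg.norm(vec) * np.linalg.norm(mem["vec"])))
        if sim >= 0.92:               # near-certain duplicate
            return mem
        if sim >= 0.85:               # gray zone: tier 3, ask an LLM judge
            return llm_judge(text, mem)
    return None

def llm_judge(text: str, mem: dict) -> dict | None:
    ...  # placeholder for an LLM call deciding duplicate vs. distinct
```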
Supersession
When a new memory contradicts an existing one, the older is marked superseded (kept for audit) rather than overwritten. Distinct from deduplication.
Conflict detection
Write-pipeline stage that detects whether a candidate contradicts an existing memory. Triggers supersession on contradiction; does nothing on agreement.
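A minimal sketch, assuming memories are subject + relation + object triples as in the Fact entry; the contradiction predicate here is the simplest possible one:

```python
from datetime import datetime, timezone

def contradicts(a: dict, b: dict) -> bool:
    """Same subject and relation but a different object."""
    return (a["subject"], a["relation"]) == (b["subject"], b["relation"]) \
        and a["object"] != b["object"]

def conflict_check(candidate: dict, store: list[dict]) -> None:
    for mem in store:
        if mem["state"] == "active" and contradicts(candidate, mem):
            mem["state"] = "superseded"   # kept for audit, never overwritten
            mem["superseded_at"] = datetime.now(timezone.utc).isoformat()
            mem["superseded_by"] = candidate["id"]
    # On agreement or no overlap: do nothing; dedup already handled repeats.
```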
03

Retrieval

Search, fusion, ranking

Read pipeline
The sequence from query to assembled context: parse → 5-retriever fan-out → RRF fusion → rerank → token-budgeted aggregation.
Reciprocal Rank Fusion (RRF)
A score-free fusion method that combines ranked lists by summing 1/(k+rank) per item across retrievers. k=60 is standard. Avoids score-normalization bugs.
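The formula translates almost directly into code; this sketch fuses any number of ranked lists of memory IDs:

```python
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Each retriever contributes 1 / (k + rank) per item; only ranks are
    used, so mismatched score scales across retrievers cannot cause bugs."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse a BM25 ranking with a vector-search ranking:
fused = rrf([["m1", "m2", "m3"], ["m3", "m1", "m4"]])
```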
BM25
Okapi BM25 — a lexical retrieval algorithm from the mid-1990s composing inverse document frequency, term-frequency saturation, and length normalization. Still wins on rare terms.
Embedding
A dense vector representation of text. Trained models map similar meanings to nearby points. Typical dims: 384–3072. Stored in a vector index for ANN lookup.
Entity graph
Typed edges connecting entities: works-at, reports-to, lives-in. Enables multi-hop reasoning queries that no single memory's text contains.
Query optimizer
Pre-retrieval planner. Extracts query features (entity density, temporal precision, lexical rarity) and selects a retriever plan. Halves p99 latency on simple queries.
Reranking
A second-stage scorer (typically a cross-encoder) that re-orders the top-K from fusion. Catches off-topic high-similarity hits that first-stage ranking let through.
Context aggregation
Assembling retrieved memories into a token-budgeted, structured prompt. Six categories share the budget: facts, preferences, events, entities, summary, recent turns.
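A greedy budgeting sketch; the category weights below are invented for illustration and not prescribed by anything above:

```python
def aggregate(by_category: dict[str, list[dict]], budget: int,
              weights: dict[str, float]) -> list[dict]:
    """Split the token budget across categories by weight, then fill each
    slice in rank order. Each memory dict carries its own token count."""
    selected = []
    for category, items in by_category.items():   # items assumed pre-ranked
        remaining = int(budget * weights.get(category, 0.0))
        for mem in items:
            if mem["tokens"] <= remaining:
                selected.append(mem)
                remaining -= mem["tokens"]
    return selected

weights = {"facts": 0.30, "preferences": 0.15, "events": 0.15,
           "entities": 0.10, "summary": 0.15, "recent_turns": 0.15}
```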
Lost-in-the-middle
Stanford's finding (Liu et al., 2024): LLM answer quality follows a U-curve over context position. Place high-priority content at the start and end, not the middle.
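One common mitigation is to fold a best-first ranking toward both ends of the prompt, pushing the weakest items into the middle; a minimal sketch:

```python
def u_curve_order(items: list) -> list:
    """Alternate a best-first list between the front and the back,
    so the top items sit at the positions the U-curve favors."""
    front, back = [], []
    for i, item in enumerate(items):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]

# best-first [1, 2, 3, 4, 5] -> [1, 3, 5, 4, 2]: strongest at both ends.
```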
04

Math & algorithms

Formulas and techniques

Confidence (memory)
Per-memory trust score in [0,1]. Weighted blend: 0.45·source + 0.20·repetition + 0.25·extractor + 0.10·type-prior. Drives ranking, conflict resolution, decay floors.
Repetition boost
Logarithmic function of independent observation count: r(n) = 1 − 1/(1 + ln(1 + n)). Asymptotic to 1; the 100th observation does not outweigh the 10th.
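The two entries above compose; this sketch assumes the repetition boost feeds the 0.20-weighted term of the blend:

```python
import math

def repetition_boost(n: int) -> float:
    """r(n) = 1 - 1 / (1 + ln(1 + n)): log-scaled, asymptotic to 1."""
    return 1.0 - 1.0 / (1.0 + math.log(1.0 + n))

def confidence(source: float, n_obs: int, extractor: float,
               type_prior: float) -> float:
    """Weighted blend; every input is expected to lie in [0, 1]."""
    return (0.45 * source + 0.20 * repetition_boost(n_obs)
            + 0.25 * extractor + 0.10 * type_prior)

# A fact (type prior 0.7) observed 3 times from a trusted source:
c = confidence(source=0.9, n_obs=3, extractor=0.8, type_prior=0.7)
```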
Freshness decay
Exponential decay of memory recency: freshness(t) = 2^(-t/τ). Type-specific half-life τ. Access boost (logarithmic in retrieval count) counteracts decay for proven-useful memories.
Access boost
Multiplicative factor on freshness: 1 + ln(1 + access_count). Memories that prove their value at retrieval stay retrievable; unused ones decay.
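The decay and boost entries combine multiplicatively; the half-lives below are the type defaults listed in section 01:

```python
import math

HALF_LIFE_DAYS = {"fact": 180, "preference": 90, "event": 30, "entity": 365}

def effective_freshness(age_days: float, mem_type: str,
                        access_count: int = 0) -> float:
    """2^(-t / tau) decay, scaled by the 1 + ln(1 + access_count) boost."""
    tau = HALF_LIFE_DAYS[mem_type]
    return 2.0 ** (-age_days / tau) * (1.0 + math.log(1.0 + access_count))

# A 90-day-old preference sits at half freshness, but five retrievals
# multiply that by 1 + ln(6) ~= 2.8, keeping it competitive:
score = effective_freshness(90, "preference", access_count=5)
```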
HNSW
Hierarchical Navigable Small World graphs — the dominant ANN index for vectors. Three knobs: m (graph degree), ef_construction (build effort), ef_search (query effort).
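As one concrete example, the hnswlib package exposes all three knobs (its parameter casing differs slightly from the names above):

```python
import hnswlib
import numpy as np

dim, n = 384, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, M=16, ef_construction=200)  # build-time knobs
index.add_items(vectors, np.arange(n))
index.set_ef(64)  # ef_search: raise for recall, lower for latency

labels, distances = index.knn_query(vectors[:1], k=5)
```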
Cosine similarity
Similarity metric between two vectors: cos(θ) = (a·b) / (‖a‖‖b‖). Range [-1, 1]; standard for embedding comparison. Threshold-based gating common in dedup.
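In code, with the threshold gate from the dedup entry as a usage example:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

u, v = np.random.rand(384), np.random.rand(384)
is_near_duplicate = cosine(u, v) >= 0.92   # threshold gate as used in dedup
```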
Inverse Document Frequency (IDF)
BM25 component: ln((N − n + 0.5) / (n + 0.5) + 1). Rare terms get higher weight; common terms get lower. The discrimination signal in lexical retrieval.
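The component in code, with a rare-versus-common comparison:

```python
import math

def idf(N: int, n: int) -> float:
    """ln((N - n + 0.5) / (n + 0.5) + 1) for a term appearing in n of N docs."""
    return math.log((N - n + 0.5) / (n + 0.5) + 1.0)

# Rare term (10 of 1M docs) vs common term (500K of 1M docs):
rare = idf(1_000_000, 10)         # ~11.5: strong discrimination signal
common = idf(1_000_000, 500_000)  # ~0.69: nearly uninformative
```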
05

Production

Operations, drift, scale

Concept drift
When the meaning of an entity shifts over time (Twitter → X). Detection: dual-signal — centroid distance > 0.4 AND relation Jaccard < 0.5. Either alone is noisy.
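A sketch of the dual-signal check, assuming 'centroid distance' means cosine distance between the entity's old and new embedding centroids:

```python
import numpy as np

def concept_drift(old_c: np.ndarray, new_c: np.ndarray,
                  old_rel: set[str], new_rel: set[str]) -> bool:
    """Fire only when BOTH signals agree: centroid moved AND relations churned."""
    cos = float(old_c @ new_c / (np.linalg.norm(old_c) * np.linalg.norm(new_c)))
    centroid_moved = (1.0 - cos) > 0.4
    jaccard = len(old_rel & new_rel) / (len(old_rel | new_rel) or 1)
    return centroid_moved and jaccard < 0.5
```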
Data drift
Distributional shift in stored memories — users start writing differently, or schema changes. Detection: MMD (Maximum Mean Discrepancy) over recent vs historical samples.
MMD (Maximum Mean Discrepancy)
Kernel-based two-sample test for distributional drift. MMD²(P,Q) = E[k(x,x')] − 2E[k(x,y)] + E[k(y,y')]. Used with RBF kernel + permutation test for significance.
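A plug-in estimate of the quantity (biased V-statistic, for brevity) with an RBF kernel; in practice significance comes from the permutation test, which is omitted here:

```python
import numpy as np

def mmd2_rbf(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> float:
    """MMD^2 estimate with k(a, b) = exp(-gamma * ||a - b||^2)."""
    def gram(A: np.ndarray, B: np.ndarray) -> np.ndarray:
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq)
    return float(gram(X, X).mean() - 2.0 * gram(X, Y).mean() + gram(Y, Y).mean())

# recent vs historical memory embeddings (rows = samples):
recent, hist = np.random.rand(100, 384), np.random.rand(100, 384)
drift_score = mmd2_rbf(recent, hist)
```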
Hallucination defense
Three-layer architecture: write-time grounding (verify against source span), store-time consistency (cross-memory contradiction scan), read-time faithfulness (rerank).
Background worker
Async maintenance loop running seven jobs on staggered cadences: decay (hourly), consolidation, drift scan, snapshot (daily), consistency, GC (weekly), embedding refresh (monthly).
Junk rate
Fraction of stored memories unhelpful at retrieval time. Production audits of rolling-extraction systems have measured 90%+. Pre-store filtering is the cheapest fix.
Index tier
The four-rung scaling ladder for memory storage: SQLite-vec embedded (≤100K) → pgvector HNSW (≤10M) → sharded pgvector (≤100M) → specialized vector DB (1B+).
06

Comparisons

Adjacent technologies

RAG (Retrieval-Augmented Generation)
Pattern of retrieving from a static document corpus before generation. Distinct from agent memory: RAG is read-only; memory writes new state from interactions.
Long context
Loading large amounts of material into the LLM's context window per request. Defers retrieval rather than solving it; hits Lost-in-the-Middle on large prompts.
Vector database
Storage substrate for embeddings with ANN indexing. A useful primitive but not a memory system on its own — types, supersession, decay, drift detection all live above.
LangChain Memory
Conversation-buffer abstractions (BufferMemory, WindowMemory, SummaryMemory). Operate within a single session; don't persist across sessions without external storage.
LangGraph state
Per-graph-execution typed state shared across nodes. Good for workflow state ('what task am I doing'). Different lifetime than agent memory ('what does this user prefer').
Mem0
Open-source agent memory framework (Python-first). Easiest to start with; lightest write-time filtering. As of 2026, the most widely adopted memory framework by integrations.
Letta
Open-source agent memory framework featuring 'memory blocks' — typed editable regions the agent can manipulate explicitly. Programming-abstraction-first design.
Zep
Agent memory framework (Go) with first-class temporal indexing. Strong fit for time-anchored retrieval workloads ('what changed since last week').

Want the depth behind any of these?

Each term links to the deeper page in our Learn track. Twenty-eight pages with interactive demos.

Open the Learn hub →
