Memory for Customer Support Agents
The problem
Support is a repeated-interaction domain. The same user comes back with a follow-up question; a colleague picks up a ticket; a customer escalates. Every time the agent starts cold, the customer pays the cost — they re-explain context the team should already have.
"I told the last agent I'm on the enterprise plan" — a sentence no support organization wants to hear. It doesn't only damage that conversation; it changes how the customer rates the entire product.
What agent memory gives you
Agent memory turns the support agent from stateless to continuous. Three categories of memory matter most:
- Account state — plan tier, integrations enabled, environment (production vs sandbox), currently deployed version. Stored as facts; superseded when the configuration changes.
- Issue history — recent tickets, recurring problems, escalation patterns. Stored as events; queryable temporally ("has this customer hit this issue before?").
- Communication preferences — preferred contact channel, response style, technical level. Stored as preferences; superseded on update.
With these in the memory store, the agent can begin every conversation already informed — same as a human support engineer who reviewed the account before picking up the ticket.
How the write pipeline behaves for support agents
Recall's write pipeline runs in seven stages: pre_filter → extract → resolve_refs → [enrichers] → dedupe → conflict → persist. For support agents, the configuration of each stage differs meaningfully from general-purpose use.
pre_filter should be configured to reject: pleasantries ("thank you", "you're welcome", "have a great day"), template boilerplate your agent sends verbatim every time (greetings, closing lines, policy disclaimers), and internal agent reasoning traces that should never surface in the memory store. Set relevance_threshold: 0.50 for support workloads — more aggressive than the default 0.40, because support conversations contain a high ratio of social framing to actual informational content.
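A sketch of what that configuration could look like. Only relevance_threshold appears in Recall's documented surface above; the other option names here (reject_patterns, reject_agent_traces) are illustrative assumptions.

// Hypothetical pre_filter configuration for a support workload.
// Option names other than relevance_threshold are assumptions.
const preFilterConfig = {
  relevance_threshold: 0.5, // stricter than the 0.40 default: support chat is pleasantry-heavy
  reject_patterns: [
    /\b(thank you|you're welcome|have a great day)\b/i, // pleasantries
    /^Thanks for contacting .* support/i,               // greeting boilerplate sent verbatim
    /This conversation may be recorded/i,               // policy disclaimer template
  ],
  reject_agent_traces: true, // internal reasoning must never reach the memory store
};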
extract converts surviving turns into typed candidates. The mapping for support agents:
- Account configuration changes → fact (supersedes the previous value for that configuration key)
- Issue reports with confirmed resolution → event with entities: [account, issue_type, resolution_type]
- Communication style signals — "please be brief", "send me a summary rather than steps", "I prefer email" → preference
- Recurring problems ("this is the third time I've had this issue with SSO") → fact about the issue pattern, tagged issue-pattern, scoped to the account
The entity tagging on issue events matters for multi-hop queries later: when a new customer hits an authentication issue, the entity-graph retriever can surface "this issue type has been resolved via X for similar configurations" without leaking any specific customer's data.
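For illustration, a candidate the extract stage might emit for a resolved issue could look like this; the field names are assumptions about the candidate shape, not Recall's documented schema.

// Illustrative typed candidate for a resolved issue, with entity tags
// that the entity-graph retriever can walk later.
const candidate = {
  type: "event",
  content: "SSO login failures resolved by correcting the SAML reply URL",
  entities: ["account:acme", "issue_type:saml_auth", "resolution_type:config_fix"],
  scope: { customer_id: "c_acme" },
};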
conflict handles account state changes. When an account upgrades their plan, the new fact (plan: Enterprise) must supersede the old one (plan: Professional) — reliably, every time, with no missed supersession. If the agent is ever quoting limits from a stale plan tier because supersession failed, it will tell an enterprise customer they have a starter plan's rate limits. That call escalates immediately. For account state, configure the conflict stage to trigger supersession on any fact sharing the same subject entity and predicate, regardless of confidence score.
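A sketch of that rule, assuming Recall exposes a per-type conflict policy; the option names are illustrative.

// Hypothetical conflict-stage policy: for account-state facts, any new fact
// with the same subject entity and predicate supersedes the old one outright.
const conflictConfig = {
  fact: {
    supersede_on: ["subject", "predicate"], // (account:acme, plan) matches regardless of value
    require_confidence_margin: false,       // never let a stale plan tier survive a confidence tie
  },
};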
Retriever strategy for support
For a support agent loading context at session start, the most effective query is broad — pass the customer's first message directly:
const memories = await recall.search({
query: customerMessage,
scope: { customer_id: ctx.customer.id },
types: ["fact", "preference", "event"],
limit: 25,
});

Sending 25 results and letting the reranker and token budget logic trim is more reliable than pre-filtering aggressively. The first message from a customer often contains the most signal about what account context is relevant.
Retriever weight recommendations for support:
- Entity-graph retriever: highest weight (0.40). Support conversations are account-graph-heavy. An account may have multiple contacts (the primary admin, the technical lead, the billing contact), and those contacts may have distinct technical levels. A two-hop walk from the account entity surfaces the right set of related facts without requiring the query to name them explicitly.
- Semantic retriever: 0.30. Useful for "find similar issues this customer has encountered before", where the query may not share vocabulary with the stored event.
- BM25/lexical retriever: 0.20. Error codes, ticket IDs, feature names, and integration names are exact-match critical. SAML error code AADSTS50011 means something specific; the lexical retriever finds it where semantic search might miss or dilute it.
- Temporal retriever: 0.10. "What happened in the last 90 days?" is the most common temporal window in support. Use it as a secondary constraint — prefer the last 90 days of events over older ones when ranking is otherwise equal.
- Type-filter: applied first as a pre-filter, before any scoring. Always narrow to relevant types: when handling a billing question, filter to fact (account configuration) first; when handling a bug report, also include event (issue history).
The default RRF weighting (k=60) applies: weight × sqrt(score) / (k + rank). The entity-graph receiving the highest weight here is the key difference from general-purpose retrieval — in support, who the customer is and what their account graph looks like is more important than how similar the current query text is to any specific memory.
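A minimal sketch of that fusion step, following the formula above. The per-retriever result shape and the 1-based rank handling are assumptions.

type Hit = { memoryId: string; score: number };

// Fuse ranked lists from each retriever using the weighting described above:
// contribution = weight * sqrt(score) / (k + rank), summed per memory.
function fuseResults(
  lists: Record<string, Hit[]>,
  weights: Record<string, number>,
  k = 60
) {
  const fused = new Map<string, number>();
  for (const [retriever, hits] of Object.entries(lists)) {
    const weight = weights[retriever] ?? 0;
    hits.forEach((hit, i) => {
      const rank = i + 1; // 1-based rank
      const contribution = (weight * Math.sqrt(hit.score)) / (k + rank);
      fused.set(hit.memoryId, (fused.get(hit.memoryId) ?? 0) + contribution);
    });
  }
  return [...fused.entries()].sort((a, b) => b[1] - a[1]);
}

// The support weighting from the list above.
const supportWeights = { entityGraph: 0.4, semantic: 0.3, lexical: 0.2, temporal: 0.1 };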
Tenant isolation — the failure mode that ends companies
Every memory operation must include the customer scope. This is not a performance optimization or a best practice — it is a correctness requirement. Cross-customer memory leakage — surfacing one customer's ticket history or account configuration in another customer's conversation — is a reportable security incident under GDPR, SOC 2, and most enterprise contracts.
The rule is simple: never issue a recall query without customer_id in the scope. No exceptions for "internal" queries, "aggregate" lookups, or "diagnostic" calls.
// ALWAYS pass customer_id in scope. Never query without it.
const memories = await recall.search({
query: message,
scope: { customer_id: ctx.customer.id },
types: ["fact", "preference", "event"],
limit: 25,
});
// For cross-account "issue pattern" insights (anonymized):
const patterns = await recall.search({
query: "authentication error saml configuration",
scope: { org_id: "support_org" }, // aggregate scope, no customer_id
types: ["fact"],
filters: { tags: ["issue-pattern"] },
limit: 10,
});
// These pattern facts contain no customer-identifying information —
// they were written at extraction time with identifying details stripped

The separation between customer-scoped queries and org-scoped pattern queries is intentional and must be maintained at write time, not just at query time. When the extract stage creates an issue-pattern fact, it must be written to the org_id scope with all customer identifiers removed. If you write the pattern to the customer scope and then try to query it org-wide, isolation will block the cross-customer read — correctly.
Recall's isolation guarantee: queries scoped to { customer_id: "c_abc" } cannot return memories scoped to any other customer_id, regardless of embedding similarity. This is enforced at the database level via row-level scoping, not just via query filtering. A bug in the application query layer cannot produce cross-customer leakage because the storage layer enforces the namespace boundary independently.
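Belt and suspenders on the application side: a thin wrapper that makes it structurally impossible to issue an unscoped query. This is a sketch; the recall.search shape follows the calls shown above.

// A customer-scoped handle: customer_id is bound once at construction,
// so no call site can forget it. Org-scoped pattern queries should go
// through a separate, deliberately explicit path.
function scopedRecall(recall: { search: (q: object) => Promise<unknown> }, customerId: string) {
  return {
    search: (query: string, opts: { types?: string[]; limit?: number } = {}) =>
      recall.search({
        query,
        scope: { customer_id: customerId }, // always present, by construction
        types: opts.types ?? ["fact", "preference", "event"],
        limit: opts.limit ?? 25,
      }),
  };
}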
CRM integration pattern
Agent memory complements the CRM; it does not replace it. The CRM is the system of record for account data — contract terms, billing history, account ownership, SLA tiers. Agent memory is the working surface for conversational context — what the customer said in their last three sessions, what resolution worked for their recurring integration issue, how they prefer to receive updates.
The pattern: at session start, read from the CRM and from Recall in parallel. Pass CRM data into the system prompt directly (not into the persistent memory store). Pass Recall memories into the memory injection block.
async function loadSupportContext(customerId: string, message: string) {
const [crmAccount, recallMemories] = await Promise.all([
crm.getAccount(customerId), // read-only CRM pull — authoritative for plan, limits, ownership
recall.search({
query: message,
scope: { customer_id: customerId },
types: ["fact", "preference", "event"],
limit: 25,
}),
]);
// CRM data: inject into system prompt as ground truth
// Recall memories: inject into memory block for conversational continuity
return buildPrompt({ crmAccount, recallMemories });
}

Do not write CRM data into Recall as facts. CRM data changes via CRM workflows (plan upgrades through billing, ownership through account management). If you duplicate CRM data into Recall, you will eventually serve stale Recall facts that contradict the live CRM — and the agent will tell a customer wrong information about their own account.
Write through to the CRM for high-confidence state changes discovered in conversation — plan change requests, new integration confirmations, environment updates. Recall is where you discover these during the session; the CRM is where you durably record them. The flow: agent notices a state change → writes it to Recall for session continuity → triggers a CRM write via your existing workflow → on the next session, the CRM pull surfaces the authoritative updated value.
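A sketch of that flow. The crm.requestPlanChange call stands in for whatever your existing CRM workflow trigger is; it is hypothetical, as is the shape of the change argument.

// Agent detected a confirmed state change in conversation.
async function recordStateChange(
  customerId: string,
  turn: { user: string; assistant: string },
  change: { requestedPlan: string } // the confirmed change, e.g. a plan upgrade
) {
  // 1. Write to Recall so the rest of this session (and the next) sees it.
  await recall.write({
    scope: { customer_id: customerId },
    source: { turn },
  });
  // 2. Trigger the durable update via your existing CRM workflow (hypothetical call).
  await crm.requestPlanChange(customerId, change);
  // 3. The next session's CRM pull surfaces the authoritative updated value.
}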
Freshness decay and supersession for support
Support memory needs more aggressive freshness management than most domains, because support context goes stale fast. An integration that was "enabled" last year may have been disabled and re-enabled three times since. A communication preference from a contact who left the company is actively misleading.
Recommended decay half-lives by type:
- Account configuration (plan, integrations, environment): half-life 365 days, but trigger explicit supersession on any confirmed state change rather than relying on natural decay. Configuration changes must be handled immediately, not eventually.
- Issue history (events): half-life 90 days. A ticket from six months ago is probably not relevant to today's question. Let the temporal retriever deprioritize it naturally; do not keep surfacing resolved historical issues as primary context.
- Communication preferences: half-life 90 days (standard preference decay). A customer who asked for brief responses a year ago may now want detailed technical breakdowns. Decay the old preference; let a new explicit signal update it.
- Recurring issue patterns (facts tagged issue-pattern): half-life 180 days. These live longer because they're valuable for the next support engineer who picks up a related ticket, and because issue patterns tend to be stable over longer periods than individual account configurations.
The daily consolidation background job merges duplicate facts that accumulate across multiple conversations. For support, set the consolidation similarity threshold conservatively: similarity_threshold: 0.92. Two facts about integration configurations may look textually similar but refer to distinct integrations. At 0.85 similarity, you risk merging "Salesforce integration: enabled" and "Slack integration: enabled" into a single merged fact that loses the integration-type specificity. Conservative thresholds prevent this.
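Taken together, a sketch of the decay and consolidation settings above. The config keys are assumptions about Recall's configuration surface; the values are this section's recommendations.

// Hypothetical support-tuned memory configuration.
const supportMemoryConfig = {
  decay_half_life_days: {
    fact: 365,                 // account configuration; explicit supersession handles changes
    event: 90,                 // issue history
    preference: 90,            // communication preferences
    "fact:issue-pattern": 180, // tagged issue-pattern facts live longer
  },
  consolidation: {
    similarity_threshold: 0.92, // conservative: don't merge distinct integrations
    schedule: "daily",
  },
};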
Measuring support memory quality
First-response resolution lift: compare resolution rates for conversations where the agent cited a memory versus conversations where it answered cold. This is the headline metric for support memory ROI. Target: greater than 15% lift in first-response resolution for memory-assisted conversations. If you see no lift, retrieval precision is low — the agent is retrieving memories but they're not the right ones.
Context re-explanation rate: track how often customers say "as I mentioned before", "I already told the last agent", or similar phrases. This is the most direct signal of memory recall failure — when it happens, the agent retrieved either the wrong memories or no memories. Target: below 3% of support conversations. Above 3% means the memory system is not surfacing the right context at session start, and customers are noticing.
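One crude way to instrument this, as a sketch: a phrase list run over customer turns. A real deployment would want something less brittle than regexes, but this is enough to start trending the rate.

// Flag conversations where the customer signals they are repeating themselves.
const reExplanationSignals = [
  /as i (already )?mentioned/i,
  /i (already )?told (the last|your|a previous) agent/i,
  /like i said (before|earlier)/i,
];

function flagsReExplanation(customerTurns: string[]): boolean {
  return customerTurns.some((turn) =>
    reExplanationSignals.some((re) => re.test(turn))
  );
}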
Memory precision check: weekly, pull twenty random memories from the customer memory store across several different accounts. For each memory: verify it is accurate (not an extraction hallucination), relevant (something a support engineer would actually want to know), and fresh (not superseded by a more recent fact that the supersession logic missed). This is the manual quality bar. It takes about fifteen minutes per week and is the most reliable way to catch systematic extraction or supersession failures before they affect customers at scale.
Pre-filter rejection rate: instrument the pre_filter stage to log rejected candidates with their relevance score. Review a random sample weekly. If more than 15% of your rejections contain actual account configuration information or issue history — content that a support engineer would want to remember — lower your relevance_threshold by 0.05 and re-evaluate. The target is to catch all useful account signal and reject all pleasantries; the threshold is the primary tuning lever between those two failure modes.
One operational detail: the provenance endpoint returns extracted_by.model_version_hash for every memory. Use this field to detect when a model update to your extraction pipeline changed what gets extracted. A sudden jump in supersession rate or a drop in memory precision immediately following a deployment is usually the extraction model, not the codebase — check the hash distribution before assuming application-level causes.
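A sketch of that check, assuming the provenance payload carries extracted_by.model_version_hash as described; how you sample recent memories is left to your own tooling.

// Tally which extraction model versions produced a sample of recent memories.
// A distribution that shifts right after a deployment points at the extractor.
function hashDistribution(
  memories: { extracted_by: { model_version_hash: string } }[]
) {
  const counts = new Map<string, number>();
  for (const m of memories) {
    const h = m.extracted_by.model_version_hash;
    counts.set(h, (counts.get(h) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}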
Example flow
1. Customer opens new chat: the agent retrieves account facts, the last 90 days of events, and preferences scoped to this customer.
2. Customer asks a question: the query optimizer routes through the entity-graph retriever (this customer's account graph) plus the semantic and lexical retrievers.
3. Top-K memories assembled: the reranker filters off-topic hits. Token budget allocated 40% facts, 30% recent events, 20% preferences, 10% other.
4. Agent responds with context: the response references prior issues by ticket ID; the customer doesn't re-explain.
5. Conversation produces new memories: a new event (this conversation), maybe an updated preference. These go through the seven-stage write pipeline; pre_filter rejects pleasantries.
6. Background job consolidates: daily consolidation merges duplicate facts across this customer and supersedes outdated configuration.
Patterns that work
- Per-customer namespace: strict tenant isolation. Memory queries always include the customer ID; cross-customer leakage is the failure mode that ends the company.
- Issue patterns as relations: build a graph of issue types, resolutions, and customers. Two-hop queries surface "similar customers had this issue" insights without leaking specifics.
- Confidence-aware citing: render the memory ID with each fact in the prompt. The agent should cite when answering from memory; the user can ask "where did you get that?"
- Aggressive supersession on configuration: when account state changes (plan upgrade, integration enabled), explicitly mark old memories superseded — don't wait for natural conflict detection.
Pitfalls to avoid
- Cross-customer memory bleed: storing memories without strict per-customer scoping. Even one bug here is reportable. Defense: namespace at the database level, not just in queries.
- Storing PII you don't need: agent memory stores selectively, not exhaustively. Don't extract or persist sensitive details unless they're load-bearing for support quality.
- Treating agent memory as a CRM: the CRM is the system of record for account data. Memory complements it; don't try to replace it. Sync read-only from the CRM into memory at session start.
- Letting old memories dominate: a six-month-old preference for one channel may be obsolete. Type-specific decay (preferences = 90 days) is non-optional in this domain.
Code sketch
// At session start
const memories = await recall.search({
query: customerMessage,
scope: { customer_id: ctx.customer.id }, // strict tenant isolation
types: ["fact", "preference", "event"],
limit: 25,
});
// After response
await recall.write({
scope: { customer_id: ctx.customer.id },
source: { turn: { user: customerMessage, assistant: response } },
// Pre-filter rejects pleasantries; extraction picks out memorable content
});
Build this with Recall
Recall is open source and ships with the architecture above out of the box.