Memory for Personal Assistants
The problem
"How was your week?" said by a stranger and "How was your week?" said by a friend land differently, because the friend remembers what last week was. Personal assistants are useful in proportion to how friend-shaped they feel — which in turn is proportional to how much they remember and surface naturally.
But personal assistants also live inside a privacy frame. The user has to trust that memory respects deletion requests, doesn't surface things in inappropriate contexts, and stays scoped to the user themselves.
What agent memory gives you
The five canonical memory types map cleanly onto personal-assistant patterns:
- Facts — "user lives in Berlin", "user works at Volkswagen". Slow-changing identity context.
- Preferences — communication style, dietary restrictions, dark mode, calendar working hours. Mutable; supersede readily.
- Events — meetings, milestones, conversations the user wants followed up on. Time-anchored.
- Entities — people in the user's life (partner, manager, kids, clients). Each carries aliases and relations.
- Relations — typed edges between entities (Sarah is the user's manager; Alex is Sarah's partner; both work at Volkswagen).
The relation graph is where personal assistants distinguish themselves. "Did Sarah mention what Alex thinks about the move?" requires composing three memories the user never said in one sentence.
The entity graph is the personal assistant's core primitive
For personal assistants, the entity graph does the heavy lifting that semantic search alone cannot. When the user says "call Sarah", "message my sister", or "what was Alex thinking about the move?" — the query needs entity resolution before retrieval.
Entity resolution works in two stages: first, fuzzy name matching resolves "Sarah" to entity_id ent_sarah_chen_1, using aliases stored on the entity (e.g., ["Sarah", "Sarah C.", "my college roommate"]). Second, the entity-graph retriever walks the graph from that entity: Sarah → [facts about Sarah] → [events involving Sarah] → [relations: Sarah is the user's college roommate, works at Acme Corp, Sarah's partner is Alex].
The "Alex + the move" query: relation traversal identifies Alex from Sarah's partner relation, then retrieves events where Alex appears. This is a 2-hop graph query — impossible with pure semantic search on the user's history.
Graph investment pays off at the edges: who knows whom, who reported to whom, who influenced which decision. Invest early in entity resolution quality. Fragmented entities (Sarah Chen vs Sarah, College Friend vs Sarah C) kill multi-hop utility.
// Entity-aware query
const response = await recall.search({
query: "what does Sarah think about the apartment move",
scope: { user_id },
hints: {
entities: ["sarah"], // resolver will match "sarah" → ent_sarah_chen_1
hop: 2 // walk outward 2 hops from Sarah's entity node
},
types: ["event", "fact", "preference"],
limit: 20,
});

The entity resolver uses a combination of exact-match, normalized-match (lowercase, stripped punctuation), alias lookup, and fuzzy string similarity. Aliases are stored on the entity node and updated whenever the user refers to that person using a new form of their name. Once the resolver maps the query mention to an entity ID, the graph retriever can begin walking — and the quality of every multi-hop query depends entirely on how complete the alias table is.
Invest in alias bootstrapping from day one. If the user's contacts are connected (calendar, phone), pre-populate aliases from contact display names and nicknames. If not, build aliases incrementally as the user mentions people by different names across sessions.
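A minimal sketch of the two-stage resolver described above. All names (`EntityRecord`, `resolveEntity`, the 0.5 fuzzy cutoff, the token-overlap score) are illustrative assumptions, not the Recall API; real resolvers typically use edit distance or embedding similarity for the fuzzy stage.

```typescript
// Hypothetical entity record: ID plus alias table.
type EntityRecord = { id: string; aliases: string[] };

// Lowercase, strip punctuation, trim — the "normalized-match" step.
const normalize = (s: string) =>
  s.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").trim();

// Cheap fuzzy score: fraction of shared tokens (illustrative stand-in
// for edit distance / embedding similarity).
function tokenOverlap(a: string, b: string): number {
  const ta = new Set(a.split(/\s+/));
  const tb = new Set(b.split(/\s+/));
  const inter = [...ta].filter((t) => tb.has(t)).length;
  return inter / Math.max(ta.size, tb.size);
}

// Stage 1: exact/normalized alias lookup. Stage 2: fuzzy fallback.
function resolveEntity(mention: string, entities: EntityRecord[]): string | null {
  const norm = normalize(mention);
  for (const e of entities) {
    if (e.aliases.some((a) => normalize(a) === norm)) return e.id;
  }
  let best: { id: string; score: number } | null = null;
  for (const e of entities) {
    for (const a of e.aliases) {
      const score = tokenOverlap(norm, normalize(a));
      if (score >= 0.5 && (!best || score > best.score)) best = { id: e.id, score };
    }
  }
  return best?.id ?? null;
}
```

The alias table does the real work here: "Sarah C." resolves by normalization alone, while "college roommate" only resolves because "my college roommate" was stored as an alias.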
Write pipeline behavior for personal assistants
The write pipeline's 7 stages behave differently in the personal domain.
pre_filter — configured strictly. Reject: motivational quotes the user shared, jokes, trivia, task confirmations ("ok done"), and ambient weather observations. Target relevance_threshold: 0.55. Personal assistant junk is often sentimental noise — high frequency, low memory value. A message like "ha that's funny" contains no storable signal. Passing it downstream wastes extract and resolve cycles.
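A sketch of this strict pre_filter, assuming an upstream relevance scorer already produced a score in [0, 1]. The patterns and the 12-character minimum are illustrative heuristics; only the 0.55 threshold comes from the text.

```typescript
// Illustrative rejection patterns for sentimental noise.
const REJECT_PATTERNS = [
  /^(ok|okay|done|thanks|ha+|lol)\b/i, // task confirmations, laughter
  /^".*"\s*[-]\s*\w+/,                 // quoted aphorisms ("..." - Author)
];

// Returns true if the message should pass downstream to extract.
function preFilter(message: string, relevance: number, threshold = 0.55): boolean {
  const text = message.trim();
  if (text.length < 12) return false; // too short to carry storable signal
  if (REJECT_PATTERNS.some((p) => p.test(text))) return false;
  return relevance >= threshold;      // scorer output gates everything else
}
```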
extract — should create:
- Scheduling events ("lunch with Sarah Thursday"): type
event, entities[Sarah], temporal{valid_from: Thursday's date} - Preference signals ("I prefer morning meetings"): type
preference, content "prefers morning meetings", confidence 0.65 - Relationship facts ("Alex is Sarah's partner"): type
relation, subject_entity[alex], predicate"partner_of", object_entity[sarah] - Biographical facts ("just moved to Berlin"): type
fact, half-life 365d
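The four extract outputs above can be written as one candidate shape. The `Candidate` interface and field names are assumptions for illustration (the concrete date stands in for "Thursday's date"), not the Recall schema.

```typescript
// Hypothetical candidate shape covering the four extract outputs.
type MemoryType = "event" | "preference" | "relation" | "fact";

interface Candidate {
  type: MemoryType;
  content: string;
  confidence: number;
  entities?: string[];                // raw mentions; resolve_refs maps them to IDs
  temporal?: { valid_from?: string }; // ISO date for time-anchored events
  relation?: { subject: string; predicate: string; object: string };
  half_life_days?: number;
}

const candidates: Candidate[] = [
  { type: "event", content: "lunch with Sarah", confidence: 0.9,
    entities: ["Sarah"], temporal: { valid_from: "2024-05-16" } },
  { type: "preference", content: "prefers morning meetings", confidence: 0.65 },
  { type: "relation", content: "Alex is Sarah's partner", confidence: 0.85,
    relation: { subject: "alex", predicate: "partner_of", object: "sarah" } },
  { type: "fact", content: "moved to Berlin", confidence: 0.9, half_life_days: 365 },
];
```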
Inferences should be preferences or events, not facts. "User seemed stressed" — if you store it at all, store it as a preference candidate with confidence 0.20. Facts claim ground truth; inferences do not have it. Storing inferences as facts is the failure mode that makes memory feel invasive rather than helpful. The user says something once in a low mood; the agent surfaces "you tend to get stressed" three months later. That is the trust-ending mistake.
resolve_refs — this is where "my sister", "mom", "the client" get resolved to entity IDs. Build robust alias tables. "My boss", "my manager", and a person's first name all need to resolve to the same entity. The resolve stage should log every unresolved reference — unresolved refs are gaps in the graph that accumulate silently and degrade multi-hop quality over time. Treat unresolved refs as a quality signal: if > 15% of entity mentions are failing to resolve, the alias tables need enrichment.
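The unresolved-reference signal can be computed from the resolve stage's log. A sketch, assuming a simple log shape (`RefLog` and the function names are illustrative); the 15% threshold is the guideline from the text.

```typescript
// One log entry per entity mention the resolve stage processed.
interface RefLog { mention: string; resolvedId: string | null }

// Fraction of mentions that failed to map to an entity ID.
function unresolvedRate(log: RefLog[]): number {
  if (log.length === 0) return 0;
  return log.filter((r) => r.resolvedId === null).length / log.length;
}

const NEEDS_ENRICHMENT_THRESHOLD = 0.15; // the >15% guideline above

function aliasTablesNeedWork(log: RefLog[]): boolean {
  return unresolvedRate(log) > NEEDS_ENRICHMENT_THRESHOLD;
}
```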
infer — for personal assistants, the infer stage is where preference signals emerge from behavioral patterns. If the user reschedules every Monday morning meeting, the infer stage can propose a preference candidate: "user prefers not to have Monday morning commitments". Confidence should start low (0.35) and increase only as the pattern repeats. Auto-promote to confirmed preference at confidence ≥ 0.75 after at least 4 occurrences.
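The confidence ramp for inferred preferences can be sketched as follows. The starting value (0.35), promotion floor (0.75), and minimum occurrences (4) come from the text; the specific update rule (diminishing returns toward 1.0) is an assumption.

```typescript
interface PreferenceCandidate {
  content: string;
  confidence: number;
  occurrences: number;
}

// Each repeated observation nudges confidence up with diminishing returns.
// The 0.25 step size is an illustrative choice.
function observeOccurrence(c: PreferenceCandidate): PreferenceCandidate {
  const confidence = Math.min(1, c.confidence + (1 - c.confidence) * 0.25);
  return { ...c, confidence, occurrences: c.occurrences + 1 };
}

// Auto-promote only when both conditions from the text hold.
function shouldPromote(c: PreferenceCandidate): boolean {
  return c.confidence >= 0.75 && c.occurrences >= 4;
}
```

Note that both gates matter: with this update rule a candidate reaches 4 occurrences before its confidence clears 0.75, so a short burst of weak signals cannot promote early.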
Privacy and context-aware retrieval gating
The hardest design question for personal assistants: when is it appropriate to surface a memory? The query context matters as much as the query content.
A user in a work conversation asking "how do I structure this proposal?" should not receive memories about personal stress or relationship troubles — even if those memories are semantically adjacent to "proposal" (e.g., "Sarah's proposal to get married last month").
Two-layer gating:
Topic classification at retrieval time: classify the query into domains (professional, personal, health, relationship) and set a retrieval scope. A professional query only retrieves professionally-tagged memories by default. The domain classifier runs before the retriever and narrows the filter set passed to the retrieval stack. Misclassification is possible — build an escape hatch where the user can explicitly request cross-domain retrieval ("remind me everything about the Berlin trip including personal stuff").
Confidence thresholding for sensitive memories: memories tagged sensitive (health, relationship, emotional) require confidence ≥ 0.80 before surfacing. A low-confidence inference about mental state has no place in a work context.
const domain = classifyQuery(userMessage); // "professional" | "personal" | "health" | ...
const allowedDomains: Record<string, string[]> = {
professional: ["professional", "scheduling"],
personal: ["personal", "professional", "scheduling"],
health: ["health"], // health queries only retrieve health-tagged memories
};
const memories = await recall.search({
query: userMessage,
scope: { user_id },
filters: {
domains: allowedDomains[domain],
min_confidence: domain === "professional" ? 0.60 : 0.40,
},
limit: 20,
});

Domain tagging happens at write time. The extract stage assigns a domain tag to each candidate memory. This is a secondary classification pass — after the memory type is determined (fact, preference, event), the domain tagger labels it professional, personal, health, or relationship. The tagging model can be small; it only needs to separate broad domains, not perform fine-grained topic classification.
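The second gating layer — the confidence threshold for sensitive memories — can be applied as a post-retrieval filter. A sketch, assuming a `Memory` shape with a `sensitive` tag (both are illustrative, not the Recall schema); the 0.80 floor comes from the text.

```typescript
interface Memory {
  content: string;
  domain: string;
  confidence: number;
  sensitive?: boolean; // set at write time for health/relationship/emotional memories
}

const SENSITIVE_MIN_CONFIDENCE = 0.80;

function gateSensitive(memories: Memory[], queryDomain: string): Memory[] {
  return memories.filter((m) => {
    if (!m.sensitive) return true;
    // Sensitive memories never surface in professional contexts,
    // and elsewhere only above the confidence floor.
    if (queryDomain === "professional") return false;
    return m.confidence >= SENSITIVE_MIN_CONFIDENCE;
  });
}
```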
Hard-delete and the GDPR pathway
Soft-delete (supersession) handles most forgetting: a preference changes, an entity updates, an event becomes irrelevant. But users have the right to hard-delete, and personal assistants must honor it.
"Forget everything about Sarah" should:
- Hard-delete all memories with entity reference to Sarah's entity ID
- Hard-delete all memories tagged with sessions where Sarah was mentioned
- Remove Sarah's entity node from the graph (or tombstone it if there are orphaned references)
- Record the deletion request in the audit log (who requested, what scope, when) — without recording what was deleted
// Hard-delete all memories referencing entity
await recall.hardDelete({
scope: { user_id },
filter: {
entity_refs: ["ent_sarah_chen_1"],
hard: true // bypass soft-delete pathway
},
audit_reason: "User requested: 'forget everything about Sarah'",
});

Audit log retention: log the deletion event for 90 days (compliance), but log only the metadata (entity ID, timestamp, user ID), not the content of what was deleted. This satisfies GDPR audit requirements without reconstructing deleted data.
Tombstoned entity nodes are necessary when other entities hold relations pointing to the deleted entity. Without a tombstone, those relation edges become dangling pointers. A tombstone carries no content — only the entity ID and a deleted_at timestamp — but it prevents graph corruption. Graph traversal skips tombstoned nodes.
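A minimal sketch of tombstoning and the traversal skip (node shape and function names are assumptions):

```typescript
interface EntityNode {
  id: string;
  deleted_at?: string;  // present only on tombstones
  aliases?: string[];
}

// A tombstone keeps only the ID and deletion timestamp — no content survives,
// but relation edges pointing at the ID no longer dangle.
function tombstone(node: EntityNode): EntityNode {
  return { id: node.id, deleted_at: new Date().toISOString() };
}

// Graph traversal skips tombstoned nodes.
function traversable(node: EntityNode): boolean {
  return node.deleted_at === undefined;
}
```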
Session-scoped deletion is the second common request: "forget our conversation from today." Session tagging must be applied at write time — every memory extracted from a given conversation must carry the session ID. Without it, session-scoped deletion requires scanning every memory for temporal proximity, which is imprecise.
Freshness and forgetting for personal assistants
The canonical decay schedule from the Recall memory spec:
- Preference: 90-day half-life (communication style, food preferences)
- Event: 30-day half-life (meetings, appointments)
- Fact: 180-day half-life (job title, location)
- Entity: 365-day half-life (people persist longer)
- Relation: 180-day half-life (job relations can change faster than personal ones)
For personal assistants, apply domain-specific acceleration:
- Relationship events (dinners, calls, meetings with a specific person): decay accelerates after 6 months of no mention of that person. If Sarah hasn't appeared in any conversation for 180 days, events tagged with her entity node decay at 2x the standard rate.
- Health-related facts: 90-day half-life (more aggressive than the standard 180d for facts). Medical situations, medications, appointments — these change more frequently than biographical facts.
- Professional facts (job, company): 365-day standard — jobs don't change often but do change.
Background drift detection: if an entity's embedding centroid has moved more than 0.4 cosine distance since the entity was created AND their relation set Jaccard similarity has dropped below 0.5 — the entity has materially changed (new job, new city, relationship change). When drift is detected, flag the entity for user confirmation. The agent can surface this naturally: "I notice I haven't heard you mention Alex in a while and some things seem different — want me to update what I know about them?"
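The drift check above can be sketched directly: cosine distance over the embedding centroid plus Jaccard similarity over the relation set, with the 0.4 and 0.5 thresholds from the text (function names are illustrative).

```typescript
// 1 - cosine similarity of two equal-length vectors.
function cosineDistance(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  return 1 - dot / (Math.hypot(...a) * Math.hypot(...b));
}

// Intersection over union of two relation sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : inter / union;
}

// Drift requires BOTH signals: semantic movement and relation churn.
function hasDrifted(
  centroidAtCreation: number[], centroidNow: number[],
  relationsAtCreation: Set<string>, relationsNow: Set<string>,
): boolean {
  return cosineDistance(centroidAtCreation, centroidNow) > 0.4 &&
         jaccard(relationsAtCreation, relationsNow) < 0.5;
}
```

Requiring both signals is the point: an embedding centroid can wander from vocabulary shift alone, but a simultaneous drop in relation overlap indicates the entity itself has changed.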
Half-life is not hard expiration. A memory at half its original confidence score is still retrievable; it simply ranks lower in retrieval unless the query closely matches it. Full expiration (score below the retrieval floor) causes the memory to be excluded from results but not hard-deleted from storage. Deletion on expiration is a separate policy decision — most personal assistants retain expired memories for 30 days post-expiration before purging, giving the user a recovery window.
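The half-life semantics above, including the 2x acceleration for unmentioned entities, reduce to one exponential-decay formula. The formula is an assumption consistent with standard half-life decay; the 180-day silence trigger comes from the text and the retrieval floor value is illustrative.

```typescript
// Effective score after ageDays, halving every halfLifeDays.
// If the referenced entity has been silent > 180 days, decay runs at 2x.
function decayedScore(
  confidence: number,
  ageDays: number,
  halfLifeDays: number,
  daysSinceEntityMentioned = 0,
): number {
  const rate = daysSinceEntityMentioned > 180 ? 2 : 1;
  return confidence * Math.pow(0.5, (ageDays * rate) / halfLifeDays);
}

// Illustrative floor: below this, the memory is excluded from results
// (but not hard-deleted from storage).
const RETRIEVAL_FLOOR = 0.1;

function retrievable(score: number): boolean {
  return score >= RETRIEVAL_FLOOR;
}
```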
Measuring personalization quality
The key quality metric for personal assistants: entity coverage — what fraction of people the user regularly mentions have entity nodes with more than 3 memories attached? If coverage is low, the graph quality will be low and multi-hop queries will fail silently.
"Regularly mentioned" is defined as appearing in at least 3 distinct sessions over the past 90 days. An entity with a node but fewer than 3 memories is weakly represented — enough to confirm the entity exists, not enough to answer questions about them.
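Putting the two definitions together, entity coverage is straightforward to compute. A sketch with an assumed per-entity stats shape; the 3-session/90-day and >3-memory thresholds come from the text.

```typescript
interface EntityStats {
  id: string;
  memoryCount: number;        // memories attached to this entity node
  distinctSessions90d: number; // sessions mentioning the entity in the past 90 days
}

// "Regularly mentioned": at least 3 distinct sessions over the past 90 days.
const regularlyMentioned = (e: EntityStats) => e.distinctSessions90d >= 3;

// Fraction of regularly-mentioned entities with more than 3 memories attached.
function entityCoverage(entities: EntityStats[]): number {
  const regulars = entities.filter(regularlyMentioned);
  if (regulars.length === 0) return 1; // vacuously covered
  return regulars.filter((e) => e.memoryCount > 3).length / regulars.length;
}
```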
Track weekly:
- Entity nodes created this week: should grow proportional to new relationships entering the user's life
- Average memories per entity node: target greater than 5 for regularly-mentioned people; below 3 means the entity is underrepresented
- Hard-delete requests: high frequency signals privacy concerns; investigate whether the agent is surfacing memories in inappropriate contexts
- Preference supersession rate: should be below 10% per week; higher means preferences are noisy or the extract stage is over-generating
A low preference supersession rate does not mean preferences are stable — it means the agent is not observing enough new preference signals to update them. If supersession rate drops to near zero, check whether the pre_filter is being too aggressive and blocking valid preference signals.
Entity coverage degrades when entity resolution fails. Monitor the unresolved entity reference rate (entity mentions that failed to map to an entity ID) separately from the entity coverage metric. An unresolved reference is a missed opportunity to attach a memory to an entity node. If unresolved rate is above 10%, the alias tables or the resolver model need attention.
Example flow
1. User: "Schedule lunch with Sarah next week". The resolution stage maps "Sarah" to an entity ID via fuzzy match against the user's known entities.
2. Memory pull on Sarah: the entity-graph retriever pulls Sarah's preferences (vegetarian), recent context (just had a baby), and last interaction (3 weeks ago).
3. Calendar check via tool: the agent checks the calendar and finds Tue/Thu free. It cross-references Sarah's known availability if connected.
4. Suggestion synthesizes memory + tools: "How about Thursday at the Italian place near her office? They have good vegetarian options. You haven't seen each other since the baby — want me to ask if she has time?"
5. User confirms; new memories form. Event memory: lunch with Sarah on date X. Preference memory: user likes proactive scheduling suggestions.
6. User says "forget that conversation": hard-delete pathway. Memories tagged with the relevant session are removed, not just superseded. The audit log records the deletion request.
Patterns that work
- + Single-tenant by construction: one memory store per user. No multi-tenancy concerns at the product level — the user's data lives in their store.
- + Strong entity graph: personal assistants live and die on entity reasoning. Invest disproportionately in entity resolution and graph quality.
- + Calendar/email as memory inputs: email subject lines and calendar event titles are dense memory sources. Run them through the same write pipeline; pre-filter aggressively.
- + Hard-delete on request: GDPR-grade deletion. "Forget X" should remove memories from indexes, not just mark them inactive. Audit who-deleted-what.
Pitfalls to avoid
- − Surfacing memory in the wrong context: if the user is in a work conversation, surfacing "remember when you were going through a tough time" is a trust-ending mistake. Context-aware retrieval gating matters.
- − Storing inferences as facts: "User seems stressed" is an inference, not a fact. Storing it as a fact is brittle and creepy. Preferences and events are safer types for inferences.
- − No deletion path: soft-delete (supersession) is fine for most cases, but the user must have a hard-delete path. Without it, your privacy story doesn't hold.
- − Failing to forget after relationship change: when entities exit the user's life, memories about them can become hurtful surface area. Decay should accelerate when an entity hasn't been mentioned in 6 months.
Code sketch
// User says: "What did Sarah think of the proposal?"
const memories = await recall.search({
query: "Sarah proposal opinion",
scope: { user_id },
// Resolve "Sarah" → entity, then graph-walk
hints: { entities: ["sarah"], hop: 1 },
types: ["event", "fact", "preference"],
});
// Hard-delete on request
await recall.deleteByQuery({
scope: { user_id },
filter: { source_session: sessionId, hard: true },
});
Build this with Recall
Recall is open source and ships with the architecture above out of the box.