Entity Resolution: From Pronouns to Identity

By Arc Labs Research · 21 min read

"She said yes." Yes to what — and which she? Without resolution, that memory is lost the moment its conversational context falls out of the window. The job of entity resolution is to turn references like she, my boss, or that thing we discussed into stable identifiers that survive across sessions.

Resolution cascade · four stages, cheap-to-expensive
Resolution stops at the first stage that succeeds.

The four-stage cascade

  • Pronoun rules. Free. Match he/she/it/they/we/I against the recent-subject stack from the conversation. Resolves about 40% of references.
  • Grammar parser. Cheap. Noun-phrase chunking against the entity store with exact-match lookup. "The platform team", "the quarterly report" — definite articles signal known entities.
  • Fuzzy match. Trigram similarity against entity aliases. "VW" → Volkswagen, "the Berlin office" → Berlin office. Tunable threshold (typically 0.85 trigram Jaccard).
  • LLM judge. Last resort. Small classifier picks among top-k candidates. Used for genuinely ambiguous references; fires on roughly 10% of resolution attempts.
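The fuzzy stage's trigram Jaccard score can be sketched in a few lines. This is an illustrative Python sketch, not the production implementation; exact scores depend on padding and normalization choices, so the 0.85 threshold cited above is calibrated against a specific variant.

```python
# Illustrative sketch of Stage 3 scoring: trigram Jaccard similarity
# between a reference string and a stored alias. Padding and lowercasing
# choices here are assumptions.

def trigrams(s: str) -> set:
    """Lowercase, pad, and split a string into character trigrams."""
    s = f"  {s.lower().strip()} "  # padding so short strings still yield trigrams
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)
```

Identical strings score 1.0, unrelated strings near 0, and close variants land in between, which is why the threshold is a tunable knob rather than a constant.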

Hit rates and cost

On typical conversational data, the cumulative resolution rate after each stage looks like:

  • After pronouns: ~42%.
  • After grammar: ~73%.
  • After fuzzy: ~91%.
  • After LLM judge: ~99% (the remaining 1% are stored unresolved with a "needs review" flag).

Total resolution cost for 1M turns/day at typical conversational density: under $20/day, dominated by the LLM judge stage, which sees only the borderline cases.
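The cheap-to-expensive dispatch is a simple first-hit-wins loop. A minimal sketch, with hypothetical toy stages standing in for the real ones:

```python
# Minimal sketch of the four-stage cascade dispatcher. Each stage returns
# an entity ID or None; resolution stops at the first stage that succeeds.
from typing import Callable, Optional

def resolve(reference: str, stages: list) -> Optional[str]:
    """Run cheap-to-expensive stages; return the first non-None result."""
    for stage in stages:
        entity_id = stage(reference)
        if entity_id is not None:
            return entity_id
    return None  # stored unresolved with a "needs review" flag

# Example wiring with toy stages (real stages are pronoun rules, grammar
# parse, fuzzy match, LLM judge):
known = {"the platform team": "ent_platform_team"}
stages = [
    lambda r: None,                   # pronoun rules (no-op in this toy)
    lambda r: known.get(r.lower()),   # grammar parse / exact lookup
    lambda r: None,                   # fuzzy trigram match
    lambda r: None,                   # LLM judge
]
```

Because later stages never run once an earlier one succeeds, the per-reference cost is dominated by wherever the reference happens to resolve, which is what keeps the aggregate bill low.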

The recent-subject stack

Pronoun resolution depends on knowing the conversational antecedent. Maintain a small ring buffer of the last N proper nouns and their referent IDs. When a pronoun appears, the algorithm walks the stack in reverse, applying agreement rules (gender, number) and decay weighting (more recent referents win).

When to create an entity

Every resolution attempt either binds to an existing entity or creates a new one. Creation thresholds matter — too lax and the entity store fragments (User has Sarah, Sarah Smith, S.S., and "my manager" as four separate identities). Too strict and disambiguation fails. A practical rule:

  • Match score > 0.92 → bind to existing entity.
  • Match score 0.75–0.92 → LLM judge picks bind-or-create.
  • Match score < 0.75 → create new entity.
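The three-tier rule above can be written as a small decision function. A sketch, where `llm_judge` is a hypothetical callable standing in for the judge stage:

```python
# Sketch of the three-tier bind-or-create rule. `llm_judge` is a
# hypothetical callable returning "bind" or "create".

def bind_or_create(match_score: float, llm_judge=None) -> str:
    if match_score > 0.92:
        return "bind"            # confident match: bind to existing entity
    if match_score >= 0.75:
        # Borderline zone: delegate the decision to the LLM judge.
        return llm_judge() if llm_judge else "create"
    return "create"              # no plausible match: new entity stub
```

The asymmetry is deliberate: when in doubt without a judge available, creating is the safer default, since spurious creations are recoverable by background consolidation while wrong binds are not.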

Merging after the fact

Even with careful resolution, entities will fragment. Run a periodic consolidation job (see background worker jobs) that proposes entity merges based on overlapping relations and aliases. Merges are reversible; bind-or-create decisions at write-time are not.

Why this enables the entity graph

Resolution is what turns memory from a flat soup into a graph. Once references are stable identifiers, the relations between memories become a queryable structure — see entity graphs for multi-hop reasoning.

UUID v5 determinism explained

The resolution algorithm assigns entity IDs deterministically using UUID v5. The implementation for named entities is:

EntityId = UUID::new_v5(NAMESPACE_OID, canonical_name.to_lowercase().as_bytes())

UUID v5 is a name-based UUID format: it hashes the input bytes (using SHA-1) against a fixed namespace UUID to produce a deterministic 128-bit identifier. The same input always produces the same output. "Priya Sharma" lowercased and hashed against NAMESPACE_OID will produce the same UUID today, next month, and after a server restart. There is no random seed, no auto-increment, no database sequence.
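The same computation is available in Python's standard library, which makes the determinism easy to see. A sketch mirroring the Rust-style snippet above:

```python
import uuid

# Deterministic entity ID: UUID v5 of the lowercased canonical name
# against the fixed OID namespace, mirroring the snippet above.
def entity_id(canonical_name: str) -> uuid.UUID:
    return uuid.uuid5(uuid.NAMESPACE_OID, canonical_name.lower())

a = entity_id("Priya Sharma")
b = entity_id("priya sharma")   # case-insensitive: same ID
```

Calling `entity_id` twice, in two processes, or months apart yields the same UUID: there is no state to carry between calls.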

This determinism has three practical consequences:

No ID allocation step. When the pipeline resolves "Priya Sharma" in a new turn, it does not need to query the database to ask "does Priya Sharma have an entity ID yet?" It computes the UUID directly and either finds or creates the entity record at that ID. The entity lookup is a point read (by primary key), not a search. Under heavy write load, this eliminates a class of lock contention that would arise from a centralized ID counter or sequence.

Idempotent resolution. If the same turn is processed twice (worker retry after failure), the entity IDs produced are identical. The memory records written in the retry are structurally the same as those written in the first attempt. The dedup pipeline catches the duplicate memories; the entity records are harmless upserts. Without deterministic IDs, a retry could create a second entity record for the same name, fragmenting the entity graph.

Cross-session consistency without state. "Priya Sharma" mentioned in a January session and "Priya Sharma" mentioned in a September session resolve to the same EntityId without the pipeline needing to carry state between sessions. The entity record persists in the database; the UUID computation is stateless. If the pipeline is completely restarted between sessions, entity identity is preserved through the hash function.

The first-person path uses a different input:

user_entity_id = UUID::new_v5(NAMESPACE_OID, scope.user_id.as_bytes())

First-person references ("I", "me", "my") resolve directly to the authenticated user's scope UUID. This requires no LLM call and no entity table lookup. The pipeline knows the user's scope at all times; first-person resolution is a free operation.

The pronoun resolution algorithm in depth

The recent-subject stack is a ring buffer that tracks the last N proper nouns encountered in the conversation, along with their resolved EntityIds and grammatical metadata. Each entry records the entity's name, ID, gender annotation (if available), animacy (person vs. non-person), and the turn index at which it was mentioned.

When a pronoun appears, the resolution algorithm walks the stack in reverse (most recent first) and applies agreement filters:

Pronoun agreement rules:

  • "I", "me", "my" → resolve to scope.user_id's entity. No stack lookup needed.
  • "she", "her" → most recent feminine-annotated entity in the last N turns.
  • "he", "him", "his" → most recent masculine-annotated entity in the last N turns.
  • "they", "their", "them" → most recent plural or non-binary entity.
  • "it" → most recent non-person entity.
  • "we", "our" → most recent organization or group entity.

Decay weighting means that recency is not binary. The stack entry from 2 turns ago scores higher than one from 8 turns ago, even if both agree grammatically. The weight function is approximately exponential with each turn:

weight(entry) = exp(-0.3 × (current_turn - entry.turn_index))

With this weighting, an entity mentioned 1 turn ago has weight 0.74; one mentioned 5 turns ago has weight 0.22; one mentioned 10 turns ago has weight 0.05. In practice this means that if Priya was mentioned 2 turns ago and Ana was mentioned 6 turns ago, both feminine, a "she" pronoun will resolve to Priya with high confidence (weight ratio ~3.3:1). The LLM judge is not called.
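The decay-weighted walk, including the tie check described below, fits in a short function. A sketch under assumed data shapes (the stack entry fields here are illustrative):

```python
import math

# Sketch of decay-weighted pronoun resolution over the recent-subject
# stack. The decay constant (0.3) and tie threshold (weight ratio within
# 10%) follow the text; entry field names are assumptions.

def weight(current_turn: int, entry_turn: int) -> float:
    return math.exp(-0.3 * (current_turn - entry_turn))

def resolve_she(stack, current_turn):
    """Return (entity_id, needs_judge) for a 'she' pronoun."""
    candidates = [
        (weight(current_turn, e["turn"]), e["id"])
        for e in stack if e["gender"] == "f"   # agreement filter
    ]
    if not candidates:
        return None, False
    candidates.sort(reverse=True)              # most heavily weighted first
    if len(candidates) > 1 and candidates[1][0] / candidates[0][0] > 0.9:
        return None, True                      # tie: escalate to the LLM judge
    return candidates[0][1], False

stack = [
    {"id": "ent_priya", "gender": "f", "turn": 8},
    {"id": "ent_ana",   "gender": "f", "turn": 4},
]
```

With the example stack at turn 10, Priya's weight (0.55) dominates Ana's (0.17), so the pronoun resolves without escalation; if both had been mentioned on the same turn, the ratio check would fire instead.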

The hard edge case: two recently-mentioned entities with equal recency and the same grammatical gender. A conversation where both Priya and Ana were mentioned in the immediately preceding turn produces a tie on the recency weighting. The algorithm detects this tie (weight ratio within 10% of each other) and escalates to the LLM judge. The judge receives the full conversational context, not just the pronoun and the two candidates, and produces a structured output:

{ "entity_id": "ent_priya_ABC", "confidence": 0.85 }

The confidence threshold for pronoun resolution at Tier 4 is 0.65 — lower than the merge confidence threshold in dedup (0.75), because the cost of deferring (storing the pronoun unresolved) is higher than the cost of an occasionally wrong resolution that background consolidation can later correct. At confidence below 0.65, the pronoun resolves via UUID v5 of the pronoun string itself ("she" → deterministic UUID), producing a synthetic entity ID that background consolidation can later merge with the correct entity.

The alias normalization trigger

The optional alias normalization LLM call resolves the specific problem of named entity variants: different strings that refer to the same entity. "Volkswagen" and "VW" are the same company. "Inbox3 project" and "Inbox3" are the same thing. "Priya" and "Priya Sharma" might be the same person.

The call is not triggered for every named entity — it only fires when a specific structural condition is met:

The trigger: two or more non-pronoun names in the current batch share at least one lowercase token, or one name is a substring of another.

Walk through the trigger logic with examples:

  • "Volkswagen AG" and "Volkswagen": share the token "volkswagen" → trigger alias normalization. The normalizer confirms they refer to the same entity and adds "Volkswagen AG" as an alias for the canonical "Volkswagen" entity.

  • "VW" and "Volkswagen": share no tokens ("vw" ≠ "volkswagen") and neither is a substring of the other → no trigger. These resolve independently via the alias table (which already maps "VW" → Volkswagen from a previous normalization run), not via the LLM call.

  • "Inbox3 project" and "Inbox3 company": share the token "inbox3" → trigger. The normalizer evaluates whether these are the same entity or two related but distinct ones. If they are the same, it returns a canonical name; if distinct, it produces two separate EntityIds.

  • "Priya" and "Priya Sharma": "priya" is a substring of "priya sharma" → trigger. The normalizer checks whether this is the same person (common case: yes, given shared conversational context) or two different people named Priya.

The trigger condition avoids unnecessary LLM calls for clearly distinct names. "Volkswagen" and "Stripe" share no tokens and have no substring relationship — no trigger, both resolve directly. At typical conversational density, alias normalization fires on roughly 20% of turns that contain named entities. Of those, ~70% result in a confirmed alias mapping. The remaining ~30% are cases where the shared token was coincidental (two different companies with "tech" in the name that do not alias to each other).
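The trigger predicate itself is purely structural and needs no LLM. A sketch:

```python
# Sketch of the alias-normalization trigger: two non-pronoun names in the
# same batch either share a lowercase token or one is a substring of the
# other.

def shares_token(a: str, b: str) -> bool:
    return bool(set(a.lower().split()) & set(b.lower().split()))

def is_substring(a: str, b: str) -> bool:
    a, b = a.lower(), b.lower()
    return a in b or b in a

def should_normalize(a: str, b: str) -> bool:
    return shares_token(a, b) or is_substring(a, b)
```

Note how the examples above fall out of this predicate: "Volkswagen AG" / "Volkswagen" triggers on a shared token, "Priya" / "Priya Sharma" on the substring rule, and "VW" / "Volkswagen" triggers on neither, which is exactly why that pair must go through the alias table instead.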

The two LLM calls — pronoun coreference and alias normalization — run concurrently via tokio::join!. If a turn needs both (pronouns and ambiguous named entity variants), both requests fire simultaneously and results are available when both complete. The total latency is max(pronoun_latency, alias_latency), not the sum.
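The same pattern translates directly to Python's asyncio if you are not in Rust. A sketch with `asyncio.gather` standing in for `tokio::join!`; the coroutines here are hypothetical stand-ins for the two network calls:

```python
import asyncio

# Python analog of the tokio::join! pattern: fire both LLM calls
# concurrently so total latency is max(pronoun, alias), not the sum.
# The sleeps stand in for hypothetical network calls.

async def pronoun_coref():
    await asyncio.sleep(0.05)
    return {"entity_id": "ent_priya", "confidence": 0.85}

async def alias_normalize():
    await asyncio.sleep(0.05)
    return {"canonical": "Volkswagen", "alias": "VW"}

async def resolve_turn():
    coref, alias = await asyncio.gather(pronoun_coref(), alias_normalize())
    return coref, alias
```

Both awaits overlap on the event loop, so two 50 ms calls complete in roughly 50 ms total rather than 100 ms.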

Fails-open semantics

Both LLM calls are non-fatal. The resolution pipeline degrades gracefully when they fail:

Pronoun coreference failure: UUID v5 of the pronoun string. "She" hashed against NAMESPACE_OID produces a deterministic synthetic UUID. Every occurrence of an unresolved "she" in the same user scope maps to the same synthetic ID. This has a useful property: if the user consistently uses "she" to refer to the same person across sessions, all those memories accumulate under the same synthetic ID even without successful LLM resolution. Background consolidation can later merge the synthetic entity with the correct named entity when enough evidence accumulates.

Alias normalization failure: Each name resolves via the alias table (fast path) or direct UUID v5 hash (fallback). Two names that should have been aliased now have different EntityIds. This produces a fragmented entity graph — two nodes that should be one. Background consolidation detects this via overlapping relations and alias overlap, and proposes a merge.

Fails-open is the correct design choice for two reasons. First, a dropped memory is unrecoverable. If the pipeline errors out and discards the candidate rather than writing it with a possibly-wrong entity ID, the information is gone forever. Writing with a wrong ID is correctable. Second, the failure modes are self-healing: deterministic UUID fallbacks and background consolidation together mean that wrong-ID memories are rare (since UUID v5 handles many cases correctly) and fixable when they occur.

The only case where the pipeline explicitly does not write a memory is when the source span does not support the claim (failed grounding check in the confidence pipeline). That discard is intentional and correct regardless of resolution status.

Entity creation and the stub lifecycle

A newly created entity starts as a stub: an entity record with a UUID, a canonical name, a type annotation, and minimal metadata. It has no aliases, no attributes beyond what can be inferred from the creation context, and no relations. It exists in the entity table to anchor future memory writes.

The stub lifecycle for a person entity named "Priya" over 10 conversations:

Conversation 1 (creation event): Priya is mentioned as "my colleague Priya." No existing entity. Match score below 0.75 threshold → create new entity. Stub created: {name: "Priya", type: Person, source: "my colleague"}. One memory written: {subject: Priya, predicate: colleague_of, object: User}.

Conversations 2–3: More Priya mentions. UUID v5 of "priya" matches the existing stub. Memories accumulate: employer, location, project involvement. The entity record gains relation edges. Not yet a full profile, but retrievable.

Conversation 4: User says "Priya Sharma from the Delhi office." Alias normalization trigger fires (substring match between "Priya" and "Priya Sharma"). LLM confirms same entity. "Priya Sharma" added to alias list. UUID v5 of "priya sharma" now maps to the same EntityId as "priya" via the alias table (no longer via direct hash).

Conversations 5–7: Location (Delhi), employer, team membership accumulate as relation edges. The entity node now has enough data to support multi-hop queries: "Who does Priya work with?" traverses the colleague_of relation. "What's Priya's employer?" traverses the works_at relation.

Conversations 8–10: Temporal bounds emerge. An Event memory records a project Priya was involved in with start and end dates. The entity now has a temporal profile. Decay affects the time-bounded memories but not the structural facts (name, employer, location).

At this point the stub has become a full entity profile. The lifecycle from stub to profile is driven by memory accumulation — no explicit enrichment step required. The entity graph builds itself as memories are written.

Resolution and the alias table

The alias table is the fast path for named entity variants. Every entity record has a list of known alternative names. Before the fuzzy match stage runs, the pipeline checks whether the canonical form of the reference exists in the alias table. If it does, the entity resolves immediately — no trigram computation, no LLM call.

Aliases accumulate through two mechanisms:

Alias normalization confirmation. When the alias normalization LLM call confirms that "VW" refers to Volkswagen, the pipeline writes {entity_id: ent_volkswagen, alias: "VW"} to the alias table. Future "VW" references resolve via the alias table in the grammar parse stage (Stage 2), never reaching fuzzy match or LLM judge.

Background consolidation. When the weekly consolidation job merges two entity records that were incorrectly fragmented, it adds all of one record's names to the other's alias list. If "VW AG" had a separate entity record that gets merged into Volkswagen, "VW AG" becomes an alias and all memories previously under the VW AG entity ID are repointed to the Volkswagen entity.

The alias table effectively memoizes the output of expensive resolution steps. An alias that required an LLM call to establish the first time never requires one again. Over time, the alias table grows to cover the common variant forms in your user's vocabulary, and the LLM judge stage fires less frequently as the table absorbs more of the ambiguous cases.
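The memoization shape is worth seeing concretely. A sketch, where `expensive_resolve` is a hypothetical stand-in for the fuzzy-match and LLM-judge stages:

```python
# Sketch of the alias table as a memo: check the alias map before any
# fuzzy or LLM work, and write results back so the expensive path runs
# at most once per variant form.

alias_table = {"vw": "ent_volkswagen"}   # lowercased alias -> entity_id

def resolve_named(name: str, expensive_resolve) -> str:
    key = name.lower()
    if key in alias_table:                # fast path: memoized
        return alias_table[key]
    entity_id = expensive_resolve(name)   # fuzzy match / LLM judge
    alias_table[key] = entity_id          # memoize for next time
    return entity_id
```

The first lookup of a new variant pays full price; every subsequent lookup is a dictionary hit, which is the mechanism behind the 10% → 2% judge-rate decline described below.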

This has a compounding effect on resolution cost. In the first week of operation, the alias table is sparse and the LLM judge fires on ~10% of references. After 90 days with an active user, the alias table covers the entity vocabulary and the LLM judge fires on ~2% of references. The $20/day cost estimate above assumes a warm alias table; cold-start costs are higher by a factor of 3–5.

Background entity consolidation

Write-time resolution operates per-turn with per-turn context. It has no global view of the entity store. Background consolidation is the complementary process that operates with full visibility across all entities and all memories.

The consolidation job runs on a configurable schedule (typically weekly for moderate-activity users, nightly for high-activity ones). It identifies candidate entity merges using three signals:

Overlapping aliases. If entity A has alias "Volkswagen" and entity B has alias "VW", and the alias normalization table says VW = Volkswagen, A and B are merge candidates. This catches fragmentation from resolution failures.

Shared relation patterns. If entity A and entity B both have works_at: Acme and location: Berlin and colleague_of: User, they are likely the same person referred to by two different names. Relation overlap scoring (fraction of relations in common / total relations) above 0.7 triggers a merge proposal.

Attribute overlap. Entity records accumulate attribute memories over time. If two entity records have overlapping employer, location, and project attributes with high confidence, background consolidation proposes a merge.
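The relation-overlap signal reduces to a set-similarity score over relation edges. A sketch, modeling relations as (predicate, object) pairs:

```python
# Sketch of relation-overlap scoring for merge candidates: shared
# relation edges over the union of both entities' relations, with the
# 0.7 threshold from the text. Modeling relations as (predicate, object)
# pairs is an assumption.

def relation_overlap(a: set, b: set) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def propose_merge(a: set, b: set, threshold: float = 0.7) -> bool:
    return relation_overlap(a, b) >= threshold

a = {("works_at", "Acme"), ("location", "Berlin"), ("colleague_of", "User")}
b = {("works_at", "Acme"), ("location", "Berlin"), ("colleague_of", "User"),
     ("leads", "Inbox3")}
```

For the two example entities above, three of four distinct relations are shared (score 0.75), so a merge is proposed; entities with disjoint relations score 0.0 and never reach the confirmation step.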

Merge proposals go through a two-step confirmation: first an automated LLM judge review (does the evidence support that these are the same entity?), then either automatic merge (if confidence > 0.90) or human review queue (if 0.75 ≤ confidence ≤ 0.90).

Merges executed by background consolidation are reversible via an audit log. Each merge records which entity IDs were combined, which was designated canonical, and what aliases were transferred. An incorrect merge can be undone by splitting the merged entity and redistributing memories by provenance.

When to create a new entity vs. bind to existing

The bind-or-create decision at write time uses the three-tier threshold (match score > 0.92 → bind, 0.75–0.92 → LLM judge, < 0.75 → create). The threshold values are calibrated for precision — a wrong bind is harder to recover from than a spurious creation. Spurious creations are detected and corrected by background consolidation; wrong binds corrupt the entity's memory profile and are harder to untangle.

Three worked examples at different confidence levels:

High-confidence bind: The user mentions "Priya" and there is one existing entity named "Priya" in the user's scope with 15 memories attached. Fuzzy match score: 0.98. Bind directly. No LLM call. The probability that this is a different Priya is low enough (single-entity scope, high match score) that the cost of an LLM call is not justified.

LLM-judged bind-or-create: The user says "Priya from Delhi" and there is an existing entity named "Priya" with location=Mumbai. Match score on the name alone is 0.92 (border of the bind zone), but the location attribute contradicts. The pipeline escalates to the LLM judge, which receives: existing entity profile (name: Priya, location: Mumbai, employer: ...) and the new reference ("Priya from Delhi"). The judge returns {"action": "create", "confidence": 0.82, "reason": "Location conflict — likely different person"}. A new entity is created for Delhi-Priya. The two Priyas coexist in the entity store as separate nodes until either the user clarifies or background consolidation accumulates enough evidence to confirm they are distinct.

Automatic creation: The user says "Dr. Krishnamurthy at Apollo Hospital" and there is no entity with that name, no aliases matching "krishnamurthy", and no fuzzy match above 0.75. Create a new entity stub: {name: "Dr. Krishnamurthy", type: Person, attributes: {affiliation: "Apollo Hospital", title: "Dr."}}. UUID v5 of "dr. krishnamurthy" is the EntityId. Future mentions of "Krishnamurthy" will match via the fuzzy stage (trigram similarity between "krishnamurthy" and "dr. krishnamurthy" is above 0.85), route to the grammar/alias stage, and bind to this entity.

Coreference cache TTL

Resolved coreferences are cached with a 30-day TTL. Within the TTL window, a previously resolved pronoun or reference from the same entity resolves immediately from cache without any LLM call. After 30 days without re-validation, the cached resolution expires and the next reference triggers a fresh resolution pass.

The 30-day TTL is calibrated against the rate at which entity contexts change. An entity's role, location, or relationship to the user might shift over a 30-day window — a colleague becomes a manager, a business contact becomes a friend, a project lead changes. Stale coreference resolutions that persist past a context change produce memories with correct entity IDs but incorrect relationship framing. The TTL forces periodic re-resolution that can incorporate the changed context.

For stable entities (close family members, the user themselves, long-term employers), 30-day TTL re-validation rarely produces a different result. The cost is one LLM call per entity per 30 days — negligible. For volatile entities (project collaborators, acquaintances, organizations in flux), the TTL ensures the resolution reflects current context.

The cache is keyed by (entity_id, scope). A cache miss triggers the full four-stage resolution cascade. The result, including any new aliases discovered, is written back to the cache and alias table. Background consolidation uses cache miss events as a signal: entities with frequent cache misses are likely changing, and their memories should be prioritized for drift monitoring.
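The cache's shape is a keyed map with per-entry timestamps. A sketch with an injectable clock so expiry is testable; field names and the class shape are assumptions:

```python
import time

# Sketch of the coreference cache keyed by (entity_id, scope) with a
# 30-day TTL. `now` is injectable so expiry is testable.

TTL_SECONDS = 30 * 24 * 3600

class CorefCache:
    def __init__(self, now=time.time):
        self._now = now
        self._store = {}   # (entity_id, scope) -> (resolution, stored_at)

    def put(self, entity_id, scope, resolution):
        self._store[(entity_id, scope)] = (resolution, self._now())

    def get(self, entity_id, scope):
        item = self._store.get((entity_id, scope))
        if item is None:
            return None                      # miss: run the full cascade
        resolution, stored_at = item
        if self._now() - stored_at > TTL_SECONDS:
            del self._store[(entity_id, scope)]
            return None                      # expired: force re-resolution
        return resolution
```

An expired entry behaves exactly like a miss, which is what lets cache-miss events double as a drift signal without any extra bookkeeping.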
