Pre-Filter: Rejection Before Storage
The pre-filter is the first stage of the write pipeline and the only one that runs on every turn. It is also the cheapest, by orders of magnitude — a regex match costs microseconds, an LLM extraction call costs cents. Anything we can drop here we never have to pay to extract.
The six patterns
These are illustrative — your patterns will reflect your domain. The point is that you only need a small set, not a comprehensive language model.
- Length < 3 tokens. Acknowledgements, one-word replies, "ok thanks". Almost never carry persistent signal.
- Greetings and pleasantries. "Hey there", "good morning", "how's it going". Match an inclusion list of ~50 phrases — covers 95% of real instances.
- Pure acknowledgement. "Got it", "understood", "makes sense". Detect with a small phrase list plus an optional emoji.
- Meta-talk about the conversation. "Let me re-read that", "going back to your earlier point". Useful in context but not as durable memory.
- UI commands without content. "Open settings", "go back", "show me the dashboard". The action matters; the words rarely do.
- Code-only blocks. Triple-fenced code with no prose around it. Extracts poorly into facts; rebuild from artifacts if needed.
Order rules cheap-to-expensive
Even within a stage, ordering matters. Run length checks before regex; run regex before embedding-based similarity to historical "junk" exemplars (if you maintain one). The first match wins and short-circuits the rest.
When to extract anyway
Some pre-filter matches are still worth extracting — usually because they appear next to substantive content. Treat the pre-filter as a candidate suppressor, not a turn suppressor: a turn that contains both an acknowledgement and a fact still goes to extraction. Sentence-level segmentation before pattern matching solves this cleanly.
Measuring
Track per-rule rejection rate over time. A rule that suddenly stops matching usually means the upstream agent changed style. A rule that suddenly matches more usually means a new conversational scenario you have not yet thought about.
- prefilter.reject_rate — overall and per-rule.
- prefilter.precision — sample 1% of rejections to an LLM judge; what fraction of rejections were correct?
- prefilter.escape_rate — what fraction of survivors get rejected at later stages? Indicates pre-filter coverage gaps.
The cost arithmetic
At 1M turns/day, the turns the pre-filter drops are worth on the order of $160/day, or about $59,000/year, in avoided extraction calls alone — before counting embedding costs, dedup costs, or storage costs.
The four internal operations
The six-pattern description above is a conceptual shorthand. The actual pre-filter implementation is four sequential operations, each with distinct behavior:
1. Word count check.
The first operation counts whitespace-delimited tokens in the turn text.
If the count is below min_words (default: 3), the turn is rejected
immediately with SkipReason::TooShort { word_count }. This catches
single-word replies ("ok", "yes", "sure"), two-word replies ("got it",
"makes sense"), and turns that are entirely whitespace or punctuation.
The word count check runs before regex compilation and is the cheapest
possible check — a single pass over the string with no allocations.
2. Pattern match. Compiled regex patterns are evaluated in order. The first match wins and short-circuits the rest. The default patterns cover:

```rust
// Greetings and acknowledgments, possibly strung together ("ok thanks"),
// with optional trailing emoji
r"(?i)^((hi|hello|hey|yeah|thanks?|thank you|ok|okay|got it|sounds good|makes sense|cool|nice)[\s\.\!\?,]*)+[\p{Emoji}\s]*$"

// Meta requests (total length capped at 20 characters when the pattern is registered)
r"(?i)^(can you|could you|please|tell me|what('s)?|who|where|when)\b"

// Pure emoji or reactions
r"^[\p{Emoji}\s]+$"

// Tool call delimiters
r"^<tool_call>|^<tool_result>"
```

Patterns are compiled once at startup and stored as a pattern set.
Matching a 50-character turn against a compiled pattern set takes
roughly 2–5 microseconds. This is four orders of magnitude cheaper than
an LLM call. The MatchedSkipPattern reason records which pattern
matched, so rejection logs are queryable by rule.
3. Rate limit gate.
The rate gate checks whether identical content from the same user_id
has been seen within the rate_limit_window_secs window. The internal
map stores (user_id, content_hash) → last_seen_at and is bounded at
10,000 entries. When the map is full and a new entry would be added,
the oldest entry is evicted (LRU). The bound prevents unbounded memory
growth in high-cardinality environments.
4. Role gate.
The final check examines the role field on the turn. Assistant turns
are rejected by default with SkipReason::AssistantTurn. This is
controlled by the extract_from_assistant configuration flag, which
defaults to false.
The four operations always run in this order: word count → pattern match → rate limit → role gate. The ordering is not arbitrary. Word count is cheapest and eliminates the most turns. Pattern match is next cheapest and eliminates the most remaining turns. Rate limit requires a map lookup and so runs after the pattern pass has already dropped the bulk. Role gate runs last because it requires reading the role field, which some turn representations store separately from the content.
The typed rejection reason for the stage is:
```rust
enum PrefilterDecision {
    Skip(SkipReason),
    Pass,
}

enum SkipReason {
    TooShort { word_count: usize },
    MatchedSkipPattern { pattern: String }, // name of the rule that fired, e.g. "greeting_ack"
    UserRule(String),
    AssistantTurn,
}
```

Every turn that exits Stage 1 carries one of these decisions. Pass decisions carry no extra data. Skip decisions carry the reason, which is written to the stage span for that turn.
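Putting the four operations together, here is a minimal sketch of the stage using the types above. Turn, Role, PrefilterConfig, SkipPatterns, and RateGate are illustrative stand-ins, not the real implementation's names:

```rust
use regex::Regex;

#[derive(PartialEq)]
enum Role { User, Assistant }

struct Turn { role: Role, user_id: String, content: String }

struct PrefilterConfig { min_words: usize, extract_from_assistant: bool }

// Named, pre-compiled patterns; evaluated in order, first match wins.
// (User-defined patterns, registered alongside the defaults, would map
// to SkipReason::UserRule instead.)
struct SkipPatterns(Vec<(&'static str, Regex)>);

impl SkipPatterns {
    fn first_match(&self, text: &str) -> Option<&'static str> {
        self.0.iter().find(|(_, re)| re.is_match(text)).map(|(name, _)| *name)
    }
}

// Stubbed here; a bounded LRU version is sketched in "The rate gate in detail".
struct RateGate;
impl RateGate {
    fn is_duplicate(&mut self, _user_id: &str, _content: &str) -> bool { false }
}

fn prefilter(
    turn: &Turn,
    patterns: &SkipPatterns,
    rate_gate: &mut RateGate,
    config: &PrefilterConfig,
) -> PrefilterDecision {
    // 1. Word count: cheapest check, a single pass, no allocations.
    let word_count = turn.content.split_whitespace().count();
    if word_count < config.min_words {
        return PrefilterDecision::Skip(SkipReason::TooShort { word_count });
    }
    // 2. Pattern match: first match wins and short-circuits the rest.
    if let Some(name) = patterns.first_match(&turn.content) {
        return PrefilterDecision::Skip(SkipReason::MatchedSkipPattern {
            pattern: name.to_string(),
        });
    }
    // 3. Rate limit: identical content from the same user within the window.
    if rate_gate.is_duplicate(&turn.user_id, &turn.content) {
        return PrefilterDecision::Skip(SkipReason::MatchedSkipPattern {
            pattern: "rate_limit".to_string(),
        });
    }
    // 4. Role gate: assistant turns rejected unless explicitly enabled.
    if turn.role == Role::Assistant && !config.extract_from_assistant {
        return PrefilterDecision::Skip(SkipReason::AssistantTurn);
    }
    PrefilterDecision::Pass
}
```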
Worked examples
The following table shows concrete turns and their pre-filter outcomes:
| Turn | Decision | Reason |
|---|---|---|
| "hi" | Skip | TooShort (1 word) |
| "ok thanks 👍" | Skip | MatchedSkipPattern |
| "I just finished the Arrive interview. It went well." | Pass | — |
| "🎉🎉🎉" | Skip | MatchedSkipPattern |
| "What's the weather?" | Skip | MatchedSkipPattern (meta request, 4 words, under 20 chars) |
| "Priya said the Inbox3 deadline is Friday" | Pass | — |
| "sounds good" | Skip | MatchedSkipPattern |
| "Can you summarize that?" | Skip | MatchedSkipPattern (meta request) |
| "My manager moved our 1:1 to Thursday" | Pass | — |
| "sure" | Skip | TooShort (1 word) |
The pattern "What's the weather?" merits a note. It matches the meta request pattern because it is under 20 characters and begins with an interrogative form. In practice, this is the correct rejection: weather queries are ephemeral and carry no persistent signal. If a user said "It's been raining for three weeks and I hate it — I need to move", that turn would not match the meta request pattern (it exceeds 20 characters and does not begin with "can you / could you / please / tell me"), and it would reach extraction.
The rate gate in detail
The rate gate addresses two failure modes that pure pattern matching cannot handle.
Feedback loops. Some agents exchange the same greeting with the user at the start of every session: "Good morning! How can I help you today?" If the agent's greeting is not caught by the pattern set (maybe it is a custom greeting that does not match any default rule), the same text will arrive at pre-filter every day. Without the rate gate, that text would pass through to extraction, produce a no-candidate result (extraction should correctly discard it), and cost an LLM call per day. With the rate gate, after the first extraction, identical content from the same user is rejected at zero cost for the remainder of the window.
Duplicate turns from buggy clients. Some client implementations
retry on network timeout without checking whether the original request
succeeded. This produces duplicate turns — the same message delivered
two or three times within seconds. The rate gate with a short window
(default: 60 seconds) collapses these into a single pass. The first
delivery passes; subsequent deliveries are rejected with
MatchedSkipPattern. (Yes, the rate gate reuses the same SkipReason;
the recorded pattern name, "rate_limit" rather than a regex rule name,
lets dashboards distinguish the two cases.)
The 10,000-entry bound on the rate map is a deliberate constraint, not an oversight. In a multi-tenant deployment where each user_id is distinct, 10,000 entries covers 10,000 users with one active content hash each. When the map fills, LRU eviction means the oldest (least recently active) entries are dropped first. A new high-frequency user will push out a low-frequency user's entry. In the worst case, this causes a single extra LLM call for the evicted user's next duplicate. That is acceptable. The alternative — an unbounded map — would grow to hundreds of megabytes in a large deployment and create memory pressure on the worker process.
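A sketch of the rate gate under those constraints, continuing the sketch above. The lru crate, the DefaultHasher content hash, and the (user_id, content_hash) key shape are assumptions for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::NonZeroUsize;
use std::time::{Duration, Instant};

use lru::LruCache; // assumed dependency: the `lru` crate

struct RateGate {
    window: Duration,
    // (user_id, content_hash) -> last_seen_at, bounded at 10,000 entries.
    // When full, the least recently touched entry is evicted.
    seen: LruCache<(String, u64), Instant>,
}

impl RateGate {
    fn new(window_secs: u64) -> Self {
        Self {
            window: Duration::from_secs(window_secs),
            seen: LruCache::new(NonZeroUsize::new(10_000).unwrap()),
        }
    }

    fn is_duplicate(&mut self, user_id: &str, content: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        let key = (user_id.to_string(), hasher.finish());
        let now = Instant::now();
        // put returns the previous value for this key, if any, and marks the
        // entry most-recently-used, so hot (user, content) pairs stay resident.
        match self.seen.put(key, now) {
            Some(last_seen) => now.duration_since(last_seen) < self.window,
            None => false, // first sighting, or previously evicted: pass
        }
    }
}
```

Note that the eviction behavior described above falls out of the cache itself: a previously evicted entry looks like a first sighting, which costs at most one extra extraction call.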
Sentence-level segmentation
The "When to extract anyway" section above mentions sentence-level segmentation as the clean solution to the mixed-turn problem. Here is what that means in practice.
A conversational turn often contains both low-signal and high-signal content in the same message:
"Yeah, got it. By the way, my team is switching from Jira to Linear next month."
If the pre-filter is applied at the turn level, there are two choices: pass the whole turn (correct, but the "got it" noise goes to extraction) or reject the whole turn (wrong — the second sentence is worth storing). Neither is good.
Sentence-level segmentation splits the turn into sentences before applying the pre-filter. The implementation uses a lightweight rule-based sentence splitter (not an ML model — that would violate the cost ordering invariant). Each sentence is evaluated independently. Sentences that match a skip rule are dropped. Sentences that do not match are reassembled into a filtered turn and passed to extraction.
In the example above, "Yeah, got it." matches the greeting/acknowledgment pattern and is dropped. "By the way, my team is switching from Jira to Linear next month." does not match any pattern (it is 16 words, not a greeting, not a meta request) and is passed to extraction as a single-sentence turn. Extraction then runs on the substantive content only, without the noise.
The split-filter-reassemble path costs roughly 3–8 microseconds per
sentence for the regex step, making it negligible relative to the LLM
call it enables or prevents. Sentence boundaries are detected with a
simple rule: split on ., ?, ! followed by whitespace and an
uppercase letter, with exceptions for common abbreviations (Dr., Mr.,
U.S., etc.). This is not perfect — it misses some boundaries and creates
spurious splits on abbreviations outside the exception list — but it is
good enough for the purpose. A sentence that is split incorrectly either
passes the pre-filter (and costs one extra, cheap regex check) or fails
it (and drops a fragment that extraction would have rejected anyway).
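A sketch of the split-filter-reassemble path, reusing SkipPatterns from the earlier sketch. The abbreviation list is a placeholder; yours will be longer:

```rust
// Split after ., ?, ! followed by whitespace and an uppercase letter,
// unless the token ending the candidate sentence is a known abbreviation.
const ABBREVIATIONS: &[&str] = &["Dr.", "Mr.", "Mrs.", "U.S.", "e.g.", "i.e.", "etc."];

fn split_sentences(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut sentences = Vec::new();
    let mut start = 0;
    for i in 0..chars.len() {
        if matches!(chars[i], '.' | '?' | '!')
            && chars.get(i + 1).is_some_and(|c| c.is_whitespace())
            && chars.get(i + 2).is_some_and(|c| c.is_uppercase())
        {
            let candidate: String = chars[start..=i].iter().collect();
            let last_token = candidate.split_whitespace().last().unwrap_or("");
            if !ABBREVIATIONS.contains(&last_token) {
                sentences.push(candidate.trim().to_string());
                start = i + 1;
            }
        }
    }
    let tail: String = chars[start..].iter().collect();
    if !tail.trim().is_empty() {
        sentences.push(tail.trim().to_string());
    }
    sentences
}

// Evaluate each sentence independently; reassemble the survivors into the
// filtered turn that extraction sees. None means the whole turn was noise.
fn filter_turn(text: &str, patterns: &SkipPatterns) -> Option<String> {
    let kept: Vec<String> = split_sentences(text)
        .into_iter()
        .filter(|s| patterns.first_match(s).is_none())
        .collect();
    if kept.is_empty() { None } else { Some(kept.join(" ")) }
}
```

On the worked example, filter_turn drops "Yeah, got it." via the greeting pattern and returns the Jira-to-Linear sentence alone, which is exactly what extraction should see.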
Rejection with reasons
Every skip decision produces a structured SkipReason. This is not
just for debugging — it is the primary interface between the pre-filter
and the observability stack.
When a user reports "you didn't remember X", the investigation starts
with the turn's stage span. If the span shows SkipReason::TooShort
for a turn that the user expected to produce a memory, the diagnosis is
immediate: the turn was too short, and the min_words threshold needs
adjustment for this user's communication style. If the span shows
SkipReason::MatchedSkipPattern, the pattern name tells you exactly
which rule fired. If the span shows SkipReason::UserRule("custom_noise"),
you know a user-defined pattern caught it and can inspect that rule.
Without typed reasons, this investigation requires re-running the turn against the pattern set with instrumentation — which means reproducing the original state of the pattern set at the time of rejection, which may have been updated since. With typed reasons persisted in the span, the answer is in the log.
The stage span schema for a pre-filter rejection:
```json
{
  "stage": "pre_filter",
  "latency_ms": 0.003,
  "result": "reject",
  "reason": {
    "type": "MatchedSkipPattern",
    "pattern": "greeting_ack"
  }
}
```

This span is written for every turn, pass or reject. The storage cost is minimal — a few dozen bytes per span. The operational value is disproportionate: every "why didn't the agent remember X" question is answerable by querying stage spans.
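Emitting that span from the typed reason is mechanical. A sketch using serde_json, whose json! shapes mirror the schema above; the surrounding tracing plumbing is omitted:

```rust
use serde_json::{json, Value};

// Build the stage span payload for a rejected turn from its typed reason.
fn rejection_span(reason: &SkipReason, latency_ms: f64) -> Value {
    let reason_json = match reason {
        SkipReason::TooShort { word_count } => {
            json!({ "type": "TooShort", "word_count": word_count })
        }
        SkipReason::MatchedSkipPattern { pattern } => {
            json!({ "type": "MatchedSkipPattern", "pattern": pattern })
        }
        SkipReason::UserRule(name) => json!({ "type": "UserRule", "rule": name }),
        SkipReason::AssistantTurn => json!({ "type": "AssistantTurn" }),
    };
    json!({
        "stage": "pre_filter",
        "latency_ms": latency_ms,
        "result": "reject",
        "reason": reason_json,
    })
}
```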
User-defined custom patterns
The pre-filter supports user-defined skip patterns via
WritePipelineConfig::user_skip_patterns. These are added to the
compiled pattern set alongside the defaults and evaluated in the same
pass. A UserRule(String) skip reason is produced when a custom
pattern matches, with the string being the pattern name.
When to add custom patterns:
Domain-specific junk phrases. If your agent operates in a support
context, the phrase "Is there anything else I can help with?" appears at
the end of almost every assistant turn. It is not caught by the default
greeting patterns (it is not a short greeting; it is a full sentence).
Adding a custom pattern for this exact phrase prevents it from reaching
extraction on every turn where extract_from_assistant is enabled.
Tool output formats. Agents that surface structured tool output
often produce turns like [function_result: success] or
{"status":"ok","code":200}. These are machine-readable artifacts, not
conversational turns with extractable facts. A pattern on the JSON
envelope or the function_result prefix catches them at zero cost.
Known noise patterns specific to your deployment. If you run a code
review agent, inline code references like diff --git a/src/main.rs b/src/main.rs
that did not get triple-fenced will slip through the default code-block
rule. A custom pattern on the diff header rejects them.
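A hedged example of registering patterns for all three cases. The (name, regex) pair shape of user_skip_patterns, and the Default implementation, are assumptions for illustration; adapt to the actual config type:

```rust
// Illustrative only: assumes user_skip_patterns takes (name, regex) pairs.
let config = WritePipelineConfig {
    user_skip_patterns: vec![
        // Support-agent sign-off that survives the default greeting rules.
        ("support_signoff".to_string(),
         r"(?i)^is there anything else i can help (you )?with\??$".to_string()),
        // Machine-readable tool artifacts, not conversational turns.
        ("function_result".to_string(), r"^\[function_result:".to_string()),
        ("json_envelope".to_string(), r#"^\{"status":"#.to_string()),
        // Un-fenced diff headers from a code review agent.
        ("diff_header".to_string(), r"^diff --git ".to_string()),
    ],
    ..WritePipelineConfig::default()
};
```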
Testing custom patterns before deploying them matters. The recommended approach: run precision sampling on a 24-hour sample of turns, filter to turns that the candidate pattern would reject, and send them to an LLM judge with the question "Does this turn contain any information worth storing as a persistent memory?" A precision below 90% means the pattern is too aggressive. A precision above 99% with very low recall (the pattern matches almost nothing) means the pattern is fine but may not be worth the maintenance cost.
The role gate
The extract_from_assistant flag controls whether assistant turns are
passed to extraction. It defaults to false. When false, every turn
with role: "assistant" is rejected at Stage 1 with
SkipReason::AssistantTurn.
The default is false for a specific reason: assistant turns are generated, not stated. A user turn "I live in Berlin" is a first-person declaration of fact. An assistant turn "It sounds like you live in Berlin" is a restatement by the model, derived from the user's prior statement. Storing the assistant restatement adds a near-duplicate of the user's fact, increasing the likelihood of dedupe noise downstream. More importantly, storing generated text as if it were user-stated fact blurs provenance — if the agent later surfaces that memory, the source is a model output, not a user claim.
There is one class of agent where extract_from_assistant: true is
appropriate: self-reflective agents that produce first-person reasoning
about their own state. An agent that writes "I realize I've been giving
inconsistent answers about project status — I should update my working
model of the timeline" is recording its own cognition, not paraphrasing
user input. For these agents, assistant turns carry genuine epistemic
content worth persisting. Enable the flag for the assistant's
agent_id only, not globally.
When the role gate is enabled (i.e., extract_from_assistant: false,
the default), the assistant turn span still records a
SkipReason::AssistantTurn. This makes the gate auditable: you can
query how many assistant turns were seen, and what they contained, even
though they were not extracted. In a deployment where assistant turns
are unexpectedly high (the client is misconfiguring role labels, for
instance), the gate span count will spike before any downstream stage
shows an anomaly.
Why pattern-based, not ML
A reasonable question: why compile regex patterns rather than running a small classification model to decide whether a turn is worth extracting?
Four reasons:
Predictable. A regex pattern either matches or it does not. There is no probability threshold to tune, no distribution shift to monitor, no model version to pin. When a turn is rejected, the reason is the exact pattern that fired. A classification model that rejects a turn with 0.87 confidence tells you less.
Fast. A compiled pattern set evaluated against a 100-character string takes 2–10 microseconds. A small classification model (even a distilled 100M parameter model) takes 5–50ms on a CPU, and 0.5–5ms on a GPU endpoint with batching. At 1M turns/day, the difference between 5 microseconds and 5ms is 80 minutes of CPU time per day. More importantly, the model introduces a network round trip if hosted remotely, which adds latency variance.
Debuggable. When a pattern-based filter makes a mistake, you can read the pattern that caused it. When a neural classifier makes a mistake, reproducing the failure requires the exact model weights, the exact input tokenization, and ideally the attention weights to understand which features drove the decision. For an operation as consequential as "this memory was never extracted", debuggability is not optional.
Auditable. Regex patterns are code. They live in version control.
You can diff them, review them, and roll them back. A model checkpoint
is not auditable in the same sense — knowing that model v2.3 was
deployed on Tuesday tells you less than reading the commit that added
the pattern r"(?i)^(hi|hello|hey)[\s\.\!\?]*$".
The tradeoff is coverage: a pattern set will always have gaps that a classifier might close. A user who types "lol" instead of "ok" passes the default pattern set. So does "np" (no problem) if it is not in the phrase list. The answer is not to add a classifier — it is to accept that some junk turns will reach extraction, where the quality rules inside Stage 2 will reject them. Pre-filter does not need to be comprehensive; it needs to be fast, cheap, and correct on the cases it does cover.
Measuring pre-filter quality
The three metrics from the Measuring section above have specific operational interpretations worth expanding.
prefilter.reject_rate is tracked per rule and in aggregate. The
aggregate rate tells you whether the stage is doing its job. The
per-rule breakdown tells you which rules are carrying the load. In most
deployments, the word count check and the greeting/acknowledgment
pattern together account for 60–70% of all rejections. The remaining
rules pick up the long tail. A rule with a rejection rate under 0.1%
for 30 consecutive days is probably not covering a real pattern in your
deployment and should be removed. Dead rules add noise to the
per-rule dashboard without reducing cost.
prefilter.precision is the fraction of rejected turns that a
human or LLM judge would also reject. Sampling 1% of rejections for
judge evaluation takes approximately 10,000 LLM calls per 1M turns —
on the order of $25/day for continuous precision
monitoring. Worth it. A precision drop below 90% means the filter is
rejecting turns that contain real information. The most common cause:
a pattern that was correct for one agent's style is catching different
content after an agent update. Identify the offending pattern by
breaking down precision per rule.
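A sketch of the sampling side of that loop. The judged results are assumed to arrive as (rule, judge_agrees) pairs; the judge call itself is left to whatever LLM endpoint you use, and the rand crate is an assumed dependency:

```rust
use std::collections::HashMap;

use rand::Rng; // assumed dependency: the `rand` crate

// Sample ~1% of rejections into a judge queue, keyed by the rule that fired,
// so precision can later be broken down per rule.
fn maybe_sample_for_judge(
    reason: &SkipReason,
    turn_text: &str,
    judge_queue: &mut Vec<(String, String)>,
) {
    if rand::thread_rng().gen::<f64>() < 0.01 {
        let rule = match reason {
            SkipReason::TooShort { .. } => "too_short".to_string(),
            SkipReason::MatchedSkipPattern { pattern } => pattern.clone(),
            SkipReason::UserRule(name) => format!("user:{name}"),
            SkipReason::AssistantTurn => "assistant_turn".to_string(),
        };
        judge_queue.push((rule, turn_text.to_string()));
    }
}

// Per-rule precision over judged samples: correct rejections / total judged.
fn precision_per_rule(judged: &[(String, bool)]) -> HashMap<String, f64> {
    let mut counts: HashMap<String, (u32, u32)> = HashMap::new();
    for (rule, judge_agrees) in judged {
        let entry = counts.entry(rule.clone()).or_default();
        entry.1 += 1;
        if *judge_agrees {
            entry.0 += 1;
        }
    }
    counts
        .into_iter()
        .map(|(rule, (correct, total))| (rule, correct as f64 / total as f64))
        .collect()
}
```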
prefilter.escape_rate is the fraction of turns that passed
pre-filter but were rejected at Stage 2 (extraction produced no
candidates). A high escape rate (above 35%) means the pre-filter is
not covering patterns that extraction is catching, and you could
potentially add a pre-filter rule to drop those turns at zero LLM cost.
To identify candidate rules: cluster the escaped turns by content
pattern, look for recurring forms ("what do you think about X?",
"remind me to do Y", "schedule Z"), and evaluate whether those forms
should be added as custom patterns or handled by the extraction quality
rules.
A sudden change in any of these three metrics almost always has the same root cause: the upstream agent changed its conversational style. New agent version, new system prompt, new conversational persona. The pre-filter pattern set is calibrated to a specific conversational style, and when that style drifts, the filter drifts with it — in whichever direction the new style differs from the old. The correct response is to sample recent rejections and recent passes, compare the distributions, and update the pattern set to match the new style.
Pre-filter vs extraction: the division of labor
Pre-filter rejects forms. Extraction rejects content. They are complementary, not redundant.
Pre-filter knows about the shape of a turn: is it short? does it look like a greeting? does it begin with a tool call marker? It does not know — and should not try to know — whether the content of a surviving turn is worth storing. That judgment requires understanding the semantics of the turn, which requires an LLM.
Extraction knows about the content of a surviving turn: is there a grounded fact here? is this transient? is this speculative? It does not know — and cannot cheaply know — whether the turn was a greeting, because by the time extraction runs, the greeting turns have already been dropped.
The failure mode of conflating these roles is over-engineering the pre-filter. Engineers who observe high extraction no-candidate rates sometimes try to move content filtering into pre-filter: "if the turn is a question, reject it". This works for some question forms ("what's the weather?") and fails for others ("who manages the project? I think it might be Priya"). A question is a form, but it is not a reliable proxy for low extractable content. The correct intervention is to tighten the extraction quality rules, not to add question-detection to the pre-filter.
Similarly, the failure mode of under-engineering the pre-filter is relying on extraction to do what pattern matching should do. If 30% of extraction calls are consuming LLM budget to reject "hi how are you" turns, the pre-filter is not doing its job. Adding the greeting pattern costs nothing and saves $750/day at 1M turns/day.
Keep the boundary clean. Pre-filter owns forms. Extraction owns content. Neither should encroach on the other.