// term 70 · Retrieval & Knowledge
Retrieval Pipeline
Information Retrieval Flow
The full sequence from user query to assembled context in a RAG system — query processing, embedding, search, reranking, and assembly, each stage with its own quality and latency stakes. The pipeline is the system: retrieval quality is the product of every stage, and the weakest one sets the ceiling.
// Structure
5–7 stages
Query processing through context assembly — each independently tunable, each independently capable of capping the whole.
// Math
multiplicative
Stage qualities compound: four stages at 90% each yield about 66% end-to-end (0.9⁴ ≈ 0.656). Excellence is required everywhere, not on average.
// Budget
~100–500ms
Typical end-to-end retrieval allowance inside an interactive AI request — spent jointly across every stage.
// full definition
What Retrieval Pipeline actually is
Between a user's question and the context a model answers from runs a pipeline — and that pipeline, not the model, determines most of what RAG systems get right or wrong. Query processing interprets and reformulates the question; embedding converts it to vectors; search retrieves candidates; reranking orders them by true relevance; assembly formats the winners into the prompt. Each stage is separately tunable, separately measurable, and separately capable of ruining everything downstream.
The compounding math is the pipeline's defining property. Quality multiplies across stages: a mediocre step doesn't average out — it caps the system. A brilliant reranker cannot rescue candidates the search never retrieved; perfect search cannot rescue a query embedded badly; and flawless retrieval dies in an assembly step that mangles formatting or buries the key passage mid-context. Pipeline thinking means finding the binding constraint and fixing it first — the weakest stage owns the ceiling.
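The compounding arithmetic can be made concrete with a small sketch. It assumes each stage's quality can be expressed as an independent success rate between 0 and 1, which is a simplification; the stage names and numbers below are illustrative, not measurements.

```python
from math import prod


def end_to_end_quality(stage_quality: dict[str, float]) -> float:
    """Multiply per-stage success rates into one end-to-end figure.

    Assumes stages fail independently, which real pipelines only approximate.
    """
    return prod(stage_quality.values())


def binding_constraint(stage_quality: dict[str, float]) -> str:
    """Return the name of the weakest stage, the one to fix first."""
    return min(stage_quality, key=stage_quality.get)


# Hypothetical per-stage success rates.
stages = {
    "query_processing": 0.95,
    "search": 0.70,
    "reranking": 0.95,
    "assembly": 0.95,
}

print(f"end-to-end: {end_to_end_quality(stages):.3f}")
print(f"fix first:  {binding_constraint(stages)}")
```

Under these assumptions the end-to-end figure can never exceed the weakest stage's rate, which is the sense in which that stage owns the ceiling.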
Modern pipelines have grown sophisticated at both ends. Up front: query rewriting expands terse questions, decomposition splits multi-part asks, and hypothetical-answer embedding (HyDE-style) bridges the query-document vocabulary gap. At the back: cross-encoder reranking applies expensive precision to a cheap-retrieved shortlist, and assembly logic orders, deduplicates, and formats within token budgets. Hybrid search and metadata filtering thread through the middle. Each addition buys quality with latency — the end-to-end budget arbitrates.
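A minimal sketch of the cheap-then-expensive pattern described above. Both scorers are toy stand-ins chosen so the example runs without dependencies: `lexical_overlap` plays the role of first-stage search and `proximity_score` plays the role of a cross-encoder. The function names, `shortlist_size`, and `top_k` are assumptions made for illustration.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def lexical_overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def proximity_score(query: str, doc: str) -> float:
    """Toy stand-in for an expensive reranker: rewards query terms close together."""
    tokens = doc.lower().split()
    positions = [tokens.index(t) for t in query.lower().split() if t in tokens]
    if len(positions) < 2:
        return 0.0
    return 1.0 / (1 + max(positions) - min(positions))


def two_stage_retrieve(
    query: str,
    corpus: list[str],
    cheap_score: Scorer,
    expensive_score: Scorer,
    shortlist_size: int = 50,
    top_k: int = 5,
) -> list[str]:
    # Stage 1: score every document cheaply and keep a deliberately wide shortlist.
    shortlist = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)
    shortlist = shortlist[:shortlist_size]
    # Stage 2: spend the expensive scorer only on the shortlist.
    reranked = sorted(shortlist, key=lambda d: expensive_score(query, d), reverse=True)
    return reranked[:top_k]


corpus = [
    "reset your password from the account settings page",
    "password policy requires twelve characters",
    "how to reset a router to factory settings",
    "billing and invoices overview",
]
print(two_stage_retrieve("reset password", corpus, lexical_overlap,
                         proximity_score, shortlist_size=3, top_k=2))
```

The second stage only ever sees what the first stage kept, so any document cut from the shortlist is out of reach no matter how the reranker would have scored it.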
Operationally, the pipeline's gift is localizable failure. Bad answers decompose into diagnosable stages. Was the right content retrieved (recall)? Was it ranked into the top results (precision)? Was it assembled legibly into the prompt? Stage-level metrics such as retrieval recall, reranker lift, and assembly fidelity, read alongside end-task accuracy, turn debugging from vibes into engineering. Teams that instrument per stage own their quality trajectory; teams that evaluate only end-to-end learn that something is wrong without ever learning what.
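Two of these stage-level metrics are easy to sketch. Recall@k is standard. "Reranker lift" has no single agreed definition, so the version here, the change in reciprocal rank of the first relevant document, is an assumption made for illustration. The document ids are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was ranked."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def reranker_lift(before: list[str], after: list[str], relevant: set[str]) -> float:
    """Assumed definition: reciprocal-rank gain from search order to reranked order."""
    return reciprocal_rank(after, relevant) - reciprocal_rank(before, relevant)


relevant = {"d1", "d7"}                      # labeled relevant documents
search_order = ["d4", "d2", "d9", "d1"]      # first-stage output
reranked_order = ["d1", "d4", "d2", "d9"]    # after reranking

print(recall_at_k(search_order, relevant, k=4))              # 0.5: d7 was never retrieved
print(reranker_lift(search_order, reranked_order, relevant)) # 0.75: d1 moved from rank 4 to 1
```

In this example the reranker did its job, yet recall stays at 0.5 because `d7` never reached it. That is a search-stage problem, and the two numbers make it visible.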
// how it works
Anatomy of a retrieval request
Every RAG answer begins with this pipeline — stages compounding multiplicatively, milliseconds budgeted jointly, failures localizable to specific steps.
Query Processing
The raw question is interpreted — rewritten for clarity, decomposed if compound, expanded to bridge vocabulary gaps.
Query Embedding
The processed query is converted to a vector in the same embedding space as the corpus, so intent becomes searchable geometry.
Candidate Search
Vector, keyword, or hybrid retrieval casts a deliberately wide net — recall's stage, with precision deferred.
Reranking
A cross-encoder re-scores the shortlist with full query-document attention — precision applied where it's affordable.
Context Assembly
Winners are deduplicated, ordered, formatted, and fitted to token budget — retrieval's output becoming the model's input.
Stage Telemetry
Recall, rank quality, and latency are logged per stage alongside end-task accuracy: the instrumentation that makes failures localizable.
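The Context Assembly stage above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Passage` type, the bracketed source header, and the whitespace-based `estimate_tokens` are all hypothetical, and a real pipeline would count tokens with the target model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str
    text: str
    score: float  # relevance score from the reranking stage


def estimate_tokens(text: str) -> int:
    """Rough whitespace-based estimate; a real tokenizer would give other counts."""
    return len(text.split())


def assemble_context(passages: list[Passage], token_budget: int) -> str:
    """Deduplicate, order by score, format, and fit passages to a token budget."""
    seen: set[str] = set()
    blocks: list[str] = []
    used = 0
    for p in sorted(passages, key=lambda p: p.score, reverse=True):
        # Deduplicate on case- and whitespace-normalized text.
        key = " ".join(p.text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        block = f"[{p.source}]\n{p.text.strip()}"
        cost = estimate_tokens(block)
        if used + cost > token_budget:
            continue  # skip what does not fit; a shorter passage may still fit
        blocks.append(block)
        used += cost
    return "\n\n".join(blocks)


passages = [
    Passage("a.md", "Reset via settings page.", 0.9),
    Passage("b.md", "reset via  settings page.", 0.8),  # near-verbatim duplicate
    Passage("c.md", "Contact support for lockouts.", 0.7),
]
print(assemble_context(passages, token_budget=10))
```

Placing the highest-scoring passage first is one ordering choice among several; it is used here only because it keeps the key passage away from the middle of the context.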
// anatomy
The components teams must understand
01
Query Rewriter
The front-door fix
Reformulation, decomposition, and expansion converting messy human questions into retrievable ones — high leverage, often skipped.
02
Two-Stage Retrieval
Recall then precision
Cheap wide search followed by expensive narrow reranking — the cost-quality architecture underlying serious pipelines.
03
Hybrid & Filters
The middle machinery
Semantic-lexical fusion and metadata constraints threading through search — production traffic's non-negotiables.
04
Assembly Logic
The last mile
Ordering, deduplication, formatting, and budget-fitting — where retrieved quality survives into the prompt or doesn't.
05
Latency Ledger
The shared budget
Milliseconds allocated across stages against an end-to-end allowance — every quality addition paying from one account.
06
Stage Metrics
Localizable quality
Recall@k, reranker lift, assembly fidelity, end-task accuracy — the per-stage instrumentation that converts debugging into engineering.
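For the Hybrid & Filters component, semantic-lexical fusion can be done several ways. One widely used option, swapped in here as an example and not named in the text above, is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) over the ranked lists. The sketch below pairs it with a simple metadata filter. The document ids and the `lang` field are hypothetical, and production systems often apply filters inside the search itself instead of afterward.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids; k=60 is the conventional default."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def apply_filter(
    doc_ids: list[str],
    metadata: dict[str, dict[str, str]],
    required: dict[str, str],
) -> list[str]:
    """Keep only documents whose metadata matches every required field."""
    return [
        d for d in doc_ids
        if all(metadata.get(d, {}).get(field) == value
               for field, value in required.items())
    ]


vector_hits = ["d1", "d2", "d3"]   # from semantic search
keyword_hits = ["d3", "d1", "d4"]  # from lexical search
metadata = {
    "d1": {"lang": "en"}, "d2": {"lang": "en"},
    "d3": {"lang": "de"}, "d4": {"lang": "en"},
}

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)                                         # ['d1', 'd3', 'd2', 'd4']
print(apply_filter(fused, metadata, {"lang": "en"})) # ['d1', 'd2', 'd4']
```

RRF works on ranks only, so the two searches do not need comparable score scales.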
// strategic implications
What this changes for the business
01 · Architecture
The pipeline is the product
RAG quality is decided across these stages far more than by model choice — and stage quality compounds multiplicatively, so the weakest step owns the ceiling. Investment logic follows: find the binding constraint, fix it, repeat. Upgrading the model while the pipeline leaks is paying premium prices for capped results.
02 · Operations
Instrument per stage or debug blind
End-to-end evaluation says something is wrong; stage-level metrics say what. Retrieval recall, reranker lift, and assembly fidelity localize failures to fixable components — the instrumentation investment that separates teams who improve steadily from teams who thrash.
03 · Performance
Latency is a joint budget
Query rewriting, hybrid search, and reranking each buy quality with milliseconds from one shared allowance. Set the end-to-end budget from product requirements, then allocate backward — pipelines designed stage-by-stage without the joint constraint discover it in production.
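Allocating backward from an end-to-end budget can be sketched as a proportional split followed by a check against measured latencies. The stage names, the weights, the 300 ms total, and the p95 figures are illustrative assumptions, not recommendations.

```python
def allocate_latency(total_ms: float, weights: dict[str, float]) -> dict[str, float]:
    """Split an end-to-end budget across stages in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {stage: total_ms * w / total_weight for stage, w in weights.items()}


def over_budget(allocated: dict[str, float], measured: dict[str, float]) -> dict[str, float]:
    """Return each stage exceeding its allowance and by how many milliseconds."""
    return {
        stage: measured[stage] - allocated[stage]
        for stage in allocated
        if measured.get(stage, 0.0) > allocated[stage]
    }


# Hypothetical weights: rewriting and reranking get double the share of the others.
budget = allocate_latency(300, {"query_rewrite": 2, "embedding": 1,
                                "search": 1, "rerank": 2})
measured_p95 = {"query_rewrite": 80, "embedding": 20, "search": 45, "rerank": 140}

print(budget)
print(over_budget(budget, measured_p95))  # {'rerank': 40.0}
```

Because every stage draws on the same total, a stage that overruns has to be paid for by trimming itself or another stage, which is the joint constraint in practice.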
// common misconceptions
What Retrieval Pipeline is not
Myth
“RAG quality is about choosing the right model.”
Reality
Generation is capped by what retrieval delivers — and retrieval is this pipeline. Most underperforming RAG systems have pipeline problems wearing model costumes; audit the stages before shopping the frontier.
Myth
“A strong reranker fixes weak retrieval.”
Reality
Rerankers reorder what search retrieved — they cannot surface what it missed. Recall problems live upstream of precision tools; the fix matches the failing stage, not the most fashionable component.
Myth
“Once tuned, the pipeline stays tuned.”
Reality
Corpora grow, query mixes shift, content ages, and indexes accumulate churn — stage performance drifts independently. Pipelines are operated systems with regression monitoring, not configurations with a completion date.
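Regression monitoring of the kind described in the last Reality can start as a comparison of current stage metrics against a stored baseline. This is a minimal sketch; the metric names, values, and the 0.02 tolerance are hypothetical.

```python
def detect_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return each metric that dropped by more than `tolerance`, with the size of the drop."""
    return {
        name: round(baseline[name] - current[name], 4)
        for name in baseline
        if name in current and baseline[name] - current[name] > tolerance
    }


baseline = {"recall_at_20": 0.92, "reranker_mrr": 0.71}
current = {"recall_at_20": 0.85, "reranker_mrr": 0.70}

print(detect_regressions(baseline, current))  # {'recall_at_20': 0.07}
```

Running a check like this per stage on a schedule is what lets independent drift show up as a named stage instead of a vague drop in answer quality.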