// term 63 · Retrieval & Knowledge
Vector Search
Embedding-Based Retrieval
Querying a vector index to retrieve the embeddings closest to a query vector — the concrete database operation implementing semantic and similarity search. Vector search is where retrieval theory meets engineering: index structures, recall-latency trades, and filtered queries at production scale.
// Workhorse
HNSW
Hierarchical navigable small-world graphs — the index family behind most production vector search, balancing recall and speed.
// Scale
ms @ 10⁹
Millisecond queries against billion-vector corpora — the performance that makes semantic retrieval interactive.
// Hard part
filters
Combining metadata constraints with graph traversal — the operation that separates production engines from demos.
// full definition
What Vector Search actually is
Vector search is the executable layer of semantic retrieval: the actual database operation behind every “find similar” feature. The contract is simple: store millions to billions of embedding vectors; given a query vector, return its nearest neighbors fast. Everything interesting is in how. Brute-force comparison scores every stored vector on every query, so its cost grows linearly with the corpus and becomes impractical at scale. Production search therefore runs on approximate nearest neighbor (ANN) indexes, data structures purpose-built to find neighborhoods without visiting the whole space, at the price of occasionally missing a true neighbor.
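As a point of reference, the brute-force baseline that ANN indexes approximate can be sketched in a few lines. This is a minimal illustration, not any engine's implementation; the function names `cosine_similarity` and `exact_top_k` and the toy vectors are assumptions made for the example.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score every stored vector against the query: O(N * d) work per query."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)  # list of (score, index), best first

# Hypothetical toy corpus of four 2-d embeddings.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
results = exact_top_k([1.0, 0.05], vectors, k=2)
print([i for _, i in results])  # [0, 1]
```

Exact search like this stays useful at any scale as the ground truth that approximate results are measured against.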
The dominant structure is HNSW, a layered graph where each vector links to its near neighbors, with sparser express layers above for long jumps. A query enters at the top, descends greedily toward its target region, and refines through denser layers to the final neighbors, so navigation takes roughly logarithmic time instead of a linear scan. Alternatives trade differently: inverted-file (IVF) indexes cluster the space and probe only the most promising cells; product quantization compresses vectors so more of them fit in memory and scan per memory access, at some cost in accuracy; disk-based designs stretch beyond RAM economics.
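The graph walk can be illustrated with a deliberately simplified sketch: one layer only, with no express layers, so it shows the best-first search that runs inside a single HNSW layer rather than full HNSW. The names `build_knn_graph` and `graph_search`, the parameters `m` and `ef`, and the brute-force graph construction are all assumptions chosen for readability.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(vectors, m):
    """Link each vector to its m nearest neighbors (brute-force build, symmetric links)."""
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        nearest = heapq.nsmallest(m + 1, range(len(vectors)),
                                  key=lambda j: sq_dist(v, vectors[j]))
        for j in nearest:
            if j != i:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def graph_search(query, vectors, graph, entry, k, ef):
    """Best-first walk that keeps the ef closest nodes seen so far."""
    visited = {entry}
    d0 = sq_dist(query, vectors[entry])
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]        # max-heap via negated distance: current ef best
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break                # nothing left can improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst
    return [i for _, i in sorted((-nd, i) for nd, i in best)[:k]]

# Hypothetical corpus: 50 points on a line, entered from the far end.
vectors = [[float(i)] for i in range(50)]
graph = build_knn_graph(vectors, m=2)
print(graph_search([30.2], vectors, graph, entry=0, k=3, ef=5))  # [30, 31, 29]
```

With only one layer the walk visits every node between the entry point and the target; the sparse upper layers in HNSW exist to shortcut exactly that stretch.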
Real queries complicate the picture with filters. “Nearest neighbors” is rarely the production question; the question is nearest neighbors among this tenant's documents, in this date range, that this user may see. Filtered vector search must interleave constraint checking with graph traversal, and engines differ sharply in how well they manage it: post-filtering wastes retrieval on excluded items and can return fewer than k results when the filter is selective; pre-filtering can fragment the graph walk. Filter performance under realistic selectivity is the benchmark that most clearly separates engines, and the first thing to test in evaluation.
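The gap between the two naive strategies shows up even on four items. The sketch below uses exact search in place of an index, so it demonstrates only the result-count problem of post-filtering, not the graph fragmentation that pre-filtering can cause; the item layout, the `tenant` field, and the function names are hypothetical.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, items, k):
    """Exact k nearest items, closest first."""
    return heapq.nsmallest(k, items, key=lambda it: sq_dist(query, it["vec"]))

def post_filter(query, items, k, allowed):
    """Retrieve k neighbors first, then drop the ones the filter excludes."""
    return [it["id"] for it in top_k(query, items, k) if allowed(it)]

def pre_filter(query, items, k, allowed):
    """Restrict to allowed items first, then retrieve k neighbors among them."""
    return [it["id"] for it in top_k(query, [it for it in items if allowed(it)], k)]

items = [
    {"id": "a", "vec": [0.0, 0.0], "tenant": "acme"},
    {"id": "b", "vec": [0.1, 0.0], "tenant": "acme"},
    {"id": "c", "vec": [0.2, 0.0], "tenant": "globex"},
    {"id": "d", "vec": [0.9, 0.0], "tenant": "globex"},
]

def is_globex(item):
    return item["tenant"] == "globex"

print(post_filter([0.0, 0.0], items, 2, is_globex))  # []
print(pre_filter([0.0, 0.0], items, 2, is_globex))   # ['c', 'd']
```

Post-filtering spent its whole budget of k on another tenant's documents and returned nothing, which is why engines that post-filter typically over-fetch.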
Operationally, vector search is a tuned system, not a settled one. Index build parameters and query-time settings trade recall against latency and memory; the optimum shifts with corpus size, update rates, and filter patterns. Updates themselves are a hidden cost — graphs degrade under heavy churn and need maintenance or rebuilds. The discipline mirrors classic database administration: benchmark on your workload, monitor recall as data grows, and treat index health as an operational metric rather than an installation detail.
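Monitoring recall needs a concrete definition. A common one is recall@k: the fraction of the exact top-k neighbors that the approximate search returned. The sketch below assumes ground-truth ids have already been computed by exact search on a fixed query set; the function names and sample ids are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate result recovered."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(truth & set(approx_ids[:k])) / len(truth)

def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a fixed set of benchmark queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)

# Two hypothetical queries: the first misses one true neighbor, the second misses none.
approx = [[1, 2, 3, 9], [5, 6, 7, 8]]
exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(mean_recall(approx, exact, k=4))  # 0.875
```

Tracking this number on the same query set over time is what makes silent recall decay visible.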
// how it works
Inside the vector query path
A vector search is a navigated walk through an index structure — built offline, traversed per query, tuned forever.
Index Build
Stored vectors organize into the ANN structure — graph links or cluster assignments computed offline, parameters set for the workload.
Query Arrival
A query vector arrives with its constraints — k, filters, and the latency budget the application demands.
Entry & Descent
Traversal begins at the index's entry point and navigates toward the query's region — express layers first, detail layers after.
Filtered Expansion
The neighborhood explores under metadata constraints — candidates checked against filters as the walk refines.
Result Assembly
Top-k neighbors return with scores and payloads — the retrieval result downstream reranking and generation consume.
Index Maintenance
Inserts, deletes, and drift degrade structure over time — monitoring and rebuilds keep recall where the benchmark left it.
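The Filtered Expansion step above can be sketched as a graph walk that navigates through any node but admits only filter-passing nodes into the result set. This is one possible strategy under simplifying assumptions (a single layer, a hand-built line graph), not a description of how any particular engine does it; `filtered_graph_search`, `ef`, and the even-id filter are hypothetical.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def filtered_graph_search(query, vectors, graph, entry, k, ef, allowed):
    """Best-first walk: every node may be traversed, only allowed nodes are returned."""
    visited = {entry}
    d0 = sq_dist(query, vectors[entry])
    frontier = [(d0, entry)]     # min-heap of nodes still to expand
    results = []                 # max-heap (negated distance) of allowed nodes
    if allowed(entry):
        results.append((-d0, entry))
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))   # excluded nodes still guide the walk
                if allowed(nb):
                    heapq.heappush(results, (-dn, nb))
                    if len(results) > ef:
                        heapq.heappop(results)
    return [i for _, i in sorted((-nd, i) for nd, i in results)[:k]]

# Hypothetical setup: 20 points on a line, each linked to its immediate neighbors.
vectors = [[float(i)] for i in range(20)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}

def even_only(i):
    return i % 2 == 0

print(filtered_graph_search([10.4], vectors, graph, 0, k=2, ef=3, allowed=even_only))
# [10, 12]
```

Keeping excluded nodes as stepping stones preserves connectivity, but under a very selective filter the walk may expand many nodes before it collects enough allowed results.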
// anatomy
The components teams must understand
01
HNSW Graph
The navigable structure
Layered small-world links enabling greedy descent to any neighborhood — the recall-speed workhorse of the field.
02
IVF & Quantization
The alternatives
Cluster-probing and compressed-vector scanning — different memory-speed-recall trades for different corpus shapes.
03
Build Parameters
Quality at construction
Graph connectivity and construction effort — set once, bounding the recall ceiling every query inherits.
04
Query-Time Dials
Per-request trades
Search breadth settings trading latency for recall on each call — the runtime knob applications actually hold.
05
Filtered Traversal
Constraints in the walk
Metadata and permission checks interleaved with navigation — the production requirement that separates engines.
06
Churn Management
Index health over time
Update handling, deletion debt, and rebuild cadence — the operational reality of structures built for static data.
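Two of the components above, IVF cluster probing and a query-time dial, fit in one small sketch. It assumes centroids are supplied directly (a real index would learn them, typically with k-means) and uses a hypothetical `nprobe` parameter as the search-breadth setting; the data is a toy 1-d example.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        cell = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[cell].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cells = heapq.nsmallest(nprobe, range(len(centroids)),
                            key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in cells for i in lists[c]]
    return heapq.nsmallest(k, candidates, key=lambda i: sq_dist(query, vectors[i]))

# Hypothetical data: the query sits near the boundary between the two cells.
centroids = [[0.0], [10.0]]
vectors = [[1.0], [4.0], [4.9], [5.2], [6.0], [9.0]]
lists = build_ivf(vectors, centroids)

print(ivf_search([5.1], vectors, centroids, lists, k=2, nprobe=1))  # [3, 4]
print(ivf_search([5.1], vectors, centroids, lists, k=2, nprobe=2))  # [3, 2]
```

With `nprobe=1` the search misses vector 2, a true neighbor that fell just across the cell boundary; raising `nprobe` recovers it at the cost of scanning more candidates.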
// strategic implications
What this changes for the business
01 · Engineering
Retrieval performance is index engineering
The same vectors under different index structures and parameters span wide recall-latency-cost ranges — vector search is a tuned system. Benchmark candidate engines on your corpus, your filters, and your update patterns; generic benchmarks transfer poorly.
02 · Selection
Filters are the differentiating test
Most engines search unfiltered vectors well; production queries carry tenancy, date, and permission constraints that stress traversal differently. Evaluate filtered performance at realistic selectivity first, since that is where engine choices actually separate.
03 · Operations
Indexes age — monitor recall
Churn degrades graph structure, growth shifts optima, and recall decays silently while latency stays green. Recall measurement against ground-truth sets belongs in routine monitoring, with rebuild budgets planned rather than discovered.
// common misconceptions
What Vector Search is not
Myth
“Vector search is solved — engines are interchangeable.”
Reality
Engines differ sharply on filtered queries, update handling, and memory economics — the dimensions production workloads stress. The core ANN math is shared; the engineering around it is the product.
Myth
“Higher recall settings are always better.”
Reality
Recall costs latency and compute, and past the point where reranking absorbs misses, extra recall purchases nothing downstream. The right setting is the cheapest one that still meets your end-to-end quality target.
Myth
“Once built, the index just works.”
Reality
Inserts and deletes accumulate structural debt, and recall drifts as the corpus grows — without monitoring, quality decays invisibly. Index maintenance is database operations, not a one-time install.