// term 63 · Retrieval & Knowledge

Vector Search

Embedding-Based Retrieval

Querying a vector index to retrieve the embeddings closest to a query vector — the concrete database operation implementing semantic and similarity search. Vector search is where retrieval theory meets engineering: index structures, recall-latency trades, and filtered queries at production scale.

ANN · HNSW · Indexes · Retrieval Infrastructure

// Workhorse

HNSW

Hierarchical navigable small-world graphs — the index family behind most production vector search, balancing recall and speed.

// Scale

ms @ 10⁹

Millisecond queries against billion-vector corpora — the performance that makes semantic retrieval interactive.

// Hard part

filters

Combining metadata constraints with graph traversal — the operation that separates production engines from demos.

// full definition

What Vector Search actually is

Vector search is the executable layer of semantic retrieval: the actual database operation behind every “find similar” feature. The contract is simple: store millions of embedding vectors; given a query vector, return its nearest neighbors fast. Everything interesting is in how. Brute-force comparison scores every stored vector on every query, so its cost grows linearly with the corpus and stops being interactive at scale. Production search therefore runs on approximate nearest neighbor (ANN) indexes, data structures purpose-built to find neighborhoods without visiting the whole space.
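As a baseline for that contract, here is a minimal sketch of exact brute-force search, assuming cosine similarity as the metric and a tiny hand-made corpus. The function names and the data are illustrative assumptions, not any engine's API.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(vectors, query, k):
    """Exact top-k: scores every stored vector, so cost is O(n * d) per query."""
    scored = ((cosine_similarity(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nlargest(k, scored)


# Hypothetical 2-D corpus; real embeddings have hundreds of dimensions.
corpus = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
top = brute_force_search(corpus, [1.0, 0.05], k=2)
top_ids = [idx for _, idx in top]
```

Every query touches every vector, which is exactly the linear scan that ANN indexes exist to avoid. The same exact scan remains useful as the ground truth when measuring an approximate index's recall.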

The dominant structure is HNSW: a layered graph where each vector links to its near neighbors, with sparse express layers above for long jumps. A query enters at the top, descends greedily toward its target region, and refines through denser layers to the final neighbors, so roughly logarithmic navigation replaces linear scanning. Alternatives trade differently: inverted-file (IVF) indexes cluster the space and probe promising cells; product quantization compresses vectors to scan more of them per memory access; disk-based designs stretch beyond RAM economics.
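The descent can be sketched on a toy two-layer graph. This is a simplified illustration, not a full HNSW implementation: the links are hand-built over points on a line rather than constructed by the HNSW insertion procedure, and the names `greedy_descent`, `search_layer`, and `ef` are assumptions chosen for the example.

```python
import heapq
import math


def greedy_descent(links, vectors, query, entry):
    """Express-layer step: hop to any closer neighbor until none improves."""
    current = entry
    improved = True
    while improved:
        improved = False
        for nb in links[current]:
            if math.dist(vectors[nb], query) < math.dist(vectors[current], query):
                current, improved = nb, True
    return current


def search_layer(links, vectors, query, entry, ef):
    """Base-layer step: best-first search keeping the ef closest nodes seen."""
    d0 = math.dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # nearest candidate is worse than every kept result
        for nb in links[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [node for _, node in sorted((-nd, n) for nd, n in results)]


# Toy data: 100 points on a line. Base layer links each point to its four
# nearest ids; the express layer keeps every tenth point with long links.
vectors = [[float(i)] for i in range(100)]
base_links = {i: [j for j in (i - 2, i - 1, i + 1, i + 2) if 0 <= j < 100]
              for i in range(100)}
express_links = {i: [j for j in (i - 10, i + 10) if 0 <= j < 100]
                 for i in range(0, 100, 10)}

query = [42.3]
entry = greedy_descent(express_links, vectors, query, entry=0)
nearest = search_layer(base_links, vectors, query, entry, ef=4)[:3]
```

The express layer reaches the right region in a few long hops, and the base layer only explores a handful of nodes around it. The `ef` value plays the role of the search-breadth setting: a larger beam visits more nodes and misses fewer true neighbors.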

Real queries complicate the picture with filters. “Nearest neighbors” is rarely the production question: it's nearest neighbors among this tenant's documents, in this date range, that this user may see. Filtered vector search must interleave constraint checking with graph traversal, and engines differ sharply in how well they manage it. Post-filtering wastes retrieval on excluded items and can return fewer than k results when the filter is selective; pre-filtering can fragment the graph walk. Filter performance under realistic selectivity is the benchmark that exposes engines, and the first thing to test in evaluation.
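A small sketch of the two naive strategies makes the post-filtering failure concrete. It assumes an exact scan instead of a graph index, so it shows only the result-count effect, not graph fragmentation. The item layout, the `tenant` field, and the function names are hypothetical.

```python
import heapq
import math

# Hypothetical corpus: 1-D vectors, with one item in ten owned by tenant "a".
items = [
    {"id": i, "vec": [float(i)], "tenant": "a" if i % 10 == 0 else "b"}
    for i in range(100)
]


def top_k(candidates, query, k):
    """Exact nearest neighbors among the given candidates."""
    return heapq.nsmallest(k, candidates, key=lambda it: math.dist(it["vec"], query))


def post_filter(items, query, k, predicate, fetch):
    """Retrieve `fetch` neighbors first, then drop the ones the filter excludes."""
    hits = top_k(items, query, fetch)
    return [it for it in hits if predicate(it)][:k]


def pre_filter(items, query, k, predicate):
    """Restrict to allowed items first, then search only those."""
    allowed = [it for it in items if predicate(it)]
    return top_k(allowed, query, k)


def is_tenant_a(item):
    return item["tenant"] == "a"


query = [42.3]
post_ids = [it["id"] for it in post_filter(items, query, 3, is_tenant_a, fetch=3)]
pre_ids = [it["id"] for it in pre_filter(items, query, 3, is_tenant_a)]
```

With a filter that admits 10% of the corpus, the three nearest neighbors overall all belong to the other tenant, so post-filtering returns nothing while pre-filtering returns three valid results. Raising `fetch` repairs the result count at the cost of extra retrieval, which is the waste the paragraph describes.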

Operationally, vector search is a tuned system, not a settled one. Index build parameters and query-time settings trade recall against latency and memory; the optimum shifts with corpus size, update rates, and filter patterns. Updates themselves are a hidden cost — graphs degrade under heavy churn and need maintenance or rebuilds. The discipline mirrors classic database administration: benchmark on your workload, monitor recall as data grows, and treat index health as an operational metric rather than an installation detail.
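Monitoring recall needs a concrete metric. A common choice is recall@k: the fraction of the exact top-k neighbors that the approximate index also returned. The sketch below assumes the exact results come from a brute-force scan over a sample of queries; the function names and the sample ids are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k that also appears in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(approx_runs, exact_runs, k):
    """Average recall@k over a set of sampled queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_runs, exact_runs)]
    return sum(scores) / len(scores)


# Hypothetical results for two sampled queries.
exact_runs = [[50, 49, 51], [42, 43, 41]]
approx_runs = [[50, 51, 52], [42, 43, 41]]
score = mean_recall(approx_runs, exact_runs, k=3)
```

Tracking this number on a fixed query sample as the corpus grows turns index health into something that can be charted and alerted on.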

// how it works

Inside the vector query path

A vector search is a navigated walk through an index structure — built offline, traversed per query, tuned forever.

01

Index Build

Stored vectors are organized into the ANN structure: graph links or cluster assignments computed offline, with parameters set for the workload.

02

Query Arrival

A query vector arrives with its constraints — k, filters, and the latency budget the application demands.

03

Entry & Descent

Traversal begins at the index's entry point and navigates toward the query's region — express layers first, detail layers after.

04

Filtered Expansion

The walk explores the neighborhood under metadata constraints, checking candidates against filters as it refines.

05

Result Assembly

Top-k neighbors return with scores and payloads: the retrieval result that downstream reranking and generation consume.

06

Index Maintenance

Inserts, deletes, and drift degrade structure over time — monitoring and rebuilds keep recall where the benchmark left it.

// anatomy

The components teams must understand

01

HNSW Graph

The navigable structure

Layered small-world links enabling greedy descent to any neighborhood — the recall-speed workhorse of the field.

02

IVF & Quantization

The alternatives

Cluster-probing and compressed-vector scanning — different memory-speed-recall trades for different corpus shapes.
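Cluster probing can be sketched in a few lines. This is a simplified illustration that assumes fixed, hand-picked centroids; real IVF indexes typically learn them with k-means. The names `build_ivf`, `ivf_search`, and `nprobe` are chosen for the example and are not tied to a specific library.

```python
import heapq
import math


def build_ivf(vectors, centroids):
    """Assign every vector id to the cell of its nearest centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(vec, centroids[c]))
        cells[nearest].append(idx)
    return cells


def ivf_search(vectors, centroids, cells, query, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    probe = heapq.nsmallest(nprobe, range(len(centroids)),
                            key=lambda c: math.dist(query, centroids[c]))
    candidates = [idx for c in probe for idx in cells[c]]
    return heapq.nsmallest(k, candidates,
                           key=lambda idx: math.dist(vectors[idx], query))


# Toy data: 100 points on a line, split into ten cells of ten points each.
vectors = [[float(i)] for i in range(100)]
centroids = [[10 * c + 4.5] for c in range(10)]
cells = build_ivf(vectors, centroids)

query = [49.6]  # sits near a cell boundary
one_probe = ivf_search(vectors, centroids, cells, query, k=3, nprobe=1)
two_probes = ivf_search(vectors, centroids, cells, query, k=3, nprobe=2)
```

With a single probe the search misses point 49, which lives just across the cell boundary; probing a second cell recovers it at the cost of scanning twice as many vectors. That is the recall-for-latency trade a query-time breadth setting controls.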

03

Build Parameters

Quality at construction

Graph connectivity and construction effort — set once, bounding the recall ceiling every query inherits.

04

Query-Time Dials

Per-request trades

Search breadth settings trading latency for recall on each call — the runtime knob applications actually hold.

05

Filtered Traversal

Constraints in the walk

Metadata and permission checks interleaved with navigation — the production requirement that separates engines.

06

Churn Management

Index health over time

Update handling, deletion debt, and rebuild cadence — the operational reality of structures built for static data.
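One way to make deletion debt visible is to track the share of tombstoned entries and flag a rebuild past a threshold. The sketch below is a hypothetical bookkeeping helper, not any engine's mechanism. The class name and the 20% threshold are assumptions; a real threshold should come from measured recall on your workload.

```python
class TombstoneTracker:
    """Tracks soft-deleted ids that still occupy slots in the index."""

    def __init__(self, rebuild_threshold=0.2):
        self.live = set()
        self.deleted = set()
        self.rebuild_threshold = rebuild_threshold  # assumed value, tune per workload

    def insert(self, item_id):
        self.live.add(item_id)

    def delete(self, item_id):
        # Soft delete: the entry stays in the structure until the next rebuild.
        if item_id in self.live:
            self.live.remove(item_id)
            self.deleted.add(item_id)

    def tombstone_ratio(self):
        total = len(self.live) + len(self.deleted)
        return len(self.deleted) / total if total else 0.0

    def needs_rebuild(self):
        return self.tombstone_ratio() >= self.rebuild_threshold

    def rebuild(self):
        # A real rebuild would reconstruct the index from live vectors only.
        self.deleted.clear()


tracker = TombstoneTracker()
for item_id in range(10):
    tracker.insert(item_id)
for item_id in (1, 4, 7):
    tracker.delete(item_id)
ratio_before = tracker.tombstone_ratio()
flagged = tracker.needs_rebuild()
tracker.rebuild()
```

A ratio like this is a cheap proxy; it complements direct recall measurement rather than replacing it.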

// strategic implications

What this changes for the business

01 · Engineering

Retrieval performance is index engineering

The same vectors under different index structures and parameters span wide recall-latency-cost ranges — vector search is a tuned system. Benchmark candidate engines on your corpus, your filters, and your update patterns; generic benchmarks transfer poorly.

02 · Selection

Filters are the differentiating test

Most engines search unfiltered vectors well; production queries carry tenancy, date, and permission constraints that stress traversal differently. Evaluate filtered performance at realistic selectivity first, because that is where engine choices actually separate.

03 · Operations

Indexes age — monitor recall

Churn degrades graph structure, growth shifts optima, and recall decays silently while latency stays green. Recall measurement against ground-truth sets belongs in routine monitoring, with rebuild budgets planned rather than discovered.

// common misconceptions

What Vector Search is not

Myth

“Vector search is solved — engines are interchangeable.”

Reality

Engines differ sharply on filtered queries, update handling, and memory economics — the dimensions production workloads stress. The core ANN math is shared; the engineering around it is the product.

Myth

“Higher recall settings are always better.”

Reality

Recall costs latency and compute, and past the point where reranking absorbs misses, extra recall purchases nothing downstream. The right setting is the cheapest one your end-to-end quality target tolerates.

Myth

“Once built, the index just works.”

Reality

Inserts and deletes accumulate structural debt, and recall drifts as the corpus grows — without monitoring, quality decays invisibly. Index maintenance is database operations, not a one-time install.


© 2026 Stephen Andekian.