# RAG — Retrieval-Augmented Generation

> An architecture that retrieves relevant documents from your knowledge sources at request time and injects them into the model's context — grounding generation in verified, current, access-controlled information instead of whatever the model memorized during training.

**Canonical URL:** https://www.andekian.com/ai-lexicon/rag  
**Author / Site:** Stephen Andekian — https://www.andekian.com

**Term 08 of 100** · Retrieval & Knowledge  
**Tags:** Retrieval, Grounding, Vector Search, Enterprise Knowledge

## Key Stats

- **Reliability — up to 90%:** Reduction in factual errors on knowledge-intensive tasks when generation is grounded in retrieved sources — the highest-leverage hallucination control available.
- **Freshness — 0 retraining:** Knowledge updates ship by re-indexing documents, not retraining models. A policy changed at 9:00 is answerable by 9:05.
- **Adoption — #1 pattern:** The dominant enterprise GenAI architecture — most production deployments that touch corporate knowledge are RAG systems.

## What RAG Actually Is

RAG solves the defining problem of enterprise AI: models know the internet circa their training cutoff, not your business today. Instead of hoping the answer lives in parametric memory, the system retrieves relevant passages from your own sources at request time and instructs the model to answer from that evidence. The model becomes a reasoning engine over your knowledge rather than an oracle of its own.

The pipeline runs in two phases. Offline, documents are parsed, split into retrieval-sized chunks, converted to embedding vectors, and indexed in a vector database with metadata and permissions. Online, the user's query is embedded, the nearest chunks are retrieved and reranked, and the winners are assembled into the prompt alongside instructions to answer from sources and cite them. Each stage compounds: parsing quality caps chunking, chunking caps retrieval, retrieval caps the answer.
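
The two phases can be sketched in a few dozen lines of Python. This is a toy built on stated assumptions. `embed` is a hypothetical bag-of-words stand-in for a real embedding model. The "index" is an in-memory list rather than a vector database. Chunking is naive fixed-size splitting. All names (`Chunk`, `chunk_document`, `build_index`, `retrieve`) are illustrative. The example is only meant to show the shape of the offline/online split.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Chunk:
    doc_id: str  # metadata kept for citation and filtering
    text: str
    vector: Counter = field(default_factory=Counter)


def chunk_document(doc_id: str, text: str, max_words: int = 40) -> list[Chunk]:
    """Naive fixed-size chunking into windows of max_words words."""
    words = text.split()
    return [Chunk(doc_id, " ".join(words[i:i + max_words]))
            for i in range(0, len(words), max_words)]


def build_index(docs: dict[str, str]) -> list[Chunk]:
    """Offline phase: chunk every document, embed each chunk, store it."""
    index = []
    for doc_id, text in docs.items():
        for chunk in chunk_document(doc_id, text):
            chunk.vector = embed(chunk.text)
            index.append(chunk)
    return index


def retrieve(index: list[Chunk], query: str, k: int = 2) -> list[Chunk]:
    """Online phase: embed the query, return the k most similar chunks."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, c.vector), reverse=True)[:k]


docs = {
    "travel-policy": "Employees may book economy class for flights under six hours.",
    "expense-policy": "Meals are reimbursed up to 50 dollars per day with receipts.",
}
index = build_index(docs)
top = retrieve(index, "How much are meals reimbursed per day?", k=1)
print(top[0].doc_id)  # expense-policy
```

In a real system, the reranking and prompt-assembly stages would sit after `retrieve`. Each stage in this sketch can be swapped for a production component without changing the two-phase structure.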

RAG's strategic advantages go beyond accuracy. Knowledge stays current without retraining — re-index and the system knows. Knowledge stays governed — retrieval can enforce document-level permissions at query time, so the system answers each user only from what they are entitled to see. And answers arrive with citations, giving regulated industries the audit trail that ungrounded generation cannot provide.
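
Here is a minimal sketch of query-time permission enforcement. It assumes each chunk carries a hypothetical `allowed_groups` field, copied from the source system's ACL at indexing time, and that the caller's groups come from your identity provider. The word-overlap scorer is a toy stand-in for vector similarity. The key design choice is to filter before ranking. That way, text the user cannot see is never scored, retrieved, or placed in the prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IndexedChunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL metadata captured at indexing time


def permitted(chunk: IndexedChunk, user_groups: set) -> bool:
    """Visible if the user belongs to at least one group on the chunk's ACL."""
    return not chunk.allowed_groups.isdisjoint(user_groups)


def search_with_acl(chunks, user_groups, score: Callable[[IndexedChunk], float], k: int = 3):
    """Filter by entitlement first, then rank only what the user may see."""
    visible = [c for c in chunks if permitted(c, user_groups)]
    return sorted(visible, key=score, reverse=True)[:k]


def overlap_score(query: str) -> Callable[[IndexedChunk], float]:
    """Toy relevance score (shared words); a real system would use vector similarity."""
    terms = set(query.lower().split())
    return lambda c: len(terms & set(c.text.lower().split()))


chunks = [
    IndexedChunk("handbook", "Vacation accrues at 1.5 days per month.", frozenset({"all-staff"})),
    IndexedChunk("comp-bands", "The senior engineer salary band tops out at level 7.", frozenset({"hr"})),
]
query = "salary band for senior engineer"
print([c.doc_id for c in search_with_acl(chunks, {"all-staff"}, overlap_score(query))])  # ['handbook']
print([c.doc_id for c in search_with_acl(chunks, {"hr"}, overlap_score(query))])         # ['comp-bands']
```

The all-staff user never receives the compensation chunk, even though it is the best textual match. A production system would also apply a relevance threshold so that an off-topic chunk like the handbook entry is not returned either.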

The gap between a RAG demo and a dependable RAG system is the gap between a weekend and a roadmap. Production quality demands clean document parsing, structure-aware chunking, hybrid search, reranking, permission enforcement, and retrieval evaluation — classic search engineering wearing an AI costume. Teams that staff it accordingly ship; teams that treat it as prompt glue stall at demo quality indefinitely.
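
Hybrid search runs a keyword retriever (for example BM25) and a vector retriever side by side, then merges their ranked lists. The document does not prescribe a fusion method. The sketch below uses reciprocal rank fusion (RRF), one common choice. RRF scores each document as the sum over lists of `1 / (k + rank)`, with the conventional constant `k = 60`. The two input rankings are hypothetical document IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each list adds 1 / (k + rank) to every ID it contains."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["policy-7", "faq-2", "memo-9"]  # e.g. from a BM25 index
vector_hits = ["faq-2", "guide-4", "policy-7"]  # e.g. from a vector index
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['faq-2', 'policy-7', 'guide-4', 'memo-9']
```

`faq-2` wins because it ranks high in both lists. `guide-4` and `memo-9` were each found by only one retriever, so they fall behind. Because RRF uses ranks rather than raw scores, BM25 scores and cosine similarities never need to be put on a common scale.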

## How It Works: From Document Pile to Grounded Answer

RAG is a pipeline, and its quality is multiplicative — a weak stage caps everything downstream of it.

1. **Ingestion & Chunking** — Documents are parsed and split into retrieval-sized segments. Chunking strategy quietly sets the ceiling on answer quality.
2. **Embedding** — Each chunk becomes a vector encoding its meaning, enabling search by semantics rather than keyword overlap.
3. **Indexing** — Vectors land in a vector database with metadata and permissions attached — the searchable memory of the system.
4. **Retrieval & Ranking** — The query is embedded, nearest chunks are fetched, and a reranker promotes the truly relevant. Precision here is the system's bottleneck.
5. **Context Assembly** — Top chunks are formatted into the prompt with instructions to answer strictly from sources and cite them.
6. **Grounded Generation** — The model synthesizes an answer constrained to the supplied evidence — with citations enabling verification and audit.
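
Steps 5 and 6 can be sketched as follows, under stated assumptions. `approx_tokens` is a hypothetical word-count heuristic standing in for the model's real tokenizer. Chunks arrive already ranked best-first. The instruction wording is illustrative. `invalid_citations` shows one cheap post-generation check: it flags citation numbers that point at no supplied source.

```python
import re

INSTRUCTIONS = (
    "Answer using only the numbered sources below. "
    "Cite every claim as [n]. "
    "If the sources do not contain the answer, say so."
)


def approx_tokens(text: str) -> int:
    """Hypothetical heuristic: one token per whitespace-separated word."""
    return len(text.split())


def assemble_prompt(query: str, ranked_chunks: list[tuple[str, str]], budget: int = 200):
    """Pack (source_id, text) chunks, best first, until the token budget is spent.

    Returns the prompt and the source IDs in citation order ([1], [2], ...).
    """
    remaining = budget - approx_tokens(INSTRUCTIONS) - approx_tokens(query)
    parts, used = [INSTRUCTIONS], []
    for source_id, text in ranked_chunks:
        line = f"[{len(used) + 1}] ({source_id}) {text}"
        if approx_tokens(line) > remaining:
            break  # stop at the first chunk that does not fit; a variant could skip it instead
        parts.append(line)
        used.append(source_id)
        remaining -= approx_tokens(line)
    parts.append(f"Question: {query}")
    return "\n\n".join(parts), used


def invalid_citations(answer: str, num_sources: int) -> list[int]:
    """Citation numbers in the answer that match no supplied source."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_sources)


chunks = [
    ("hr-12", "Remote staff must log hours weekly."),
    ("hr-40", "Overtime requires manager approval."),
]
prompt, used = assemble_prompt("Who approves overtime?", chunks)
print(used)  # ['hr-12', 'hr-40']
print(invalid_citations("Overtime needs manager approval [2][3].", len(used)))  # [3]
```

The evidence sits between the instructions and the question, and each source is numbered so that citations map back to a `source_id` for audit. This is one reasonable layout. The document notes that placement and formatting measurably affect grounding fidelity, so layouts are worth evaluating rather than assuming.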

## Anatomy: The Components Teams Must Understand

- **Embedding Model** (The meaning encoder): Converts text to vectors. Its domain fit determines whether “churn” the metric and “churn” the dairy verb land in the right neighborhoods.
- **Vector Database** (Semantic memory): Stores and searches embeddings at scale — with the metadata filtering and access controls that enterprise retrieval actually requires.
- **Chunking Strategy** (Invisible quality lever): Segment size and boundaries trade precision against context. Structure-aware chunking consistently beats naive fixed-size splitting.
- **Retriever & Reranker** (Two-stage precision): Fast vector search casts a wide net; a cross-encoder reranker orders by true relevance. The combination defines retrieval quality.
- **Prompt Assembler** (Context orchestration): Merges instructions, retrieved evidence, and the query within token budget. Placement and formatting measurably affect grounding fidelity.
- **Citation Layer** (Trust infrastructure): Links each claim to its source chunk — enabling human verification, compliance review, and the audit trail regulated deployments require.
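
The sketch below makes the chunking contrast concrete. It compares naive fixed-size splitting with one simple form of structure-aware chunking: splitting markdown at headings, so that each chunk is a whole section that carries its own heading. Both functions are hypothetical illustrations. Real structure-aware chunkers also handle tables, lists, nested sections, and size limits, and this regex would misread `#` lines inside code blocks.

```python
import re

HEADING = re.compile(r"#{1,6} ")  # markdown ATX heading marker at line start


def fixed_size_chunks(text: str, size: int = 12) -> list[str]:
    """Naive baseline: consecutive windows of `size` words, blind to structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def heading_chunks(markdown: str) -> list[str]:
    """Structure-aware: start a new chunk at every heading, so each chunk is
    one whole section and keeps its heading as built-in context."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


doc = """# Refunds
Refunds are issued within 14 days of a return.
# Warranty
Hardware carries a two year warranty against defects."""

print(fixed_size_chunks(doc))  # first window swallows the next heading's "#"
print(heading_chunks(doc))     # one chunk per section, heading included
```

With fixed-size windows, the first chunk ends on a stray `#` and the warranty chunk loses its heading marker. The heading-based split keeps "Warranty" attached to the sentence it governs.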

## Strategic Implications

- **RAG is a search problem wearing an AI costume** (01 · Architecture): Generation quality is capped by retrieval quality, and retrieval is classic information-retrieval engineering: parsing, chunking, indexing, ranking, evaluation. Teams that staff RAG as a search problem ship dependable systems; teams that treat it as prompt glue stall at demo quality. Hire and budget accordingly.
- **Your knowledge base is now production infrastructure** (02 · Data): Stale, duplicated, and contradictory documents become wrong answers delivered with confident citations. RAG forces a content hygiene discipline most organizations never had — ownership, freshness SLAs, deprecation workflows. The knowledge base graduates from intranet artifact to production dependency.
- **Retrieval must respect permissions** (03 · Security): A RAG system that indexes everything and retrieves for anyone is a data-leak engine with a natural-language interface. Document-level access controls enforced at query time are non-negotiable — the retrieval layer inherits your identity architecture, and auditors will ask to see it.
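
Content hygiene can be enforced mechanically at the ingestion gate. The sketch below assumes a hypothetical `DocRecord` with an owner, a last-review date, and a deprecation flag. It also assumes an illustrative 180-day freshness SLA. Documents that fail a check are flagged for review instead of being indexed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DocRecord:
    doc_id: str
    owner: Optional[str]  # accountable team; None means orphaned
    last_reviewed: date
    deprecated: bool = False


def hygiene_gate(docs: list[DocRecord], today: date, max_age_days: int = 180):
    """Return (indexable IDs, [(flagged ID, reason)]). The 180-day SLA is illustrative."""
    indexable, flagged = [], []
    for d in docs:
        if d.deprecated:
            flagged.append((d.doc_id, "deprecated"))
        elif d.owner is None:
            flagged.append((d.doc_id, "no owner"))
        elif today - d.last_reviewed > timedelta(days=max_age_days):
            flagged.append((d.doc_id, "stale"))
        else:
            indexable.append(d.doc_id)
    return indexable, flagged


docs = [
    DocRecord("pto-policy", "hr-team", date(2024, 12, 1)),
    DocRecord("pto-policy-2019", "hr-team", date(2019, 3, 1), deprecated=True),
    DocRecord("vpn-setup", None, date(2024, 11, 15)),
    DocRecord("brand-guide", "marketing", date(2023, 6, 1)),
]
indexable, flagged = hygiene_gate(docs, today=date(2025, 1, 31))
print(indexable)  # ['pto-policy']
print(flagged)    # [('pto-policy-2019', 'deprecated'), ('vpn-setup', 'no owner'), ('brand-guide', 'stale')]
```

Keeping the deprecated 2019 policy out of the index prevents it from being cited as if it were current.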

## Common Misconceptions

- **Myth:** “RAG eliminates hallucination.”  
  **Reality:** It reduces hallucination dramatically on knowledge-intensive tasks. Models can still misread, over-synthesize, or ignore retrieved evidence — verification and citation checking remain part of the production stack.
- **Myth:** “Million-token context windows make RAG obsolete.”  
  **Reality:** Long context changes the chunking math, not the economics. Resending your knowledge base with every request costs orders of magnitude more than retrieving the right three paragraphs — and mid-context recall degrades. Precision retrieval wins on cost, latency, and accuracy.
- **Myth:** “RAG is plug-and-play.”  
  **Reality:** A demo is. Production RAG — clean parsing, smart chunking, hybrid search, reranking, permissions, retrieval evals — is a search engineering program. The gap between demo and dependable is where most projects die.
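
The economics in the second myth reduce to simple arithmetic. Every number below is an illustrative assumption, not a quoted rate: price, traffic, and knowledge-base size are all placeholders. The ratio depends only on tokens sent per request, so it holds at any price. The sketch counts input tokens only and ignores output tokens and provider discounts such as prompt caching.

```python
def daily_input_cost(tokens_per_request: int, requests_per_day: int, usd_per_million: float) -> float:
    """Input-token spend per day."""
    return tokens_per_request * requests_per_day * usd_per_million / 1_000_000


# Illustrative assumptions only; substitute your own workload and pricing.
USD_PER_MILLION_INPUT = 3.00
REQUESTS_PER_DAY = 10_000
KB_TOKENS = 800_000         # whole knowledge base resent in a long-context window
RAG_TOKENS = 3 * 200 + 400  # three ~200-token paragraphs plus instructions and query

long_context = daily_input_cost(KB_TOKENS, REQUESTS_PER_DAY, USD_PER_MILLION_INPUT)
rag = daily_input_cost(RAG_TOKENS, REQUESTS_PER_DAY, USD_PER_MILLION_INPUT)
print(f"long context: ${long_context:,.0f}/day")  # long context: $24,000/day
print(f"RAG:          ${rag:,.0f}/day")           # RAG:          $30/day
print(f"ratio:        {long_context / rag:,.0f}x")  # ratio:        800x
```

Under these assumptions the gap is 800x, close to three orders of magnitude. Latency also grows with prompt size, which this arithmetic does not capture.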

## Related Terms

- [Hallucination — Confidence Without Accuracy](https://www.andekian.com/ai-lexicon/hallucination)
- [Embeddings — Meaning Encoded As Vectors](https://www.andekian.com/ai-lexicon/embeddings)
- [Vector Database — Stores Vector Embeddings](https://www.andekian.com/ai-lexicon/vector-database)
- [Semantic Search — Meaning-Based Retrieval](https://www.andekian.com/ai-lexicon/semantic-search)
- [Chunking — Document Segmentation Process](https://www.andekian.com/ai-lexicon/chunking)
- [Grounding — Source-Connected Outputs](https://www.andekian.com/ai-lexicon/grounding)
- [Context Injection — Dynamic Information Insertion](https://www.andekian.com/ai-lexicon/context-injection)
- [Retrieval Pipeline — Information Retrieval Flow](https://www.andekian.com/ai-lexicon/retrieval-pipeline)

## Explore the Full Lexicon

All 100 terms: https://www.andekian.com/ai-lexicon

## Contact

Book a conversation or send an inquiry: https://www.andekian.com/#contact
LinkedIn: https://www.linkedin.com/in/andekian/