# LLM — Large Language Model

> A neural network trained on internet-scale text using a transformer architecture. Through trillions of next-token predictions, the model develops a compressed probabilistic representation of language, reasoning, and world knowledge — encoded across billions of floating-point parameters.

**Canonical URL:** https://www.andekian.com/ai-lexicon/llm  
**Author / Site:** Stephen Andekian — https://www.andekian.com

**Term 01 of 100** · Foundational Architecture  
**Tags:** Pre-training, Transformer, Foundation, Generative

## Key Stats

- **Scale — 70B–405B:** Parameters in frontier models. Each encodes statistical associations learned across the training corpus.
- **Unit — ~0.75 words:** Per token — the atomic unit of all LLM input and output. Cost, speed, and capability all denominate in tokens.
- **Paradigm shift — 1 model:** Replaces dozens of specialist AI systems. LLMs collapse point-solution AI into a single general-purpose foundation.
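
The scale and token figures above lend themselves to back-of-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 0.75 words-per-token ratio is a rule of thumb for English text, and the 2 bytes per parameter assumes weights held at 16-bit precision.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token count for English text (assumes ~0.75 words per token)."""
    return round(word_count / words_per_token)


def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in decimal GB (assumes 16-bit precision by default)."""
    return params * bytes_per_param / 1e9


print(estimate_tokens(1500))    # 2000
print(weight_memory_gb(70e9))   # 140.0
print(weight_memory_gb(405e9))  # 810.0
```

These numbers cover the weights only; serving a model also needs memory for activations and caches, which this sketch ignores.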

## What LLM Actually Is

LLMs are trained on internet-scale datasets — often trillions of tokens spanning web pages, books, scientific papers, and codebases — using a self-supervised learning objective: predict the next token given all preceding context. Through trillions of such predictions, the model learns not just language patterns but causal relationships, factual associations, reasoning structures, and stylistic conventions across virtually every domain of recorded human knowledge.
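
The next-token objective described above is usually expressed as a cross-entropy loss: the model is penalized by the negative log of the probability it assigned to the token that actually came next. The following is a minimal sketch with a made-up three-token vocabulary and hand-written probabilities; `next_token_loss` is a hypothetical helper, not a library function.

```python
import math


def next_token_loss(probs_per_step, target_ids):
    """Average cross-entropy: -log p(correct next token), averaged over positions."""
    losses = [-math.log(probs[t]) for probs, t in zip(probs_per_step, target_ids)]
    return sum(losses) / len(losses)


# Each row is the predicted distribution over a toy 3-token vocabulary at one position.
predicted = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
]
targets = [0, 1]  # index of the token that actually came next at each position

loss = next_token_loss(predicted, targets)
print(round(loss, 4))  # 0.2899
```

Training adjusts the parameters to push this number down across the whole corpus; a model that always assigned probability 1 to the correct token would score 0.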

What distinguishes modern LLMs is the transformer architecture introduced in the 2017 paper "Attention Is All You Need." Multi-head self-attention allows the model to dynamically weight the relevance of any prior token when generating each output, enabling coherent, long-range reasoning that previous recurrent architectures couldn't achieve at scale. Unlike recurrent networks, which process tokens one at a time, attention processes every position of a training sequence in parallel. This parallelism is what makes 100B+ parameter training computationally feasible.
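
To make "weighting the relevance of any prior token" concrete, here is a minimal single-head sketch of scaled dot-product attention with a causal mask, written in plain Python. The vectors are made-up toy values, and the explicit loop is for readability only: production implementations compute all positions at once as matrix operations, and use many heads with learned projections that this sketch omits.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention; position i sees only positions 0..i."""
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the relevance-weighted average of the visible value vectors.
        out = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy 2-dimensional vectors for a 3-token sequence (hypothetical values).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs, weights = causal_attention(q, k, v)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row sums to 1 and has one more entry than the row before it, reflecting that later tokens can draw on more context.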

Capabilities that emerge from sufficient scale can be qualitatively different from those of smaller models. At roughly 70B parameters and above, complex multi-step reasoning, code generation, and nuanced instruction-following can approach, and on some narrow tasks exceed, human expert performance. These emergent properties are not explicitly programmed; they arise from the optimization process itself, which has profound implications for capability forecasting and safety governance.

Critically, LLMs are not databases. They do not retrieve stored facts — they generate statistically plausible continuations based on learned probability distributions. This distinction is the root cause of hallucinations and the core reason RAG architectures exist: to ground model reasoning in verified, retrievable information rather than parametric memory alone.

## How It Works: From raw text to production model

Each stage builds on the last. This is the pipeline executives use to assess build-vs-buy, fine-tuning ROI, and deployment trade-offs.

1. **Data Collection** — Curated trillion-token corpus assembled from web crawls, books, codebases, and scientific literature. Data quality and composition are the single largest determinant of base model capability and bias profile.
2. **Tokenization** — Raw text is split into sub-word units (~0.75 words each) and converted to numerical IDs. The tokenizer vocabulary size directly affects multilingual capability, cost efficiency, and code performance.
3. **Pre-training** — The transformer learns to predict each next token across the entire corpus using self-supervised learning. This requires massive compute: training runs for GPT-4 class models are estimated to cost $50M–$100M+. This stage builds the foundational capability layer.
4. **Instruction Tuning** — Model trained on curated instruction-response pairs to follow natural language commands reliably. Transforms a raw text predictor into a useful assistant capable of structured task completion.
5. **RLHF Alignment** — Human raters score outputs; a reward model learns their preferences and shapes the LLM's behavior via reinforcement learning. Aligns the model with helpfulness, safety, and brand voice requirements.
6. **Inference** — The trained model receives live prompts and generates token-by-token responses based on learned distributions. All enterprise AI interactions — from chatbots to agentic workflows — are inference operations.
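
Three of the stages above (tokenization, pre-training, and inference) can be sketched end to end at toy scale. The example below is deliberately not a transformer: it swaps in whole-word tokens and a simple table of next-token counts as a stand-in for learned parameters, and it skips instruction tuning and RLHF entirely. All names are hypothetical.

```python
from collections import Counter, defaultdict

corpus = "the model predicts the next token and the next token follows"

# Tokenization (toy): whole words stand in for sub-word units; each gets an integer ID.
vocab = {word: idx for idx, word in enumerate(dict.fromkeys(corpus.split()))}
id_to_word = {idx: word for word, idx in vocab.items()}
token_ids = [vocab[word] for word in corpus.split()]

# Stand-in for pre-training: count how often each token follows another.
following = defaultdict(Counter)
for current, nxt in zip(token_ids, token_ids[1:]):
    following[current][nxt] += 1


def generate(prompt_word, max_new_tokens=2):
    """Inference: repeatedly append the most frequent next token (greedy decoding)."""
    ids = [vocab[prompt_word]]
    for _ in range(max_new_tokens):
        candidates = following.get(ids[-1])
        if not candidates:  # no observed continuation, so stop
            break
        ids.append(candidates.most_common(1)[0][0])
    return " ".join(id_to_word[i] for i in ids)


print(token_ids)        # [0, 1, 2, 0, 3, 4, 5, 0, 3, 4, 6]
print(generate("the"))  # the next token
```

The shape of the loop is the part that carries over to real systems: the output so far is fed back in, one token is chosen, and the process repeats until a stopping condition is met.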

## Anatomy: The Components Teams Must Understand

- **Transformer Architecture** (Multi-head self-attention): The foundational neural network design. Self-attention enables the model to reason across long sequences in parallel — making 100B+ scale computationally feasible. All modern LLMs are transformer-based.
- **Parameters & Weights** (Learned intelligence as math): Billions of floating-point values encoding all learned knowledge. Held at 16-bit precision (2 bytes per parameter), a 405B model stores over 800GB of compressed world understanding entirely within these numbers. Parameter count drives both capability and compute cost.
- **Context Window** (Operational memory limit): The active memory available per request, meaning how much document, conversation, or retrieved data the model can reason over at once. Ranges from 8K to 1M+ tokens depending on the model. Window size governs your RAG architecture requirements.
- **Tokenizer** (Text → numerical IDs): Maps raw text to numerical token IDs before processing. Vocabulary size and sub-word algorithm affect multilingual performance and code quality. All API cost is measured in tokens, not words.
- **Temperature** (Output randomness control): Controls sampling randomness at inference time. Low (0.1) = near-deterministic outputs suited to factual tasks; only greedy decoding (temperature 0) always selects the single most likely token. High (1.0+) = more varied, creative, and exploratory. Must be calibrated to use-case requirements in every production deployment.
- **System Prompt** (Hidden instruction layer): Instructions prepended to every conversation, typically invisible to the end user, that define model persona, constraints, and guardrails. The primary customization mechanism without fine-tuning. Counts against the context window budget on every request.
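
Of the components above, temperature is the easiest to show as math: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The sketch below uses hypothetical logits for three candidate tokens and illustrative function names; real inference stacks typically combine temperature with other sampling controls not shown here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.1, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

rng = random.Random(0)  # seeded so repeated runs draw the same samples
print([sample_token(logits, 1.0, rng) for _ in range(10)])
```

At a temperature of 0.1 nearly all probability lands on the top-scoring token, while at 2.0 the three options move closer together, which is why higher settings produce more varied output.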

## Strategic Implications

- **LLMs are infrastructure, not features** (01 · Strategy): LLMs are general-purpose platforms enabling dozens of enterprise applications. The strategic question is not which feature to build, but which foundation model to build on — and whether to fine-tune, use an API, or run open weights on-premises. This decision determines your cost curve, data privacy posture, and the competitive defensibility of everything built on top.
- **Proprietary data is your defensible advantage** (02 · Moat): Models fine-tuned on internal knowledge often outperform general-purpose models on domain-specific tasks. Organizations that move early on systematic data curation and fine-tuning build AI capabilities that are genuinely difficult to replicate. Every competitor can access GPT-4 or Claude. The moat is what you've taught it about your business, customers, and domain.
- **Token economics reshape cost structures** (03 · Economics): Deployment costs scale with usage, not licensing. A single agentic workflow making 50 LLM calls per user interaction can generate costs orders of magnitude beyond naive estimates. This fundamentally changes ROI modeling and infrastructure budgeting. Executives need token-denominated cost models before approving AI deployment at scale.
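
A token-denominated cost model of the kind described above can start as a few lines of arithmetic. In the sketch below, the per-million-token prices, the token counts per call, and the monthly volume are all hypothetical placeholders, not quotes from any provider; substitute your vendor's actual rates.

```python
def cost_per_interaction(calls, input_tokens_per_call, output_tokens_per_call,
                         input_price_per_million, output_price_per_million):
    """Dollar cost of one user interaction that triggers `calls` LLM requests."""
    input_cost = calls * input_tokens_per_call * input_price_per_million / 1_000_000
    output_cost = calls * output_tokens_per_call * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
single = cost_per_interaction(1, 4_000, 500, 3.0, 15.0)
agentic = cost_per_interaction(50, 4_000, 500, 3.0, 15.0)

print(f"single call:      ${single:.4f}")   # single call:      $0.0195
print(f"50-call workflow: ${agentic:.4f}")  # 50-call workflow: $0.9750

monthly_interactions = 100_000  # hypothetical volume
print(f"monthly estimate: ${agentic * monthly_interactions:,.0f}")
```

Even this simplified model shows why a per-call price that looks negligible becomes a material budget line once call counts and usage volume are multiplied through.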

## Common Misconceptions

- **Myth:** “LLMs understand language and reason the way humans do.”  
  **Reality:** LLMs compute probability distributions over token sequences. Whether this constitutes “understanding” is an active scientific debate. Treat LLM outputs as probabilistic, not authoritative — and design systems with verification layers accordingly.
- **Myth:** “Bigger models always perform better for our use case.”  
  **Reality:** Scale often matters less than task fit. A 7B model fine-tuned on your domain can match or outperform a general-purpose 70B model on specific enterprise tasks, at dramatically lower latency and cost. Right-sizing is a competitive advantage, not a compromise.
- **Myth:** “LLMs store and retrieve facts like a database.”  
  **Reality:** LLMs generate statistically plausible continuations — they do not retrieve stored facts. This is the root cause of hallucinations and the reason RAG architectures exist. Any production system requiring factual accuracy needs grounding infrastructure, not just a larger model.

## Related Terms

- [Token — Unit Of AI Processing](https://www.andekian.com/ai-lexicon/token)
- [Context Window — Operational Memory Limit](https://www.andekian.com/ai-lexicon/context-window)
- [Fine-Tuning — Domain-Specific Mastery](https://www.andekian.com/ai-lexicon/fine-tuning)
- [RLHF — Reinforcement Learning From Human Feedback](https://www.andekian.com/ai-lexicon/rlhf)
- [Weights & Parameters — Learned Intelligence As Math](https://www.andekian.com/ai-lexicon/weights-and-parameters)
- [Inference — Runtime AI Execution](https://www.andekian.com/ai-lexicon/inference)
- [Transformer Architecture — Modern LLM Foundation](https://www.andekian.com/ai-lexicon/transformer-architecture)
- [Foundation Model — Large Generalized Model](https://www.andekian.com/ai-lexicon/foundation-model)

## Explore the Full Lexicon

All 100 terms: https://www.andekian.com/ai-lexicon

## Contact

Book a conversation or send an inquiry: https://www.andekian.com/#contact
LinkedIn: https://www.linkedin.com/in/andekian/