// term 01 · Foundational Architecture

LLM

Large Language Model

A neural network trained on internet-scale text using a transformer architecture. Through trillions of next-token predictions, the model develops a compressed probabilistic representation of language, reasoning, and world knowledge — encoded across billions of floating-point parameters.

Pre-training · Transformer · Foundation · Generative

// Scale

70B–405B

Parameter counts of the largest open-weight models; closed frontier models rarely disclose theirs. Each parameter encodes statistical associations learned across the training corpus.

// Unit

~0.75 words

Per token in typical English text. The token is the atomic unit of all LLM input and output: cost, speed, and capability are all denominated in tokens.

// Paradigm shift

1 model

Replaces dozens of specialist AI systems. LLMs collapse point-solution AI into a single general-purpose foundation.

// full definition

What LLM actually is

LLMs are trained on internet-scale datasets — often trillions of tokens spanning web pages, books, scientific papers, and codebases — using a self-supervised learning objective: predict the next token given all preceding context. Through trillions of such predictions, the model learns not just language patterns but causal relationships, factual associations, reasoning structures, and stylistic conventions across virtually every domain of recorded human knowledge.
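
The next-token objective can be made concrete with a small sketch. It assumes a toy four-token vocabulary and hand-picked scores (logits); both are illustrative and not taken from any real model. The loss is the negative log-probability the model assigns to the token that actually came next, so a confident correct guess is cheap and a confident wrong guess is expensive.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_id):
    """Cross-entropy loss for one prediction: -log p(correct next token)."""
    probs = softmax(logits)
    return -math.log(probs[target_id])

# Hypothetical vocabulary and scores for the context "the cat sat on the"
vocab = ["mat", "dog", "moon", "the"]
logits = [3.0, 1.0, 0.5, -1.0]

loss_if_mat = next_token_loss(logits, vocab.index("mat"))
loss_if_moon = next_token_loss(logits, vocab.index("moon"))
print(f"loss when the true next token is 'mat':  {loss_if_mat:.3f}")
print(f"loss when the true next token is 'moon': {loss_if_moon:.3f}")
```

Pre-training amounts to nudging the parameters so that this loss, averaged over the whole corpus, goes down.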

What distinguishes modern LLMs is the transformer architecture introduced in the 2017 paper "Attention Is All You Need." Multi-head self-attention lets the model dynamically weight the relevance of every prior token when generating each output, which supports coherent long-range reasoning that earlier recurrent architectures struggled to achieve at scale. Unlike a recurrent network, which must process tokens one after another, a transformer can process all positions of a training sequence in parallel. That parallelism is what makes 100B+ parameter training computationally feasible.
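
As a rough illustration of how attention weights prior tokens, here is a minimal single-head sketch in plain Python. It makes several simplifying assumptions: the input vectors are used directly as queries, keys, and values (real transformers apply learned projections first), there is one head rather than many, and positions are handled in a loop for readability where real implementations use batched matrix operations.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position i may only attend to positions 0..i (itself and earlier tokens).
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score the current query against every visible (prior or current) key.
        scores = [
            sum(qk * kk for qk, kk in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out = [
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy 2-dimensional vectors for three tokens (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for i, row in enumerate(weights):
    print(f"token {i} attends to tokens 0..{i} with weights {[round(w, 3) for w in row]}")
```

Each row of weights sums to 1, and the first token can only attend to itself.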

Capabilities that emerge at sufficient scale can be qualitatively different from those of smaller models. There is no sharp threshold, but at roughly 70B parameters and above, complex multi-step reasoning, code generation, and nuanced instruction-following can approach or exceed human expert performance on some narrow tasks. These emergent properties are not explicitly programmed; they arise from the optimization process itself, which has profound implications for capability forecasting and safety governance.

Critically, LLMs are not databases. They do not retrieve stored facts — they generate statistically plausible continuations based on learned probability distributions. This distinction is the root cause of hallucinations and the core reason RAG architectures exist: to ground model reasoning in verified, retrievable information rather than parametric memory alone.

// how it works

From raw text to production model

Each stage builds on the last. Understanding the pipeline helps executives assess build-vs-buy decisions, fine-tuning ROI, and deployment trade-offs.

Data Collection

Curated trillion-token corpus assembled from web crawls, books, codebases, and scientific literature. Data quality and composition are the single largest determinant of base model capability and bias profile.

Tokenization

Raw text is split into sub-word units (~0.75 words each) and converted to numerical IDs. The tokenizer vocabulary size directly affects multilingual capability, cost efficiency, and code performance.
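
To show what "split into sub-word units and converted to numerical IDs" looks like, the sketch below uses a greedy longest-match splitter over a tiny hand-written vocabulary. This is a stand-in chosen for brevity: production tokenizers learn their vocabulary from data with algorithms such as byte-pair encoding, and the vocabulary and IDs here are hypothetical.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of text into sub-word pieces, returned as IDs."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until one matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return ids

# Hypothetical vocabulary; real vocabularies hold tens of thousands of entries.
vocab = {"token": 0, "ization": 1, " ": 2, "is": 3, "un": 4, "avoid": 5, "able": 6}

ids = tokenize("tokenization is unavoidable", vocab)
print(ids)
```

Three words become eight token IDs in this toy setup, which is why budgets and prices are quoted in tokens rather than words.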

Pre-training

The transformer learns to predict each next token across the entire corpus using self-supervised learning. This requires massive compute: public estimates put GPT-4 class training runs at $50M–$100M or more. This stage builds the foundational capability layer.

Instruction Tuning

Model trained on curated instruction-response pairs to follow natural language commands reliably. Transforms a raw text predictor into a useful assistant capable of structured task completion.

RLHF Alignment

Human raters score outputs; a reward model learns their preferences and shapes the LLM's behavior via reinforcement learning. Aligns the model with helpfulness, safety, and brand voice requirements.

Inference

The trained model receives live prompts and generates token-by-token responses based on learned distributions. All enterprise AI interactions — from chatbots to agentic workflows — are inference operations.
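
Token-by-token generation can be sketched as a simple loop. The lookup table below is a hypothetical stand-in for a trained model and conditions only on the last token, whereas a real LLM conditions on the entire preceding context. The loop uses greedy decoding (always take the most probable token), which is one of several possible decoding strategies.

```python
# Hypothetical stand-in for a trained model: maps the last token to a
# probability distribution over the next token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.7, "token": 0.3},
    "model": {"predicts": 0.8, "<end>": 0.2},
    "predicts": {"tokens": 0.6, "the": 0.4},
    "tokens": {"<end>": 1.0},
}

def generate(max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    sequence = ["<start>"]
    for _ in range(max_new_tokens):
        dist = NEXT_TOKEN_PROBS[sequence[-1]]
        next_token = max(dist, key=dist.get)
        if next_token == "<end>":
            break
        sequence.append(next_token)
    return sequence[1:]  # drop the start marker

output = generate()
print(" ".join(output))
```

Every generated token requires another pass through the model, which is why output length drives both latency and cost.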

// anatomy

The components teams must understand

Transformer Architecture

Multi-head self-attention

The foundational neural network design. Self-attention enables the model to reason across long sequences in parallel — making 100B+ scale computationally feasible. All modern LLMs are transformer-based.

Parameters & Weights

Learned intelligence as math

Billions of floating-point values encoding all learned knowledge. A 405B model stored at 16-bit precision occupies over 800GB (405 billion parameters × 2 bytes each), a compressed representation of world knowledge held entirely within these numbers. Parameter count drives both capability and compute cost.
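
The storage figure follows from simple arithmetic: parameter count times bytes per parameter. The sketch below covers the weights only and assumes common numeric precisions; serving a model also needs memory for activations and caches, which this deliberately ignores.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params, precision):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"405B @ {precision}: {weight_memory_gb(405e9, precision):,.1f} GB")
```

At 16-bit precision the result is 810 GB, consistent with the "over 800GB" figure; quantizing to lower precision shrinks the footprint proportionally.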

Context Window

Operational memory limit

The working memory available per request: how much document, conversation, or retrieved data the model can reason over at once. Ranges from 8K to 1M+ tokens depending on the model. Window size governs your RAG architecture requirements.
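
One way to see how window size shapes a RAG design is to budget it explicitly. The sketch below assumes an 8K window and made-up token counts for the system prompt, the question, the reserved answer space, and each retrieved chunk; the function name and all numbers are illustrative.

```python
def fit_chunks(context_window, system_tokens, question_tokens,
               reserved_for_answer, chunk_token_counts):
    """Return (chunks kept, spare tokens), taking chunks in ranked order."""
    budget = context_window - system_tokens - question_tokens - reserved_for_answer
    used = 0
    kept = 0
    for count in chunk_token_counts:
        if used + count > budget:
            break  # the next chunk would overflow the window
        used += count
        kept += 1
    return kept, budget - used

kept, spare = fit_chunks(
    context_window=8_000,
    system_tokens=600,
    question_tokens=150,
    reserved_for_answer=1_000,
    chunk_token_counts=[1_200, 1_500, 1_800, 2_000, 900],
)
print(f"chunks that fit: {kept}, spare tokens: {spare}")
```

With these numbers only three of the five retrieved chunks fit, so a small window pushes the design toward tighter chunking and stricter ranking.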

Tokenizer

Text → numerical IDs

Maps raw text to numerical token IDs before processing. Vocabulary size and sub-word algorithm affect multilingual performance and code quality. All API cost is measured in tokens, not words.

Temperature

Output randomness control

Controls sampling randomness at inference time. Low (around 0.1) = near-deterministic outputs suited to factual tasks. High (1.0+) = more varied, creative, and exploratory outputs. Must be calibrated to use-case requirements in every production deployment.
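
Mechanically, temperature divides the model's raw scores before they are turned into probabilities. The sketch below assumes three candidate tokens with illustrative scores and shows how the probability of the top candidate changes as temperature rises.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax; temperature must be > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # illustrative scores for three candidate tokens
low = softmax_with_temperature(logits, 0.1)
mid = softmax_with_temperature(logits, 1.0)
high = softmax_with_temperature(logits, 2.0)

for name, probs in [("T=0.1", low), ("T=1.0", mid), ("T=2.0", high)]:
    print(name, [round(p, 3) for p in probs])
```

At low temperature nearly all probability mass lands on the top candidate; at higher temperature the alternatives get a realistic chance of being sampled.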

System Prompt

Hidden instruction layer

Hidden instructions prepended to every conversation, defining model persona, constraints, and guardrails. The primary customization mechanism without fine-tuning. Counts against the context window budget on every request.

// strategic implications

What this changes for the business

01 · Strategy

LLMs are infrastructure, not features

LLMs are general-purpose platforms enabling dozens of enterprise applications. The strategic question is not which feature to build, but which foundation model to build on — and whether to fine-tune, use an API, or run open weights on-premises. This decision determines your cost curve, data privacy posture, and the competitive defensibility of everything built on top.

02 · Moat

Proprietary data is your defensible advantage

Models fine-tuned on internal knowledge often outperform general-purpose models on domain-specific tasks. Organizations that move early on systematic data curation and fine-tuning build AI capabilities that are genuinely difficult to replicate. Every competitor can access GPT-4 or Claude. The moat is what you've taught it about your business, customers, and domain.

03 · Economics

Token economics reshape cost structures

Deployment costs scale with usage, not licensing. A single agentic workflow making 50 LLM calls per user interaction can generate costs orders of magnitude beyond naive estimates. This fundamentally changes ROI modeling and infrastructure budgeting. Executives need token-denominated cost models before approving AI deployment at scale.
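
A token-denominated cost model can start as a few lines of arithmetic. Every number below is a placeholder assumption: the per-million-token prices, the token counts per call, and the 50-call agentic workflow are there to show the shape of the calculation, not to quote any provider's pricing.

```python
def cost_per_interaction(calls, input_tokens_per_call, output_tokens_per_call,
                         input_price_per_million, output_price_per_million):
    """Dollar cost of one user interaction made of several LLM calls."""
    input_cost = calls * input_tokens_per_call * input_price_per_million / 1e6
    output_cost = calls * output_tokens_per_call * output_price_per_million / 1e6
    return input_cost + output_cost

# Naive estimate: one call with a short prompt and a short answer.
single_call = cost_per_interaction(1, 1_000, 300, 3.00, 15.00)

# Agentic workflow: 50 calls, each carrying a growing context.
agentic = cost_per_interaction(50, 6_000, 500, 3.00, 15.00)

print(f"single call:      ${single_call:.4f}")
print(f"agentic workflow: ${agentic:.4f}")
print(f"ratio:            {agentic / single_call:.0f}x")
```

Under these assumptions the agentic interaction costs 170 times the naive single-call estimate, and the gap compounds with monthly interaction volume.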

// common misconceptions

What LLM is not

Myth

“LLMs understand language and reason the way humans do.”

Reality

LLMs compute probability distributions over token sequences. Whether this constitutes “understanding” is an active scientific debate. Treat LLM outputs as probabilistic, not authoritative — and design systems with verification layers accordingly.

Myth

“Bigger models always perform better for our use case.”

Reality

Scale often matters less than task fit. A 7B model fine-tuned on your domain can match or outperform a general-purpose 70B model on specific enterprise tasks, at dramatically lower latency and cost. Right-sizing is a competitive advantage, not a compromise.

Myth

“LLMs store and retrieve facts like a database.”

Reality

LLMs generate statistically plausible continuations — they do not retrieve stored facts. This is the root cause of hallucinations and the reason RAG architectures exist. Any production system requiring factual accuracy needs grounding infrastructure, not just a larger model.

// from literacy to leverage

Know the term. Now build the strategy.

Vocabulary is the entry fee. Turning these primitives into pipeline, moats, and margin is the work. That's the conversation.
