// term 03 · Core Mechanics
Token
Unit of AI Processing
The atomic unit of LLM input and output — a sub-word chunk of roughly 0.75 English words, mapped to a numerical ID. Models never see text; they see token sequences. Every cost, latency, and capability conversation in enterprise AI ultimately denominates in tokens.
// Ratio
~0.75 words
Per English token on average. Code, non-Latin scripts, and dense formatting tokenize less efficiently — sometimes 2–4x more tokens per word.
// Vocabulary
50K–250K
Distinct tokens in modern tokenizers. Vocabulary design directly shapes multilingual capability, code performance, and cost efficiency.
// Billing
per 1M tokens
The universal pricing unit of model APIs. Input and output tokens are metered separately — output typically costs 3–5x more per token.
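A minimal sketch of how the three figures above combine into a per-request estimate. The 0.75 words-per-token ratio comes from the text. The function names and the prices used in the example call are placeholder assumptions, not any vendor's actual rates.

```python
import math

WORDS_PER_TOKEN = 0.75  # English average quoted above; real ratios vary by tokenizer and content


def estimate_tokens(word_count: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough token estimate from a word count, rounded up."""
    return math.ceil(word_count / words_per_token)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call when input and output are metered separately per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example: a 750-word prompt and a 300-word answer at assumed prices of 2.00 / 8.00 per 1M tokens.
prompt_tokens = estimate_tokens(750)
answer_tokens = estimate_tokens(300)
print(prompt_tokens, answer_tokens, request_cost(prompt_tokens, answer_tokens, 2.0, 8.0))
```

A word-count estimate like this is only suitable for early budgeting. For invoices and limits, count tokens with the tokenizer of the model actually in use.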
// full definition
What a token actually is
Before a model processes anything, text is segmented into tokens — sub-word fragments drawn from a fixed vocabulary built during training. Common words survive as single tokens; rare words shatter into pieces; whitespace and punctuation carry their own entries. The model's entire universe is sequences of these integer IDs. It has never seen a letter, a word, or a sentence — only tokens.
This invisible layer drives the economics of every deployment. API pricing meters input and output tokens separately, context windows are token budgets, and latency scales with output length because generation is serial: one token per step. An agentic workflow that loops fifty times multiplies token spend at least fifty-fold, and more when each step re-sends a growing history. Cost models built on per-request or per-user intuitions can miss by an order of magnitude or more.
Tokenization also shapes capability in ways that surprise teams. Models reason over tokens, not characters. That is a major contributor to classic failures at letter counting, spelling manipulation, and arithmetic on long numbers, whose digits can be split into inconsistent chunks. Non-English languages and source code can consume several times more tokens per unit of meaning, degrading both economics and effective context for those workloads.
For decision-makers, the token is the right unit of account. Cost per resolved ticket, per processed document, per agent run — all reduce to token arithmetic: tokens in, tokens out, calls per workflow. Teams that instrument token consumption from day one ship with predictable margins; teams that discover token economics in their first production invoice renegotiate their architecture under duress.
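The token arithmetic described above (tokens in, tokens out, calls per workflow) can be sketched as follows. This is an illustrative model, not a billing formula: the function name, the step counts, and the token sizes are all assumptions. It also assumes that an agent which carries history re-sends every earlier output as input on each later step.

```python
def agent_run_tokens(steps: int, base_prompt_tokens: int,
                     output_tokens_per_step: int, carry_history: bool = True):
    """Return (total_input_tokens, total_output_tokens) for one agent run."""
    total_in = 0
    total_out = 0
    history = 0  # tokens of earlier outputs re-sent as input
    for _ in range(steps):
        total_in += base_prompt_tokens + history
        total_out += output_tokens_per_step
        if carry_history:
            history += output_tokens_per_step
    return total_in, total_out


# Assumed workload: 50 steps, 2,000-token base prompt, 300 output tokens per step.
single_in, single_out = agent_run_tokens(1, 2000, 300)
stateless_in, stateless_out = agent_run_tokens(50, 2000, 300, carry_history=False)
stateful_in, stateful_out = agent_run_tokens(50, 2000, 300, carry_history=True)

print("input multiplier, no history:  ", stateless_in / single_in)
print("input multiplier, with history:", stateful_in / single_in)
```

Under these assumptions the stateless loop costs exactly 50 times a single call, while the loop that accumulates history sends several times more input than that. Dividing the resulting spend by tickets resolved or documents processed gives the unit economics the paragraph describes.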
// how it works
From raw text to token stream
Tokenization is the invisible first and last step of every model call — and the lever behind cost, latency, and context budgeting.
Text Normalization
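Depending on the tokenizer, raw input may be standardized before segmentation begins, for example by resolving Unicode normalization forms or whitespace conventions. Many modern byte-level tokenizers apply little or no normalization and preserve case and whitespace exactly.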
Subword Segmentation
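A byte-pair encoding (BPE) tokenizer starts from individual characters or bytes and applies its learned merge rules in priority order, so frequent words collapse into a single token while rare words remain split into several pieces. Other schemes, such as WordPiece, instead match the longest chunk present in the vocabulary.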
ID Mapping
Each chunk maps to an integer ID in the fixed vocabulary. From here on, the model operates purely on sequences of these numbers.
Embedding Lookup
Each ID indexes into a learned embedding matrix, converting it to the high-dimensional vector the transformer actually processes.
Generation
The model emits one token at a time, each sampled from a probability distribution over the full vocabulary. This serial loop is why output length dominates latency.
Detokenization
Output IDs are mapped back to text fragments and stitched into the response the user reads. The full round trip is invisible — and billed.
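The segmentation, ID mapping, embedding lookup, and detokenization steps above can be sketched with a toy tokenizer. Everything here is hypothetical: the four merge rules, the seven-letter alphabet, and the random embedding values stand in for what a real tokenizer and model learn from data. Normalization and generation are omitted, and production tokenizers typically work on bytes with a pre-tokenization pass rather than on bare words.

```python
import random

# Hypothetical toy merge list, in the order the merges were "learned".
MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
RANK = {pair: rank for rank, pair in enumerate(MERGES)}

# Vocabulary: base characters plus every merged symbol, each with an integer ID.
VOCAB = sorted(set("lowerst") | {a + b for a, b in MERGES})
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# Embedding matrix: one vector per vocabulary entry (random stand-in for learned weights).
EMBED_DIM = 4
_rng = random.Random(0)
EMBEDDINGS = [[_rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def segment(word):
    """Subword segmentation: repeatedly apply the highest-priority learned merge."""
    symbols = list(word)
    while len(symbols) > 1:
        ranked = [(RANK.get(pair, len(MERGES)), i)
                  for i, pair in enumerate(zip(symbols, symbols[1:]))]
        best_rank, i = min(ranked)
        if best_rank == len(MERGES):  # no learned merge applies
            break
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def encode(word):
    """ID mapping: each chunk becomes an integer (raises KeyError outside the toy alphabet)."""
    return [TOKEN_TO_ID[s] for s in segment(word)]


def embed(ids):
    """Embedding lookup: each ID indexes one row of the matrix."""
    return [EMBEDDINGS[i] for i in ids]


def decode(ids):
    """Detokenization: map IDs back to fragments and join them."""
    return "".join(VOCAB[i] for i in ids)


ids = encode("lowest")
print(segment("lowest"), ids, len(embed(ids)), decode(ids))
```

With these merge rules, "lower" survives as a single token while "lowest" splits into "low", "e", "s", "t", which mirrors the frequent-versus-rare behavior described in the segmentation step.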
// anatomy
The components teams must understand
01
Vocabulary
The model's atomic alphabet
The fixed inventory of all tokens a model can read or emit, set when the tokenizer is trained and effectively frozen for a given model. Everything the model ever says is assembled from this set.
02
BPE Algorithm
Frequency-based splitting
Builds the vocabulary by iteratively merging the most frequent pair of adjacent symbols (characters or bytes at first, then previously merged chunks) in the training data. Frequent strings become single tokens; rare ones split into fragments.
03
Special Tokens
Invisible control tokens
Reserved tokens mark message boundaries, roles, and tool calls — the hidden scaffolding behind chat formats and agent protocols.
04
Token Embeddings
Numbers all the way down
The bridge between symbolic text and continuous math. Representation quality per token depends on how often it appeared in training data.
05
Context Budget
Tokens as scarce resource
System prompts, documents, history, and responses compete for one fixed token window. Prompt engineering is, in large part, token budgeting.
06
Pricing Meter
Cost per token
APIs meter input and output separately, with output at a premium. Prompt caching, batching, and compression exist because the meter never stops running.
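To make the BPE Algorithm entry above concrete, here is a minimal sketch of the training loop: count adjacent symbol pairs, merge the most frequent one, repeat. The function names, the tiny word-frequency corpus, and the tie-breaking rule are assumptions made for illustration. Real implementations work on bytes, use a pre-tokenizer, and are heavily optimized.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} mapping."""
    # Each word starts as a tuple of single characters.
    corpus = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself so runs are deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        corpus = {merge_pair(symbols, best): count for symbols, count in corpus.items()}
    return merges


# Hypothetical corpus: word -> how often it appears in the training data.
toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(train_bpe(toy_corpus, 3))
```

The learned merges, in order, are what a tokenizer later replays at encoding time. The final vocabulary is the base symbols plus one new entry per merge, which is why vocabulary size is a design choice made before training.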
// strategic implications
What this changes for the business
01 · Economics
Model costs are token costs
Per-seat licensing intuitions fail for AI. Spend scales with usage volume × verbosity × architecture: an agentic pipeline looping 50 times multiplies token consumption at least 50x, and more if each step re-sends accumulated context. Finance teams need token-denominated unit economics (cost per ticket resolved, per document processed, per workflow run) before approving scale-up, not after the first invoice.
02 · Performance
Latency lives in output tokens
Input is processed in parallel; output is generated serially, token by token. For typical workloads, trimming response length cuts user-perceived latency far more than trimming a comparable number of prompt tokens. Streaming, response budgets, and concise output formats are performance engineering, and their gains compound directly with cost savings.
03 · Capability
Tokenization shapes what models do well
Models reason over tokens, not characters — explaining systematic failures at spelling tasks and long-number arithmetic. Multilingual and code-heavy workloads should be evaluated for tokenizer efficiency before vendor selection: token bloat on your language mix means structurally higher cost and degraded effective context.
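The latency claim in the Performance item above can be illustrated with a deliberately simple model: a parallel prefill phase over the input, then a serial decode phase over the output. The function name and both throughput figures are assumptions chosen for illustration. Real latency also depends on queueing, batching, network time, and hardware.

```python
def estimated_latency_s(input_tokens: int, output_tokens: int,
                        prefill_tokens_per_s: float, decode_tokens_per_s: float) -> float:
    """Linear latency sketch: prefill time for the input plus serial decode time for the output."""
    prefill = input_tokens / prefill_tokens_per_s
    decode = output_tokens / decode_tokens_per_s
    return prefill + decode


# Assumed throughputs: prefill handles input far faster than decode emits output.
PREFILL_TPS = 5000.0
DECODE_TPS = 50.0

baseline = estimated_latency_s(4000, 500, PREFILL_TPS, DECODE_TPS)
half_prompt = estimated_latency_s(2000, 500, PREFILL_TPS, DECODE_TPS)
half_answer = estimated_latency_s(4000, 250, PREFILL_TPS, DECODE_TPS)

print("baseline:        ", baseline)
print("half the prompt: ", half_prompt)
print("half the answer: ", half_answer)
```

Under these assumed rates, halving the prompt saves a fraction of a second while halving the answer saves several seconds. The gap narrows as prompts grow very long or as prefill throughput drops, so the ratio is worth measuring on the actual deployment.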
// common misconceptions
What a token is not
Myth
“A token is a word.”
Reality
Tokens are sub-word fragments. “Unbelievable” may split into three tokens; a frequent word maps to one; a single emoji can consume several. The ~0.75-word ratio is an English-language average, not a rule — and it degrades sharply for code and non-Latin scripts.
Myth
“Token costs are too small to matter.”
Reality
Per-token prices look negligible; production volumes are not. A high-traffic agentic deployment can consume billions of tokens monthly. Token economics routinely decide model routing, caching strategy, and build-vs-buy — they are board-level numbers at scale.
Myth
“Models read text the way people do.”
Reality
Models see only integer sequences. Character-level reasoning, spelling, and digit manipulation are systematically hard because the characters were merged away before the model ever saw the input. Knowing this failure class prevents misdiagnosing it as general incompetence.
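One way to check the first myth against a specific workload is to measure tokens per word directly. The sketch below uses a stand-in tokenizer that emits one token per UTF-8 byte, which is an assumption made so the example runs without external libraries. It overstates absolute counts, because real tokenizers merge bytes into larger chunks, but it shows why non-Latin scripts and emoji start from a larger byte footprint. The `tokens_per_word` helper accepts any tokenizer function, so a production tokenizer can be substituted.

```python
def byte_tokenize(text: str):
    """Stand-in tokenizer: one token per UTF-8 byte, with no merges applied."""
    return list(text.encode("utf-8"))


def tokens_per_word(text: str, tokenize) -> float:
    """Average tokens per whitespace-separated word under the given tokenizer."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


# Hypothetical samples; replace with text representative of the real workload.
samples = {
    "english": "hello world",
    "russian": "привет мир",
    "emoji": "🙂",
}

for name, text in samples.items():
    print(name, len(byte_tokenize(text)), tokens_per_word(text, byte_tokenize))
```

Running the same measurement with each candidate vendor's tokenizer on a representative sample gives a like-for-like view of token bloat across a language mix.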