// term 03 · Core Mechanics

Token

Unit of AI Processing

The atomic unit of LLM input and output — a sub-word chunk of roughly 0.75 English words, mapped to a numerical ID. Models never see text; they see token sequences. Every cost, latency, and capability conversation in enterprise AI ultimately denominates in tokens.

TokenizationBPEEconomicsThroughput

// Ratio

~0.75 words

Per English token on average. Code, non-Latin scripts, and dense formatting tokenize less efficiently — sometimes 2–4x more tokens per word.

// Vocabulary

50K–250K

Distinct tokens in modern tokenizers. Vocabulary design directly shapes multilingual capability, code performance, and cost efficiency.

// Billing

per 1M tokens

The universal pricing unit of model APIs. Input and output tokens are metered separately — output typically costs 3–5x more per token.

// full definition

What Token actually is

Before a model processes anything, text is segmented into tokens — sub-word fragments drawn from a fixed vocabulary built during training. Common words survive as single tokens; rare words shatter into pieces; whitespace and punctuation carry their own entries. The model's entire universe is sequences of these integer IDs. It has never seen a letter, a word, or a sentence — only tokens.

This invisible layer drives the economics of every deployment. API pricing meters input and output tokens separately, context windows are token budgets, and latency scales with output length because generation is serial — one token per step. An agentic workflow that loops fifty times multiplies token spend fifty-fold. Cost models built on per-request or per-user intuitions consistently miss by orders of magnitude.

Tokenization also shapes capability in ways that surprise teams. Models reason over tokens, not characters — the root cause of classic failures at letter counting, spelling manipulation, and arithmetic on long numbers, which get fragmented unpredictably. Non-English languages and source code can consume several times more tokens per unit of meaning, degrading both economics and effective context for those workloads.

For decision-makers, the token is the right unit of account. Cost per resolved ticket, per processed document, per agent run — all reduce to token arithmetic: tokens in, tokens out, calls per workflow. Teams that instrument token consumption from day one ship with predictable margins; teams that discover token economics in their first production invoice renegotiate their architecture under duress.

// how it works

From raw text to token stream

Tokenization is the invisible first and last step of every model call — and the lever behind cost, latency, and context budgeting.

Text Normalization

Raw input is standardized — whitespace, casing conventions, and unicode forms are resolved before segmentation begins.

Subword Segmentation

A byte-pair encoding (BPE) algorithm splits text into the longest chunks present in the vocabulary — frequent words stay whole, rare words fragment into pieces.

ID Mapping

Each chunk maps to an integer ID in the fixed vocabulary. From here on, the model operates purely on sequences of these numbers.

Embedding Lookup

Each ID indexes into a learned embedding matrix, converting it to the high-dimensional vector the transformer actually processes.

Generation

The model emits one token at a time, each sampled from a probability distribution over the full vocabulary. This serial loop is why output length dominates latency.

Detokenization

Output IDs are mapped back to text fragments and stitched into the response the user reads. The full round trip is invisible — and billed.

// anatomy

The components teams must understand

Vocabulary

The model's atomic alphabet

The fixed inventory of all tokens a model can read or emit, set at training time and immutable thereafter. Everything the model ever says is assembled from this set.

BPE Algorithm

Frequency-based splitting

Builds the vocabulary by iteratively merging the most frequent character pairs in training data. Frequent strings become single tokens; rare ones split into fragments.

Special Tokens

Invisible control characters

Reserved tokens mark message boundaries, roles, and tool calls — the hidden scaffolding behind chat formats and agent protocols.

Token Embeddings

Numbers all the way down

The bridge between symbolic text and continuous math. Representation quality per token depends on how often it appeared in training data.

Context Budget

Tokens as scarce resource

System prompts, documents, history, and responses compete for one fixed token window. Prompt engineering is, in large part, token budgeting.

Pricing Meter

Cost per token

APIs meter input and output separately, with output at a premium. Prompt caching, batching, and compression exist because the meter never stops running.

// strategic implications

What this changes for the business

01 · Economics

Model costs are token costs

Per-seat licensing intuitions fail for AI. Spend scales with usage volume × verbosity × architecture: an agentic pipeline looping 50 times multiplies token consumption 50x. Finance teams need token-denominated unit economics — cost per ticket resolved, per document processed, per workflow run — before approving scale-up, not after the first invoice.

02 · Performance

Latency lives in output tokens

Input is processed in parallel; output is generated serially, token by token. Trimming response length cuts user-perceived latency far more than trimming prompts. Streaming, response budgets, and concise output formats are performance engineering — and they compound directly with cost savings.

03 · Capability

Tokenization shapes what models do well

Models reason over tokens, not characters — explaining systematic failures at spelling tasks and long-number arithmetic. Multilingual and code-heavy workloads should be evaluated for tokenizer efficiency before vendor selection: token bloat on your language mix means structurally higher cost and degraded effective context.

// common misconceptions

What Token is not

Myth

“A token is a word.”

Reality

Tokens are sub-word fragments. “Unbelievable” may split into three tokens; a frequent word maps to one; a single emoji can consume several. The ~0.75-word ratio is an English-language average, not a rule — and it degrades sharply for code and non-Latin scripts.

Myth

“Token costs are too small to matter.”

Reality

Per-token prices look negligible; production volumes are not. A high-traffic agentic deployment can consume billions of tokens monthly. Token economics routinely decide model routing, caching strategy, and build-vs-buy — they are board-level numbers at scale.

Myth

“Models read text the way people do.”

Reality

Models see only integer sequences. Character-level reasoning, spelling, and digit manipulation are systematically hard because the characters were merged away before the model ever saw the input. Knowing this failure class prevents misdiagnosing it as general incompetence.

// from literacy to leverage

Know the term. Now build the strategy.

Vocabulary is the entry fee. Turning these primitives into pipeline, moats, and margin is the work. That's the conversation.

AI innovation, applied

Token

What Token actually is

From raw text to token stream

The components teams must understand

What this changes for the business

What Token is not

Explore the wider architecture

Know the term. Now build the strategy.