// term 09 · Reasoning & Cognition
Chain of Thought
Sequential Reasoning Engine
Eliciting intermediate reasoning steps before a final answer, whether by prompting or by training. Decomposing a problem into explicit sequential steps improves accuracy on math, logic, and planning tasks, at the price of more tokens and higher latency per query.
// Lift
+20–50pts
Reported accuracy improvement on math and multi-step reasoning benchmarks versus direct answering, among the largest known prompting effects. The size of the gain varies by model and task.
// Cost
2–10x
Token multiplier for reasoning traces. Dedicated reasoning models can spend thousands of thinking tokens on a single hard problem.
// Paradigm
test-time
CoT seeded the test-time-compute era: spending inference compute on thinking now rivals model scale as a capability lever.
// full definition
What Chain of Thought actually is
Chain of thought exploits a structural fact about autoregressive models: every generated token becomes context for the next. When a model jumps directly to an answer, one token prediction carries the entire cognitive load. When it reasons stepwise, each intermediate conclusion is written down and conditioned on — arithmetic stays local, logical dependencies stay explicit, and a hard leap becomes a sequence of easy ones.
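One way to write the contrast down, as a sketch in notation that does not appear in the original text: let q be the question, a the final answer, and r_1 … r_n the intermediate reasoning steps. Direct answering asks for a given only q. A chain splits the same job into steps, each conditioned on everything written so far. The second line is just the chain rule of probability applied to the generated sequence.

```latex
% Direct answering: one prediction conditioned on the question alone
p_\theta(a \mid q)

% Chain of thought: every step conditions on the question and all earlier steps
p_\theta(r_1, \dots, r_n, a \mid q)
  = \Bigl[\, \prod_{i=1}^{n} p_\theta(r_i \mid q, r_1, \dots, r_{i-1}) \Bigr]
    \; p_\theta(a \mid q, r_1, \dots, r_n)
```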
What began as a prompting trick (appending “think step by step”) became a training paradigm. Modern reasoning models are trained with reinforcement learning to produce extended deliberation by default, scaling the number of thinking tokens to problem difficulty. Techniques stack on top: self-consistency samples multiple independent chains and takes the majority answer; verification passes audit the chain's claims; tree-structured variants explore alternative paths.
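Self-consistency is simple enough to sketch in a few lines. The version below is a minimal illustration, not a reference implementation: `sample_answer` is a hypothetical callable standing in for "run one sampled reasoning chain and return its extracted final answer", and the canned answers replace a real model so the example runs offline.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistency(
    sample_answer: Callable[[str], Optional[str]],
    question: str,
    n_samples: int = 5,
) -> Optional[str]:
    """Sample several chains and return the most common final answer.

    `sample_answer` is a hypothetical callable: it runs one reasoning chain
    with sampling enabled and returns the parsed final answer, or None when
    no answer could be parsed from the trace.
    """
    answers = [sample_answer(question) for _ in range(n_samples)]
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return votes.most_common(1)[0][0]


# Stand-in for a model: replays canned answers, one per sampled chain.
canned = iter(["42", "41", "42", None, "42"])
result = self_consistency(lambda q: next(canned), "What is 6 * 7?")
print(result)
```

Voting on the extracted answer rather than on the full trace is the design choice that matters here: two chains can reason differently and still agree on the conclusion.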
The economics are a genuine trade. Reasoning multiplies token consumption and latency per query — sometimes by an order of magnitude — in exchange for accuracy on problems where direct answering fails. This created the test-time-compute axis: capability now scales not just with model size and training data, but with how much thinking a deployment is willing to buy per question. Routing by difficulty is the discipline that keeps the bill rational.
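The trade can be put in back-of-envelope form. The sketch below uses purely illustrative numbers (500 output tokens for a direct answer, a 10x reasoning multiplier, 20% of traffic classed as hard, $0.01 per 1k tokens). None of these figures come from a real price list, and the function name is made up for this example.

```python
def blended_cost_per_query(
    direct_tokens: float,
    reasoning_multiplier: float,
    hard_fraction: float,
    price_per_1k_tokens: float,
) -> float:
    """Average cost per query when only the hard fraction gets extended thinking."""
    reasoning_tokens = direct_tokens * reasoning_multiplier
    avg_tokens = (1 - hard_fraction) * direct_tokens + hard_fraction * reasoning_tokens
    return avg_tokens / 1000 * price_per_1k_tokens


# Illustrative assumptions only; substitute measured token counts and real prices.
everything_reasons = blended_cost_per_query(500, 10, 1.0, 0.01)
routed = blended_cost_per_query(500, 10, 0.2, 0.01)
print(f"reason on everything: ${everything_reasons:.4f} per query")
print(f"route by difficulty:  ${routed:.4f} per query")
```

Under these assumptions, reasoning on every query averages 5,000 tokens, while routing only the hard fifth averages 1,400, roughly a 3.6x difference in spend.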
One caution deserves executive attention: the visible trace is generated text, not a guaranteed transcript of internal computation. Traces are usually predictive of how the answer was reached, and they make verification tractable in a way bare conclusions never were — but models can rationalize. Treat reasoning traces as auditable artifacts to be checked, not as ground-truth psychology.
// how it works
How stepwise thinking changes the answer
Chain of thought lets the model condition each conclusion on explicit prior steps — converting one hard leap into many easy ones.
Elicitation
The prompt requests visible reasoning — or the model is trained to reason by default. Either way, intermediate tokens are the point, not decoration.
Decomposition
The problem is broken into sub-problems: extract the knowns, identify the goal, plan the path between them.
Sequential Derivation
Each step generates context that conditions the next — arithmetic stays local, logic stays explicit, and errors become visible instead of silent.
Self-Consistency
Multiple independent chains are sampled and the majority answer wins — trading additional compute for measurable accuracy gains.
Answer Extraction
The final response is parsed from the trace. Production systems separate the thinking from the deliverable the user sees.
Verification
A second pass — model, tool, or rule-based — checks the chain's claims and arithmetic, catching the errors confident prose hides.
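The elicitation, extraction, and verification steps above can be sketched end to end without calling a model. Everything here is an assumption made for illustration: the `Final answer:` marker is a convention the prompt asks for, not a standard, and the verifier is a deliberately narrow rule-based check that only recomputes simple `a op b = c` integer claims.

```python
import re
from typing import List, Optional

FINAL_MARKER = "Final answer:"  # assumed convention, requested in the prompt


def build_prompt(question: str) -> str:
    """Elicitation: ask for visible reasoning plus a parseable final line."""
    return (
        f"{question}\n"
        "Think step by step, then finish with a line starting with "
        f"'{FINAL_MARKER}'."
    )


def extract_answer(trace: str) -> Optional[str]:
    """Answer extraction: text after the last marker, or None if absent."""
    _, marker, tail = trace.rpartition(FINAL_MARKER)
    tail = tail.strip()
    if not marker or not tail:
        return None
    return tail.splitlines()[0].strip()


ARITHMETIC = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")


def check_arithmetic(trace: str) -> List[str]:
    """Verification: recompute every 'a op b = c' claim found in the trace."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    errors = []
    for a, op, b, claimed in ARITHMETIC.findall(trace):
        actual = ops[op](int(a), int(b))
        if actual != int(claimed):
            errors.append(f"{a} {op} {b} = {claimed} (expected {actual})")
    return errors


# A hand-written trace with a deliberate slip in the second step.
trace = (
    "There are 3 boxes with 12 pens each, so 3 * 12 = 36 pens.\n"
    "After giving away 8, 36 - 8 = 26 remain.\n"
    "Final answer: 26"
)
answer = extract_answer(trace)
errors = check_arithmetic(trace)
print(answer)
print(errors)
```

The trace reads confidently and the extracted answer looks plausible, yet the verifier flags the subtraction. A real deployment would likely pair a rule-based check like this with a model-based or tool-based verifier for claims that are not plain arithmetic.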
// anatomy
The components teams must understand
01
Reasoning Trace
Thinking made visible
The intermediate tokens between question and answer. They function as working memory: each step becomes context the model conditions on.
02
Decomposition
Divide and conquer
Splitting one hard leap into many easy steps keeps each token prediction tractable — the core mechanism behind the accuracy lift.
03
Self-Consistency
Vote of independent chains
Multiple sampled reasoning paths with majority voting — converting sampling variance into an accuracy signal.
04
Reasoning Models
CoT baked in
Models trained with RL to deliberate extensively by default, matching thinking depth to problem difficulty.
05
Thinking Budget
Dial for depth
Configurable limits on reasoning length — the knob that trades latency and cost against coverage of hard problems.
06
Verifier
The second opinion
A separate check validating the chain. Traces make verification tractable; bare conclusions never did.
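Several of these components meet in one place: the policy that decides how much thinking a given request gets. The sketch below is hypothetical throughout. The tier names, token limits, and thresholds are placeholders, and the difficulty score is assumed to come from some upstream classifier or heuristic that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkingPolicy:
    max_thinking_tokens: int  # 0 means answer directly, no reasoning trace
    n_chains: int             # values above 1 enable self-consistency voting
    verify: bool              # whether to run a verifier pass on the trace


# Hypothetical tiers; real limits should come from evaluation data.
POLICIES = {
    "routine": ThinkingPolicy(max_thinking_tokens=0, n_chains=1, verify=False),
    "moderate": ThinkingPolicy(max_thinking_tokens=1024, n_chains=1, verify=True),
    "hard": ThinkingPolicy(max_thinking_tokens=8192, n_chains=5, verify=True),
}


def choose_policy(difficulty: float) -> ThinkingPolicy:
    """Map a difficulty score in [0, 1] to a thinking policy tier."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    if difficulty < 0.3:
        return POLICIES["routine"]
    if difficulty < 0.7:
        return POLICIES["moderate"]
    return POLICIES["hard"]


for score in (0.1, 0.5, 0.9):
    print(score, choose_policy(score))
```

Keeping the budget, the number of chains, and the verifier flag in one policy object makes the cost of each tier explicit and easy to tune against measured accuracy.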
// strategic implications
What this changes for the business
01 · Capability
Reasoning unlocks workflow-grade AI
Stepwise reasoning is the difference between drafting prose and reliably executing multi-stage analysis — the gap between a writing aid and a system you can put inside a business process. Use cases written off a year ago on accuracy grounds deserve re-evaluation against reasoning-enabled models.
02 · Economics
Thinking is a cost tier
Reasoning multiplies token spend and latency per query. Route by difficulty: direct answers for routine traffic, extended thinking for the hard tail. A single thinking-budget default applied fleet-wide is money on fire in one direction or quality on fire in the other.
03 · Trust
Traces are audit surface
Visible reasoning lets humans and tools check how a conclusion was reached — with the caveat that traces are not guaranteed faithful to internal computation. Treat them as verifiable artifacts feeding review processes, not as ground truth about the model's mind.
// common misconceptions
What Chain of Thought is not
Myth
“The trace shows what the model actually computed.”
Reality
Traces are generated text — usually predictive of the answer process, but models can rationalize. Faithfulness is an open research area. Verify the steps; don't venerate them.
Myth
“More thinking always helps.”
Reality
Past task-dependent thresholds, returns flatten while cost and latency climb — and overthinking can degrade performance on simple tasks. Budget thinking to difficulty, not to maximum.
Myth
“Chain of thought is just a prompting trick.”
Reality
It became a training paradigm: RL-optimized reasoning models now anchor frontier capability, and test-time compute is a strategic scaling axis alongside parameters and data — not a hack.