# Chain of Thought — Sequential Reasoning Engine

> Eliciting intermediate reasoning steps before a final answer — by prompting or by training. Decomposing problems into explicit sequential steps dramatically improves accuracy on math, logic, and planning, trading more tokens and latency for more reliable conclusions.

**Canonical URL:** https://www.andekian.com/ai-lexicon/chain-of-thought  
**Author / Site:** Stephen Andekian — https://www.andekian.com

**Term 09 of 100** · Reasoning & Cognition  
**Tags:** Reasoning, Test-Time Compute, Prompting, Accuracy

## Key Stats

- **Lift — +20–50pts:** Accuracy improvement on math and multi-step reasoning benchmarks versus direct answering — among the largest known prompting effects.
- **Cost — 2–10x:** Token multiplier for reasoning traces. Dedicated reasoning models can spend thousands of thinking tokens on a single hard problem.
- **Paradigm — test-time:** CoT seeded the test-time-compute era: spending inference compute on thinking now rivals model scale as the capability lever.

## What Chain of Thought Actually Is

Chain of thought exploits a structural fact about autoregressive models: every generated token becomes context for the next. When a model jumps directly to an answer, one token prediction carries the entire cognitive load. When it reasons stepwise, each intermediate conclusion is written down and conditioned on — arithmetic stays local, logical dependencies stay explicit, and a hard leap becomes a sequence of easy ones.
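A toy sketch of that structural fact, assuming a hypothetical `next_step` callable in place of a real model: each generated step is appended to the context before the next step is produced. The scripted solver and the `Answer:` line convention are illustrative assumptions, not any real API.

```python
from typing import Callable, List


def run_chain(question: str, next_step: Callable[[str], str],
              max_steps: int = 8) -> List[str]:
    """Generate steps one at a time, feeding each one back in as context."""
    context = question
    steps: List[str] = []
    for _ in range(max_steps):
        step = next_step(context)        # the stub sees everything written so far
        steps.append(step)
        context = context + "\n" + step  # the new step conditions the next one
        if step.startswith("Answer:"):
            break
    return steps


def scripted_model(context: str) -> str:
    """Hypothetical stand-in for a model: a scripted solver for 17 * 24."""
    if "17 * 20 = 340" not in context:
        return "17 * 20 = 340"
    if "17 * 4 = 68" not in context:
        return "17 * 4 = 68"
    if "340 + 68 = 408" not in context:
        return "340 + 68 = 408"
    return "Answer: 408"


steps = run_chain("What is 17 * 24?", scripted_model)
```

Each call only has to make one small move because the earlier moves are already written into `context`.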

What began as a prompting trick (appending “think step by step”) became a training paradigm. Modern reasoning models are trained with reinforcement learning to produce extended deliberation by default, scaling the number of thinking tokens with problem difficulty. Techniques stack on top: self-consistency samples multiple independent chains and takes the majority answer; verification passes audit the chain's claims; tree-structured variants explore alternative paths.
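The prompting form of the technique can be sketched in a few lines. This is a minimal illustration, not a recommended template: the function name `build_prompt`, the exact suffix wording, and the `Answer:` line convention are all assumptions made here.

```python
# Assumed wording; the source only specifies the phrase "think step by step".
COT_SUFFIX = ("Think step by step, then give the final answer "
              "on a line starting with 'Answer:'.")


def build_prompt(question: str, use_cot: bool = True) -> str:
    """Return the question with or without a request for visible reasoning."""
    question = question.strip()
    if not use_cot:
        return f"{question}\nRespond with only the final answer."
    return f"{question}\n{COT_SUFFIX}"


cot_prompt = build_prompt("What is 17 * 24?")
direct_prompt = build_prompt("What is 17 * 24?", use_cot=False)
```

Asking for a fixed final-answer line is a design choice that makes the deliverable easy to separate from the trace later.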

The economics are a genuine trade. Reasoning multiplies token consumption and latency per query — sometimes by an order of magnitude — in exchange for accuracy on problems where direct answering fails. This created the test-time-compute axis: capability now scales not just with model size and training data, but with how much thinking a deployment is willing to buy per question. Routing by difficulty is the discipline that keeps the bill rational.
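The trade can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes cost scales linearly with output tokens; the function name, the 200-token answer, and the price of 0.01 per 1k tokens are made-up values for illustration only.

```python
def reasoning_cost(base_output_tokens: int, multiplier: float,
                   price_per_1k_tokens: float) -> float:
    """Estimate the output-token cost of one query.

    multiplier is how many times longer the output gets once a reasoning
    trace is included (1.0 means a direct answer with no trace).
    """
    if base_output_tokens < 0 or price_per_1k_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    if multiplier < 1.0:
        raise ValueError("multiplier must be at least 1.0")
    return base_output_tokens * multiplier * price_per_1k_tokens / 1000.0


# Hypothetical numbers: a 200-token answer at a made-up price per 1k tokens.
direct_cost = reasoning_cost(200, 1.0, 0.01)
light_trace_cost = reasoning_cost(200, 2.0, 0.01)
heavy_trace_cost = reasoning_cost(200, 10.0, 0.01)
```

Multiplying the per-query difference by expected traffic gives a first estimate of what a fleet-wide reasoning default would cost.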

One caution deserves executive attention: the visible trace is generated text, not a guaranteed transcript of internal computation. Traces are usually predictive of how the answer was reached, and they make verification tractable in a way bare conclusions never were, but models can rationalize: the written steps may justify an answer after the fact rather than record how it was produced. Treat reasoning traces as auditable artifacts to be checked, not as ground-truth psychology.

## How It Works: How stepwise thinking changes the answer

Chain of thought lets the model condition each conclusion on explicit prior steps — converting one hard leap into many easy ones.

1. **Elicitation** — The prompt requests visible reasoning — or the model is trained to reason by default. Either way, intermediate tokens are the point, not decoration.
2. **Decomposition** — The problem is broken into sub-problems: extract the knowns, identify the goal, plan the path between them.
3. **Sequential Derivation** — Each step generates context that conditions the next — arithmetic stays local, logic stays explicit, and errors become visible instead of silent.
4. **Self-Consistency** — Multiple independent chains are sampled and the majority answer wins — trading additional compute for measurable accuracy gains.
5. **Answer Extraction** — The final response is parsed from the trace. Production systems separate the thinking from the deliverable the user sees.
6. **Verification** — A second pass — model, tool, or rule-based — checks the chain's claims and arithmetic, catching the errors confident prose hides.
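Steps 4 and 5 above can be sketched together: parse a final answer out of each sampled trace, then take the majority. This is a minimal sketch that assumes every trace ends with a line of the form `Answer: ...`; that convention, the function names, and the hard-coded sample traces are assumptions standing in for real model output.

```python
import re
from collections import Counter
from typing import List, Optional

# Assumed convention: the deliverable sits on a line starting with "Answer:".
ANSWER_RE = re.compile(r"^Answer:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_answer(trace: str) -> Optional[str]:
    """Return the last 'Answer:' line in a trace, or None if there is none."""
    matches = ANSWER_RE.findall(trace)
    return matches[-1] if matches else None


def self_consistent_answer(traces: List[str]) -> Optional[str]:
    """Majority vote over the answers extracted from independent traces."""
    answers = [a for a in (extract_answer(t) for t in traces) if a is not None]
    if not answers:
        return None
    # On a tie, most_common keeps the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


# Hard-coded stand-ins for three sampled chains on "What is 17 * 24?"
traces = [
    "17 * 20 = 340\n17 * 4 = 68\n340 + 68 = 408\nAnswer: 408",
    "24 * 10 = 240\n24 * 7 = 168\n240 + 168 = 408\nAnswer: 408",
    "17 * 24 is about 17 * 25 = 425\nAnswer: 425",
]
winner = self_consistent_answer(traces)
```

Two chains reach 408 by different routes and outvote the one that rounded, which is the effect the voting step relies on.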

## Anatomy: The Components Teams Must Understand

- **Reasoning Trace** (Thinking made visible): The intermediate tokens between question and answer. They function as working memory: each step becomes context the model conditions on.
- **Decomposition** (Divide and conquer): Splitting one hard leap into many easy steps keeps each token prediction tractable — the core mechanism behind the accuracy lift.
- **Self-Consistency** (Vote of independent chains): Multiple sampled reasoning paths with majority voting — converting sampling variance into an accuracy signal.
- **Reasoning Models** (CoT baked in): Models trained with reinforcement learning (RL) to deliberate extensively by default, matching thinking depth to problem difficulty.
- **Thinking Budget** (Dial for depth): Configurable limits on reasoning length — the knob that trades latency and cost against coverage of hard problems.
- **Verifier** (The second opinion): A separate check validating the chain. Traces make verification tractable; bare conclusions never did.
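A rule-based verifier can be as small as a regex that re-computes arithmetic claims found in the trace. The sketch below is deliberately narrow and is an assumption about one possible check, not a general method: it only recognizes integer claims of the form `a op b = c` with a single `+`, `-`, or `*`, so longer expressions such as `2 + 3 * 4 = 14` would be misread.

```python
import re
from typing import List, Tuple

# Matches single-operator integer claims such as "17 * 4 = 68".
CLAIM_RE = re.compile(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)")
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def check_arithmetic(trace: str) -> List[Tuple[str, bool]]:
    """Return each arithmetic claim in the trace with whether it holds."""
    results = []
    for m in CLAIM_RE.finditer(trace):
        a, op, b, claimed = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
        results.append((m.group(0), OPS[op](a, b) == claimed))
    return results


# Hypothetical trace with one slip: 17 * 4 is 68, not 78.
trace = "17 * 20 = 340\n17 * 4 = 78\n340 + 78 = 418\nAnswer: 418"
report = check_arithmetic(trace)
failed = [claim for claim, ok in report if not ok]
```

The final sum is internally consistent, so only the step-level check locates the error; a bare `418` would give a checker nothing to inspect.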

## Strategic Implications

- **Reasoning unlocks workflow-grade AI** (01 · Capability): Stepwise reasoning is the difference between drafting prose and reliably executing multi-stage analysis — the gap between a writing aid and a system you can put inside a business process. Use cases written off a year ago on accuracy grounds deserve re-evaluation against reasoning-enabled models.
- **Thinking is a cost tier** (02 · Economics): Reasoning multiplies token spend and latency per query. Route by difficulty: direct answers for routine traffic, extended thinking for the hard tail. A single thinking-budget default applied fleet-wide is money on fire in one direction (too much thinking on routine queries) or quality on fire in the other (too little on hard ones).
- **Traces are audit surface** (03 · Trust): Visible reasoning lets humans and tools check how a conclusion was reached — with the caveat that traces are not guaranteed faithful to internal computation. Treat them as verifiable artifacts feeding review processes, not as ground truth about the model's mind.
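Routing by difficulty, as described under the economics point, might look like the sketch below. Everything here is hypothetical: the `difficulty` score in [0, 1] is assumed to come from some upstream classifier or heuristic, and the threshold and budget values are placeholders rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    mode: str             # "direct" or "reasoning"
    thinking_budget: int  # max reasoning tokens; 0 means no trace requested


def route(difficulty: float, hard_threshold: float = 0.6,
          min_budget: int = 1000, max_budget: int = 8000) -> Route:
    """Pick a mode and thinking budget from an assumed difficulty score."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    if not 0.0 <= hard_threshold < 1.0:
        raise ValueError("hard_threshold must be in [0, 1)")
    if difficulty < hard_threshold:
        return Route("direct", 0)
    # Scale the budget linearly across the hard tail.
    share = (difficulty - hard_threshold) / (1.0 - hard_threshold)
    return Route("reasoning", int(min_budget + share * (max_budget - min_budget)))


routine = route(0.2)
borderline = route(0.6)
hardest = route(1.0)
```

A linear ramp is only one option; the point is that the budget is a per-query decision rather than a fleet-wide constant.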

## Common Misconceptions

- **Myth:** “The trace shows what the model actually computed.”  
  **Reality:** Traces are generated text — usually predictive of the answer process, but models can rationalize. Faithfulness is an open research area. Verify the steps; don't venerate them.
- **Myth:** “More thinking always helps.”  
  **Reality:** Past task-dependent thresholds, returns flatten while cost and latency climb — and overthinking can degrade performance on simple tasks. Budget thinking to difficulty, not to maximum.
- **Myth:** “Chain of thought is just a prompting trick.”  
  **Reality:** It became a training paradigm: RL-optimized reasoning models now anchor frontier capability, and test-time compute is a strategic scaling axis alongside parameters and data — not a hack.
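One simple way to see returns flatten, related to the “More thinking always helps” myth, is to compute how often a majority vote is right as more chains are sampled. This is an idealized sketch under strong assumptions stated here: answers are binary, chains are statistically independent, and each chain is correct with the same probability `p`. Real chains are correlated, so actual gains are typically smaller.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """P(majority of n independent chains is correct), binary answers, odd n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    need = n // 2 + 1  # smallest number of correct chains that wins the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# Assumed per-chain accuracy of 0.7; compare the gain from each extra batch.
acc = {n: majority_accuracy(0.7, n) for n in (1, 5, 15, 45)}
early_gain = acc[5] - acc[1]    # going from 1 chain to 5
late_gain = acc[45] - acc[15]   # going from 15 chains to 45
```

Under these assumptions the first few extra chains buy most of the improvement, while each later batch costs more tokens for a smaller gain.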

## Related Terms

- [LLM — Large Language Model](https://www.andekian.com/ai-lexicon/llm)
- [Agentic AI — Autonomous Workflow Execution](https://www.andekian.com/ai-lexicon/agentic-ai)
- [Prompt Engineering — Instruction Optimization](https://www.andekian.com/ai-lexicon/prompt-engineering)
- [Reflection Loop — Self-Review Mechanism](https://www.andekian.com/ai-lexicon/reflection-loop)
- [Recursive Reasoning — Multi-Pass Problem Solving](https://www.andekian.com/ai-lexicon/recursive-reasoning)
- [Chain-of-Verification — Step-By-Step Validation](https://www.andekian.com/ai-lexicon/chain-of-verification)
- [Tree of Thoughts — Branching Reasoning Framework](https://www.andekian.com/ai-lexicon/tree-of-thoughts)
- [ReAct Framework — Reasoning Plus Acting](https://www.andekian.com/ai-lexicon/react-framework)

## Explore the Full Lexicon

All 100 terms: https://www.andekian.com/ai-lexicon
