// term 74 · Agentic Systems
AI Agent
Autonomous AI Operator
An AI system that perceives its environment, makes decisions, and takes goal-directed actions — using tools, APIs, and multi-step reasoning to complete open-ended work. The agent is the unit of AI labor: not a model answering questions, but a system pursuing outcomes.
// Anatomy
model + harness
An LLM provides the judgment; orchestration, tools, memory, and guardrails provide everything else an operator needs.
// Span
minutes–days
Task horizons from single workflows to standing responsibilities — far beyond the single-response interactions that preceded them.
// Reliability law
compounding
Per-step success rates multiply across steps: 95% per action becomes roughly 36% across twenty independent actions (0.95^20 ≈ 0.36), which is why agent engineering is reliability engineering.
// full definition
What an AI Agent actually is
An agent differs from a chatbot in kind, not degree: it is built to pursue outcomes rather than produce responses. Given a goal — resolve this incident, prepare this analysis, manage this queue — the agent decides what to do, does it, observes what happened, and continues until done or blocked. The language model supplies the per-step judgment; the surrounding system — orchestrator, tools, memory, permissions — turns judgment into an operator capable of touching real systems.
The loop is the architecture. Perceive: the agent assembles its picture from the goal, its memory, and fresh observations. Decide: the model selects the next action against the plan. Act: a tool call executes — a query, an API invocation, a code run, a message. Observe: results return as new information, errors included, feeding the next cycle. This grounding in real feedback is the agent's defining strength: plans collide with reality every step, and reality wins — the agent adjusts rather than narrates.
The engineering discipline is dominated by one piece of arithmetic: errors compound. If steps succeed independently, a 95% per-step success rate yields roughly 36% end-to-end success across twenty steps (0.95^20 ≈ 0.36). That is why production agents are wrapped in reliability machinery: validation between steps, bounded retries, checkpoints that preserve progress, escalation paths when confidence drops, and permission scopes that cap the blast radius of any single wrong action. Capability gets the demos; reliability engineering gets the deployments.
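The compounding arithmetic is easy to check directly. The sketch below is illustrative only: it assumes every step succeeds or fails independently, and that a failed step is always detected and can be retried, which real systems only approximate. The function names are hypothetical, not taken from any framework.

```python
def chain_success(p_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds (independent steps assumed)."""
    return p_step ** steps


def step_success_with_retries(p_step: float, max_attempts: int) -> float:
    """Per-step success when a detected failure may be retried up to max_attempts times.

    Assumes failures are always detected and attempts are independent.
    """
    return 1.0 - (1.0 - p_step) ** max_attempts


def required_step_rate(target: float, steps: int) -> float:
    """Per-step success rate needed to hit an end-to-end target over `steps` steps."""
    return target ** (1.0 / steps)


baseline = chain_success(0.95, 20)
with_retry = chain_success(step_success_with_retries(0.95, 2), 20)
needed = required_step_rate(0.90, 20)

print(f"20 steps at 95% each:         {baseline:.3f}")    # 0.358
print(f"same, with one retry allowed: {with_retry:.3f}")  # 0.951
print(f"per-step rate for 90% total:  {needed:.4f}")      # 0.9947
```

Under these assumptions a single validated retry per step moves the twenty-step chain from about 36% to about 95%, which is one way to read the claim that the harness, not only the model, determines deployability.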
Strategically, agents change the unit of AI value from answers to outcomes — priced per completed task and benchmarked against the fully loaded cost of the process they absorb. They also change the management problem: agents occupy roles, with responsibilities, permissions, performance reviews (evaluations), and audit trails. The organizations adopting them well treat agent design as workflow design — decomposing processes, placing human gates at consequence boundaries, and measuring task completion the way they would for any operator, silicon or otherwise.
// how it works
From goal to completed work
An agent runs a continuous loop — perceive, decide, act, observe — with a model supplying judgment and a harness supplying hands, memory, and limits.
Goal & Scope
The agent receives its objective, constraints, permissions, and escalation rules — the contract under which it operates.
Perception
Context assembles — task state, memory, fresh observations — the picture from which the next decision is made.
Decision
The model selects the next action against plan and feedback — judgment applied one step at a time.
Action
A tool executes the choice — query, API call, code run, message — judgment becoming effect in a real system.
Observation
Results and errors return as information — reality's feedback grounding the loop's next iteration.
Completion or Escalation
Done, blocked, or out of bounds — the agent delivers with its audit trail, or hands up to a human with context intact.
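The six stages above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not a reference implementation: `Goal`, `Result`, `run_agent`, and the scripted `decide` function are hypothetical names, and the model call is replaced by a hard-coded policy so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Goal:
    """Goal & Scope: objective plus the limits the agent operates under."""
    objective: str
    allowed_tools: set[str]
    max_steps: int = 10


@dataclass
class Result:
    status: str  # "done", "blocked", or "out_of_bounds"
    audit: list[dict] = field(default_factory=list)


def run_agent(goal: Goal,
              decide: Callable[[dict], tuple[str, dict]],
              tools: dict[str, Callable]) -> Result:
    audit: list[dict] = []
    observations: list[str] = []
    for step in range(goal.max_steps):
        # Perception: assemble the context for this step.
        context = {"objective": goal.objective, "observations": list(observations)}
        # Decision: in a real agent this would be a model call.
        tool_name, args = decide(context)
        if tool_name == "finish":
            audit.append({"step": step, "action": "finish"})
            return Result("done", audit)
        if tool_name not in goal.allowed_tools:
            # Escalation: the requested action is outside the agreed scope.
            audit.append({"step": step, "action": tool_name, "outcome": "escalated"})
            return Result("out_of_bounds", audit)
        try:
            # Action: a tool executes the choice.
            observation = str(tools[tool_name](**args))
        except Exception as exc:
            # Observation: errors return as information, not as crashes.
            observation = f"error: {exc}"
        observations.append(observation)
        audit.append({"step": step, "action": tool_name, "observation": observation})
    # Step budget exhausted without finishing: hand up with the trail intact.
    return Result("blocked", audit)


def scripted_decide(context: dict) -> tuple[str, dict]:
    """Stand-in for the model: look something up once, then finish."""
    if not context["observations"]:
        return ("lookup", {"key": "ticket-1"})
    return ("finish", {})


tools = {"lookup": lambda key: f"status of {key}: open"}
result = run_agent(Goal("check ticket", {"lookup"}), scripted_decide, tools)
print(result.status, len(result.audit))
```

The design choice worth noting is that every exit path, whether completion, scope violation, or exhausted budget, returns the same `Result` shape with the audit list attached, so a human receiving an escalation sees the same record a successful run would produce.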
// anatomy
The components teams must understand
01
Reasoning Core
The judgment engine
The LLM making per-step decisions — its quality setting the ceiling on task complexity the agent can navigate.
02
Tool Interface
Hands on systems
The defined action set — search, databases, APIs, code execution — through which decisions become effects.
03
Memory & State
Continuity machinery
Working context, task state, and durable memory — what keeps long-running work coherent across steps and sessions.
04
Orchestrator
The loop runner
The harness sequencing perceive-decide-act-observe — retries, timeouts, checkpoints, and the discipline of the cycle.
05
Permission Envelope
Scoped authority
What the agent may do unsupervised, what needs approval, what's forbidden — autonomy bounded by policy in code.
06
Audit Trail
The accountability record
Every decision, action, and observation logged — the artifact that makes delegated work reviewable and defensible.
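Two of these components, the permission envelope and the audit trail, reduce to very little code at their core. The sketch below is one possible shape, with assumptions stated up front: the three-way verdict mirrors the "unsupervised / needs approval / forbidden" split described above, the class and action names are hypothetical, and a production system would add identity, timestamps, and tamper-resistant storage.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class PermissionEnvelope:
    """Policy in code: which actions run unsupervised and which need a human."""
    unsupervised: frozenset
    approval_required: frozenset

    def check(self, action: str) -> Verdict:
        if action in self.unsupervised:
            return Verdict.ALLOW
        if action in self.approval_required:
            return Verdict.NEEDS_APPROVAL
        # Anything not explicitly listed is forbidden (default deny).
        return Verdict.DENY


@dataclass
class AuditTrail:
    """Append-only record of what was attempted and how policy ruled."""
    entries: list = field(default_factory=list)

    def record(self, action: str, verdict: Verdict, detail: str = "") -> None:
        self.entries.append({
            "seq": len(self.entries),
            "action": action,
            "verdict": verdict.value,
            "detail": detail,
        })

    def to_jsonl(self) -> str:
        """Serialize one JSON object per line for later review."""
        return "\n".join(json.dumps(entry) for entry in self.entries)


# Hypothetical action names, chosen only for illustration.
envelope = PermissionEnvelope(
    unsupervised=frozenset({"read_ticket"}),
    approval_required=frozenset({"issue_refund"}),
)
trail = AuditTrail()
for action in ["read_ticket", "issue_refund", "delete_account"]:
    trail.record(action, envelope.check(action))

print(trail.to_jsonl())
```

Defaulting to `DENY` for unlisted actions is a deliberate choice in this sketch: it keeps the envelope safe when new tools are added before policy catches up.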
// strategic implications
What this changes for the business
01 · Value
Agents are priced per outcome
The economic unit shifts from tokens to completed tasks, benchmarked against the fully loaded cost of the process absorbed, error handling included. That math can justify far larger investment than chat did, but only if completion rates are measured honestly rather than inferred from demo reels.
02 · Reliability
Compounding errors are the engineering problem
Per-step success compounds brutally across long tasks — production agents are mostly reliability machinery around a reasoning core. Evaluate agents on multi-step completion rates under realistic conditions; single-step accuracy flatters every system.
03 · Management
Agents occupy roles, not features
Permissions, responsibilities, evaluations, and audit trails — the management apparatus of delegation applies in full. Define the role before deploying the agent: what it owns, where humans gate, and who answers for its mistakes.
// common misconceptions
What an AI Agent is not
Myth
“An agent is a chatbot with plugins.”
Reality
The loop changes the category: goal pursuit, real actions, compounding consequences, and accountability needs that no response-generator carries. The engineering and governance are different disciplines, not extensions.
Myth
“Better models will make agent harnesses unnecessary.”
Reality
Stronger reasoning raises per-step quality; compounding still demands validation, checkpoints, permissions, and audit. The harness is where reliability and accountability live — model progress shifts its emphasis, not its necessity.
Myth
“Agents either work or they don't.”
Reality
Agent performance is a distribution over tasks and conditions — completion rates, escalation rates, error severities. Deployment readiness is a measured threshold per use case, not a binary the demo settles.
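Treating readiness as a measured threshold can be sketched in a few lines. Everything here is an assumption made for illustration: the `Run` record, the outcome labels, the thresholds, and the sample data are invented and do not describe any real system or benchmark. Error severity, which the text also mentions, is left out to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    task_type: str
    outcome: str  # "completed", "escalated", or "failed"


def summarize(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Per-task-type completion, escalation, and failure rates."""
    by_type: dict[str, Counter] = {}
    for run in runs:
        by_type.setdefault(run.task_type, Counter())[run.outcome] += 1
    summary = {}
    for task_type, counts in by_type.items():
        total = sum(counts.values())
        summary[task_type] = {
            "runs": total,
            "completion_rate": counts["completed"] / total,
            "escalation_rate": counts["escalated"] / total,
            "failure_rate": counts["failed"] / total,
        }
    return summary


def ready(stats: dict[str, float], min_completion: float, max_failure: float) -> bool:
    """Readiness as a threshold per use case, not a binary property of the agent."""
    return (stats["completion_rate"] >= min_completion
            and stats["failure_rate"] <= max_failure)


# Invented sample data, for illustration only.
runs = (
    [Run("refund", "completed")] * 8
    + [Run("refund", "escalated"), Run("refund", "failed")]
    + [Run("triage", "completed")] * 3
    + [Run("triage", "failed")]
)
summary = summarize(runs)
for task_type, stats in summary.items():
    verdict = ready(stats, min_completion=0.75, max_failure=0.10)
    print(task_type, stats, "ready" if verdict else "not ready")
```

With this made-up data the same agent clears the bar for one task type and misses it for the other, which is the point of reporting a distribution per use case instead of a single pass or fail.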