// term 69 · Memory & Context
Short-Term Memory
Active Session Awareness
The information active within the current context window — the conversation so far, retrieved documents, task state, and instructions the model can actually see right now. Short-term memory is the AI's working present: everything it reasons over, bounded by tokens and gone when the session ends.
// Scope
the window
Short-term memory and the context window are the same resource — bounded, billed, and reset per session.
// Behavior
evicts
Long sessions overflow — without management, the earliest turns silently fall out, taking their decisions with them.
// Discipline
curation
What occupies the window is chosen, not accumulated — context management is the operating system of the session.
// full definition
What Short-Term Memory actually is
Everything an AI system can currently think about lives in one place: the context window. The conversation so far, the documents retrieved, the system instructions, the task state — if it's not in the window, it does not exist for this request. Short-term memory names this working present, and its defining properties are hard ones: strictly bounded in size, billed by the token, attention-weighted unevenly across its span, and erased completely when the session ends.
Left unmanaged, the working present degrades predictably. Conversations accumulate until the window overflows, and the earliest turns, which often contain the constraints and decisions that framed everything after, are silently evicted. Long contexts dilute attention, with mid-window content recalled least reliably. And every turn re-transmits the whole accumulated state, so per-request cost grows linearly and the cumulative cost of an unmanaged session grows roughly quadratically with the number of turns. The window is a scarce resource behaving exactly like one.
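The cost claim above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes every turn adds a fixed number of tokens, counts input tokens only, and ignores output tokens and any caching discount. The function names and parameters are hypothetical, not part of any billing API.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Total input tokens sent over a session that resends its full history each turn."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn        # the new exchange joins the history
        total += system_tokens + history  # the whole window is transmitted again
    return total


def capped_input_tokens(turns: int, tokens_per_turn: int, window_cap: int,
                        system_tokens: int = 0) -> int:
    """Same session, but history is held at or below window_cap (by summarizing or evicting)."""
    total = 0
    history = 0
    for _ in range(turns):
        history = min(history + tokens_per_turn, window_cap)
        total += system_tokens + history
    return total


# Doubling the session length roughly quadruples the unmanaged total.
ten = cumulative_input_tokens(10, 100)      # 100 * (1 + 2 + ... + 10)
twenty = cumulative_input_tokens(20, 100)   # 100 * (1 + 2 + ... + 20)
managed = capped_input_tokens(20, 100, window_cap=500)
```

Under these assumptions the unmanaged total follows n(n+1)/2 times the per-turn size, while the capped session grows linearly once the cap is reached.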
Session memory management is therefore active curation. Recent turns stay verbatim; aging history compresses into rolling summaries; settled subtasks collapse to their conclusions; retrieved documents enter when relevant and leave when spent. Critical state — the task definition, key constraints, standing decisions — gets pinned where attention is strongest rather than left to drift mid-window. The same discipline governs agents, whose working memory must track plans, tool results, and intermediate findings across dozens of steps without drowning in their accumulation.
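A minimal sketch of that curation step, under stated assumptions: `Turn`, `naive_summarize`, and `curate` are hypothetical names, and the summarizer is a crude placeholder (first sentence of each turn) standing in for whatever a real system would use, most likely a model call.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def naive_summarize(turns: list[Turn]) -> str:
    """Placeholder digest: keep the first sentence of each older turn."""
    firsts = [t.text.split(".")[0].strip() for t in turns]
    return "Summary of earlier turns: " + "; ".join(f for f in firsts if f)


def curate(pinned: list[str], history: list[Turn], keep_recent: int = 4) -> list[str]:
    """Build the window: pinned state first, a digest of aging turns, recent turns verbatim."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    window = list(pinned)                      # critical state leads the window
    if older:
        window.append(naive_summarize(older))  # aging history survives as a digest
    window.extend(f"{t.role}: {t.text}" for t in recent)
    return window
```

Placing pinned state first is one possible choice; the point is that its position is decided deliberately rather than left to wherever it happened to appear in the transcript.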
The architectural pairing is with long-term memory: the window as working present, durable storage as accumulated past, retrieval as the bridge. Systems that blur the tiers fail in both directions — sessions bloated with history that storage should hold, or assistants amnesiac about facts the last session established. The design question for every piece of information is which tier it belongs to right now: in the window earning its tokens, in storage awaiting relevance, or nowhere at all.
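The tiering question can be made explicit in code. The sketch below is an assumption-laden illustration: the two boolean flags and the names `Tier`, `Item`, and `assign_tier` are hypothetical, and a real system would derive relevance and durability from richer signals than two booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WINDOW = "window"    # in the context, earning its tokens
    STORAGE = "storage"  # in durable storage, awaiting relevance
    DISCARD = "discard"  # nowhere at all


@dataclass
class Item:
    text: str
    needed_now: bool     # relevant to the current step
    worth_keeping: bool  # likely useful in a later session


def assign_tier(item: Item) -> Tier:
    """Answer 'where does this live right now?' for one piece of state."""
    if item.needed_now:
        return Tier.WINDOW
    if item.worth_keeping:
        return Tier.STORAGE
    return Tier.DISCARD
```

An item that is both needed now and worth keeping sits in the window for the moment and would be consolidated to storage later, so the answer is expected to change over the life of a session.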
// how it works
Managing the working present
Short-term memory is curated in real time — what enters, what stays, what compresses, and what falls out as the window fills.
Session Start
The window initializes — system instructions, pinned state, and retrieved long-term memory composing the opening present.
Turn Accumulation
Each exchange appends to the working state — the present growing, and the budget depleting, turn by turn.
Relevance Injection
Retrieved documents and tool results enter as tasks need them — working memory fed just-in-time, not pre-loaded.
Compression Pass
Aging history condenses — rolling summaries and settled-task collapse reclaiming tokens from the past for the present.
Eviction & Pinning
The spent leaves and the critical stays pinned: curation decides what the model keeps seeing as the window fills.
Session End
The working present vanishes — anything worth keeping must consolidate to long-term storage before the lights go out.
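The six stages above can be sketched as one small class. This is a toy under stated assumptions: `count_tokens` counts whitespace-separated words (real tokenizers differ), compression is simple truncation rather than summarization, and `Session` with its methods is a hypothetical design, not an existing library.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: whitespace-separated words."""
    return len(text.split())


class Session:
    def __init__(self, instructions: str, budget: int, digest_words: int = 5):
        self.pinned = [instructions]   # session start: pinned state, never evicted
        self.entries: list[str] = []   # turns and injected material, oldest first
        self.budget = budget
        self.digest_words = digest_words

    def used(self) -> int:
        return sum(count_tokens(t) for t in self.pinned + self.entries)

    def add(self, text: str) -> None:
        """Turn accumulation or relevance injection, followed by curation."""
        self.entries.append(text)
        self._compress()
        self._evict()

    def _compress(self) -> None:
        """Compression pass: truncate the oldest entries first, never the newest."""
        for i in range(len(self.entries) - 1):
            if self.used() <= self.budget:
                return
            words = self.entries[i].split()
            if len(words) > self.digest_words:
                self.entries[i] = " ".join(words[: self.digest_words]) + " [...]"

    def _evict(self) -> None:
        """Eviction: drop the oldest entries while still over budget."""
        while self.used() > self.budget and len(self.entries) > 1:
            self.entries.pop(0)

    def end(self, worth_keeping: Callable[[str], bool]) -> list[str]:
        """Session end: return what should go to long-term storage, then clear the window."""
        kept = [e for e in self.entries if worth_keeping(e)]
        self.entries.clear()
        self.pinned.clear()
        return kept
```

In this sketch an early decision recorded only as an ordinary turn is first truncated and then evicted once the budget tightens, which is the failure the pinning stage exists to prevent.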
// anatomy
The components teams must understand
01
Window Budget
The bounding box
Fixed token capacity shared by instructions, history, retrievals, and the response — every occupant displacing another.
02
Attention Gradient
Uneven presence
Start and end of context recalled best, the middle least — position within the window as a reliability variable.
03
Rolling Summary
The compressed past
Aging turns condensed to decisions and facts — session history surviving as digest rather than transcript.
04
Pinned State
The non-negotiables
Task definitions, constraints, and standing decisions held in high-attention positions — protected from drift and eviction.
05
Agent Scratchpad
Working state for workers
Plans, tool results, and intermediate findings tracked across steps — the agent variant of session memory, under the same budget.
06
Consolidation Exit
The handoff to durable
End-of-session extraction to long-term storage — what survives the reset, chosen before the reset happens.
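Several of these components interact when a single request is assembled. The sketch below illustrates one possible layout policy, assuming a word-count stand-in for tokens and a retrieval list already sorted most relevant first; `layout_window` and its parameters are hypothetical names chosen for illustration.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: whitespace-separated words."""
    return len(text.split())


def layout_window(pinned: list[str], summary: str, retrievals: list[str],
                  recent: list[str], budget: int, response_reserve: int) -> list[str]:
    """Order the window by position: pinned state at the start, recent turns at the end,
    and lower-priority material in the middle, all within the input budget."""
    available = budget - response_reserve  # the response shares the same window
    head = pinned + ([summary] if summary else [])
    used = sum(count_tokens(t) for t in head + recent)
    if used > available:
        raise ValueError("pinned state, summary, and recent turns exceed the input budget")
    kept = []
    for doc in retrievals:                 # assumed sorted most relevant first
        cost = count_tokens(doc)
        if used + cost > available:
            break                          # lower-ranked documents are left out
        kept.append(doc)
        used += cost
    return head + kept + recent
```

Reserving response tokens up front reflects the shared budget, and retrievals are the first thing dropped because they can be fetched again, whereas pinned state and recent turns cannot be cheaply recovered.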
// strategic implications
What this changes for the business
01 · Quality
Session degradation is memory mismanagement
Assistants that forget their instructions mid-conversation, lose early decisions, or drift off constraints are exhibiting window overflow and attention dilution — engineering failures with engineering fixes. Context curation, not model upgrades, is usually the remedy for long-session unreliability.
02 · Economics
The working present is re-billed every turn
Each request retransmits the accumulated window, so the cumulative cost of an unmanaged session grows roughly quadratically with its length. Summarization, eviction, and prompt caching are direct cost controls on every conversational and agentic product at volume.
03 · Architecture
Tier the memory deliberately
Window for the working present, durable storage for the accumulated past, retrieval as the bridge — systems that blur the tiers pay in bloat or amnesia. Every piece of state deserves an explicit answer to where it lives right now.
// common misconceptions
What Short-Term Memory is not
Myth
“The assistant forgot — the model must be bad.”
Reality
Mid-session forgetting is usually window overflow or mid-context attention dilution — the information evicted or buried, not mis-reasoned. Diagnose the context management before indicting the model.
Myth
“Filling the window with everything relevant is safest.”
Reality
Bloated context dilutes attention, buries critical content mid-window, and multiplies cost; curation outperforms accumulation. The strongest sessions carry the least dead weight.
Myth
“Session memory and long-term memory are one feature.”
Reality
They are different tiers with different lifespans, costs, and governance — working present versus accumulated past. Conflating them produces systems that are simultaneously bloated and forgetful.