// term 69 · Memory & Context

Short-Term Memory

Active Session Awareness

The information active within the current context window — the conversation so far, retrieved documents, task state, and instructions the model can actually see right now. Short-term memory is the AI's working present: everything it reasons over, bounded by tokens and gone when the session ends.

Working Memory · Context Window · Session State · Budget

// Scope

the window

Short-term memory and the context window are the same resource — bounded, billed, and reset per session.

// Behavior

evicts

Long sessions overflow — without management, the earliest turns silently fall out, taking their decisions with them.

// Discipline

curation

What occupies the window is chosen, not accumulated — context management is the operating system of the session.

// full definition

What Short-Term Memory actually is

Everything an AI system can currently think about lives in one place: the context window. The conversation so far, the documents retrieved, the system instructions, and the task state all live there; anything not in the window does not exist for this request. Short-term memory names this working present, and its defining properties are hard ones: strictly bounded in size, billed by the token, attended to unevenly across its span, and erased completely when the session ends.

Left unmanaged, the working present degrades predictably. Conversations accumulate until the window overflows, and the earliest turns, which often contain the constraints and decisions that framed everything after, are silently evicted. Long contexts dilute attention, with mid-window content typically recalled least reliably. And every turn re-transmits the whole accumulated state, so an unmanaged session's cumulative cost grows roughly quadratically with its length. The window is a scarce resource behaving exactly like one.
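The quadratic growth can be checked with a short derivation. It is a simplification that assumes every turn adds a constant t tokens and every request resends the full history with no caching; real sessions vary turn by turn, but the shape of the curve is the same.

```latex
\text{tokens sent on turn } k = k\,t
\qquad\Longrightarrow\qquad
\text{total sent over } n \text{ turns}
  = \sum_{k=1}^{n} k\,t
  = t\,\frac{n(n+1)}{2}
  = O(n^{2})
```

Doubling the length of an unmanaged session therefore roughly quadruples the cumulative input tokens billed.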

Session memory management is therefore active curation. Recent turns stay verbatim; aging history compresses into rolling summaries; settled subtasks collapse to their conclusions; retrieved documents enter when relevant and leave when spent. Critical state — the task definition, key constraints, standing decisions — gets pinned where attention is strongest rather than left to drift mid-window. The same discipline governs agents, whose working memory must track plans, tool results, and intermediate findings across dozens of steps without drowning in their accumulation.

The architectural pairing is with long-term memory: the window as working present, durable storage as accumulated past, retrieval as the bridge. Systems that blur the tiers fail in both directions — sessions bloated with history that storage should hold, or assistants amnesiac about facts the last session established. The design question for every piece of information is which tier it belongs to right now: in the window earning its tokens, in storage awaiting relevance, or nowhere at all.
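The tiering question can be sketched as a small routing function. This is an illustrative sketch only: the `Item` fields `needed_now` and `durable_value` are hypothetical flags that a real system would have to derive from relevance scoring or explicit policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WINDOW = "window"    # in the context window, earning its tokens
    STORAGE = "storage"  # in durable storage, awaiting relevance
    DISCARD = "discard"  # nowhere at all


@dataclass
class Item:
    text: str
    needed_now: bool     # hypothetical flag: relevant to the current step
    durable_value: bool  # hypothetical flag: worth recalling in a later session


def assign_tier(item: Item) -> Tier:
    """Answer 'where does this live right now?' for one piece of information."""
    if item.needed_now:
        return Tier.WINDOW
    if item.durable_value:
        return Tier.STORAGE
    return Tier.DISCARD


print(assign_tier(Item("user prefers metric units", needed_now=False, durable_value=True)))
# prints Tier.STORAGE
```

An item that is both needed now and durably valuable sits in the window for the moment and is a candidate for consolidation to storage when the session ends.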

// how it works

Managing the working present

Short-term memory is curated in real time — what enters, what stays, what compresses, and what falls out as the window fills.

01

Session Start

The window initializes — system instructions, pinned state, and retrieved long-term memory composing the opening present.

02

Turn Accumulation

Each exchange appends to the working state — the present growing, and the budget depleting, turn by turn.

03

Relevance Injection

Retrieved documents and tool results enter as tasks need them — working memory fed just-in-time, not pre-loaded.

04

Compression Pass

Aging history condenses — rolling summaries and settled-task collapse reclaiming tokens from the past for the present.

05

Eviction & Pinning

The spent leaves, the critical stays pinned: curation decides what the model keeps seeing as pressure on the window grows.

06

Session End

The working present vanishes — anything worth keeping must consolidate to long-term storage before the lights go out.

// anatomy

The components teams must understand

01

Window Budget

The bounding box

Fixed token capacity shared by instructions, history, retrievals, and the response — every occupant displacing another.
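As a rough illustration of occupants displacing one another, the sketch below computes how many tokens are left for conversation history once the other occupants are accounted for. The function name and all numbers are made up for the example; actual limits depend on the model in use.

```python
def remaining_for_history(window_limit: int, instructions: int,
                          retrievals: int, response_reserve: int) -> int:
    """Tokens left for history after the fixed occupants take their share."""
    fixed = instructions + retrievals + response_reserve
    if fixed > window_limit:
        raise ValueError("fixed occupants alone exceed the window")
    return window_limit - fixed


# Illustrative numbers only.
print(remaining_for_history(8000, instructions=500, retrievals=2000, response_reserve=1000))
# prints 4500
print(remaining_for_history(8000, instructions=500, retrievals=3000, response_reserve=1000))
# prints 3500: 1000 more retrieved tokens means 1000 fewer for history
```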

02

Attention Gradient

Uneven presence

Content at the start and end of the context tends to be recalled best, and content in the middle least; position within the window is a reliability variable.
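One way to act on this is to order content so the most important items sit at the edges of the window. The sketch below assumes the items arrive already ranked most-important-first; how that ranking is produced is outside the example, and `edges_first` is a hypothetical helper name.

```python
from typing import List


def edges_first(ranked: List[str]) -> List[str]:
    """Reorder items ranked most-important-first so the top items land at the
    start and end of the sequence and the least important land in the middle."""
    front: List[str] = []
    back: List[str] = []
    for index, item in enumerate(ranked):
        # Alternate: even ranks fill from the front, odd ranks from the back.
        if index % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    return front + back[::-1]


print(edges_first(["a", "b", "c", "d", "e"]))
# prints ['a', 'c', 'e', 'd', 'b']  ('a' and 'b' at the edges, 'e' in the middle)
```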

03

Rolling Summary

The compressed past

Aging turns condensed to decisions and facts — session history surviving as digest rather than transcript.
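A rolling summary can be sketched as a function that splits history into an aging part and a recent part, folding the aging part into the running digest. The `naive_summarize` helper here is a placeholder that merely truncates text; in practice this step would most likely be a model call, which is an assumption outside this sketch.

```python
from typing import Callable, List, Tuple


def naive_summarize(previous: str, aging: List[str]) -> str:
    """Placeholder summarizer: truncates each aging turn to 40 characters."""
    digest = "; ".join(turn[:40] for turn in aging)
    return f"{previous}; {digest}" if previous else digest


def roll_up(turns: List[str], summary: str, keep_recent: int,
            summarize: Callable[[str, List[str]], str]) -> Tuple[str, List[str]]:
    """Keep the last `keep_recent` turns verbatim; fold older turns into the summary."""
    cut = max(len(turns) - keep_recent, 0)
    aging, recent = turns[:cut], turns[cut:]
    if not aging:
        return summary, recent
    return summarize(summary, aging), recent


summary, recent = roll_up(
    ["decided: use Postgres", "discussed schema", "asked about indexes", "latest question"],
    summary="", keep_recent=2, summarize=naive_summarize,
)
print(summary)  # prints decided: use Postgres; discussed schema
print(recent)   # prints ['asked about indexes', 'latest question']
```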

04

Pinned State

The non-negotiables

Task definitions, constraints, and standing decisions held in high-attention positions — protected from drift and eviction.
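Protection from eviction can be sketched as a pass that drops the oldest unpinned entries until the window fits, while pinned entries always survive and are moved to the front. The `Entry` type and its token counts are hypothetical; a real system would measure tokens with the model's tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    text: str
    tokens: int           # assumed to be pre-computed by a tokenizer
    pinned: bool = False


def evict_to_fit(entries: List[Entry], budget: int) -> List[Entry]:
    """Drop oldest unpinned entries (list is oldest-first) until the total fits.
    Pinned entries are never dropped and are placed first in the result."""
    total = sum(entry.tokens for entry in entries)
    kept: List[Entry] = []
    for entry in entries:
        if total > budget and not entry.pinned:
            total -= entry.tokens  # evict this entry
            continue
        kept.append(entry)
    return [e for e in kept if e.pinned] + [e for e in kept if not e.pinned]


window = evict_to_fit(
    [Entry("task: migrate DB", 50, pinned=True), Entry("old chat", 400),
     Entry("older tool output", 300), Entry("latest turn", 200)],
    budget=500,
)
print([e.text for e in window])
# prints ['task: migrate DB', 'latest turn']
```

If the pinned entries alone exceed the budget, this sketch returns an over-budget window rather than dropping them; handling that case is a policy decision left out here.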

05

Agent Scratchpad

Working state for workers

Plans, tool results, and intermediate findings tracked across steps — the agent variant of session memory, under the same budget.
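A minimal scratchpad sketch follows. It assumes, as one possible design rather than a standard one, that each completed step keeps only its conclusion and that only the most recent raw tool result stays visible. The `Scratchpad` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scratchpad:
    plan: List[str]                                     # steps still to do
    findings: List[str] = field(default_factory=list)   # one conclusion per finished step
    last_tool_result: str = ""                          # only the newest raw output is kept

    def record(self, step: str, tool_result: str, finding: str) -> None:
        """Mark a step done: store its conclusion and overwrite the raw output."""
        self.findings.append(f"{step}: {finding}")
        self.last_tool_result = tool_result
        if step in self.plan:
            self.plan.remove(step)

    def render(self) -> str:
        """Text form of the scratchpad, suitable for placing in the window."""
        lines = ["Remaining plan: " + (", ".join(self.plan) or "none")]
        lines += [f"Finding: {finding}" for finding in self.findings]
        if self.last_tool_result:
            lines.append(f"Latest tool result: {self.last_tool_result}")
        return "\n".join(lines)


pad = Scratchpad(plan=["search", "draft"])
pad.record("search", "<long raw search output>", "3 relevant sources found")
pad.record("draft", "<outline text>", "outline complete")
print(pad.render())
```

After the second step, the raw search output is gone from the rendered scratchpad while its conclusion remains, which keeps the accumulated state from growing with every tool call.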

06

Consolidation Exit

The handoff to durable

End-of-session extraction to long-term storage — what survives the reset, chosen before the reset happens.
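The handoff can be sketched as a filter that runs before the session closes. Here a plain dictionary stands in for long-term storage, and the `durable` flag on each fact is a hypothetical marker; deciding what counts as durable is the real work and is not shown.

```python
from typing import Dict, List


def consolidate(session_facts: List[Dict], store: Dict[str, str]) -> Dict[str, str]:
    """Copy facts marked durable into the long-term store; everything else
    is left to vanish with the session."""
    for fact in session_facts:
        if fact.get("durable"):
            store[fact["key"]] = fact["value"]
    return store


long_term: Dict[str, str] = {}
consolidate(
    [
        {"key": "preferred_language", "value": "Python", "durable": True},
        {"key": "scratch_note", "value": "retry step 3", "durable": False},
    ],
    long_term,
)
print(long_term)
# prints {'preferred_language': 'Python'}
```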

// strategic implications

What this changes for the business

01 · Quality

Session degradation is memory mismanagement

Assistants that forget their instructions mid-conversation, lose early decisions, or drift off constraints are exhibiting window overflow and attention dilution — engineering failures with engineering fixes. Context curation, not model upgrades, is usually the remedy for long-session unreliability.

02 · Economics

The working present is re-billed every turn

Each request retransmits the accumulated window, so an unmanaged session's cumulative cost grows roughly quadratically with its length. Summarization, eviction, and prompt caching are direct cost controls on every conversational and agentic product at volume.
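To make the cost difference concrete, the sketch below compares cumulative input tokens for an unmanaged session against one whose window is held under a cap. It assumes a constant number of tokens added per turn, uses the cap as a stand-in for summarization and eviction, and ignores output tokens and caching discounts; all numbers are illustrative.

```python
from typing import Optional


def total_input_tokens(turns: int, tokens_per_turn: int,
                       cap: Optional[int] = None) -> int:
    """Sum the tokens sent across all requests in a session.
    With `cap` set, the window never grows beyond that size."""
    total = 0
    window = 0
    for _ in range(turns):
        window += tokens_per_turn
        if cap is not None:
            window = min(window, cap)
        total += window  # the whole window is resent on every request
    return total


print(total_input_tokens(100, 200))            # prints 1010000
print(total_input_tokens(100, 200, cap=4000))  # prints 362000
```

Under these assumptions the capped session sends roughly a third of the tokens over 100 turns, and the gap widens as sessions get longer because the capped total grows linearly once the cap is reached.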

03 · Architecture

Tier the memory deliberately

Window for the working present, durable storage for the accumulated past, retrieval as the bridge — systems that blur the tiers pay in bloat or amnesia. Every piece of state deserves an explicit answer to where it lives right now.

// common misconceptions

What Short-Term Memory is not

Myth

“The assistant forgot — the model must be bad.”

Reality

Mid-session forgetting is usually window overflow or mid-context attention dilution — the information evicted or buried, not mis-reasoned. Diagnose the context management before indicting the model.

Myth

“Filling the window with everything relevant is safest.”

Reality

Bloated context dilutes attention, buries critical content mid-window, and multiplies cost; curation outperforms accumulation. The strongest sessions carry the least dead weight.

Myth

“Session memory and long-term memory are one feature.”

Reality

They are different tiers with different lifespans, costs, and governance — working present versus accumulated past. Conflating them produces systems that are simultaneously bloated and forgetful.


© 2026 Stephen Andekian.