# Short-Term Memory — Active Session Awareness

> The information active within the current context window — the conversation so far, retrieved documents, task state, and instructions the model can actually see right now. Short-term memory is the AI's working present: everything it reasons over, bounded by tokens and gone when the session ends.

**Canonical URL:** https://www.andekian.com/ai-lexicon/short-term-memory  
**Author / Site:** Stephen Andekian — https://www.andekian.com

**Term 69 of 100** · Memory & Context  
**Tags:** Working Memory, Context Window, Session State, Budget

## Key Stats

- **Scope — the window:** Short-term memory and the context window are the same resource — bounded, billed, and reset per session.
- **Behavior — evicts:** Long sessions overflow — without management, the earliest turns silently fall out, taking their decisions with them.
- **Discipline — curation:** What occupies the window is chosen, not accumulated — context management is the operating system of the session.

## What Short-Term Memory Actually Is

Everything an AI system can currently think about lives in one place: the context window. The conversation so far, the documents retrieved, the system instructions, the task state — if it's not in the window, it does not exist for this request. Short-term memory names this working present, and its defining properties are hard ones: strictly bounded in size, billed by the token, attention-weighted unevenly across its span, and erased completely when the session ends.

Left unmanaged, the working present degrades predictably. Conversations accumulate until the window overflows, and the earliest turns, often the ones containing the constraints and decisions that framed everything after, are silently evicted. Long contexts dilute attention, with mid-window content recalled least reliably. And every turn re-transmits the whole accumulated state, so the cost of each request grows with session length and the cumulative cost of an unmanaged session grows roughly quadratically. The window is a scarce resource behaving exactly like one.
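
The cost claim is easiest to see with arithmetic. The sketch below is a simplified model, not a pricing calculator: it assumes every turn adds the same number of tokens, ignores output tokens and prompt caching, and treats curation as a fixed ceiling on window size. The function name and the figures are illustrative assumptions.

```python
from typing import Optional


def cumulative_input_tokens(tokens_per_turn: int, turns: int,
                            window_cap: Optional[int] = None) -> int:
    """Total input tokens sent over a session when each request resends the window."""
    context = 0  # tokens currently held in the window
    total = 0    # tokens transmitted across all requests so far
    for _ in range(turns):
        context += tokens_per_turn
        if window_cap is not None:
            # Crude model of curation: the window is held at a fixed ceiling.
            context = min(context, window_cap)
        total += context
    return total


unmanaged_10 = cumulative_input_tokens(500, 10)                  # 27,500
unmanaged_20 = cumulative_input_tokens(500, 20)                  # 105,000
managed_20 = cumulative_input_tokens(500, 20, window_cap=2000)   # 37,000
```

Doubling the session from 10 to 20 turns nearly quadruples the unmanaged total, while the capped session grows close to linearly once the ceiling is reached.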

Session memory management is therefore active curation. Recent turns stay verbatim; aging history compresses into rolling summaries; settled subtasks collapse to their conclusions; retrieved documents enter when relevant and leave when spent. Critical state — the task definition, key constraints, standing decisions — gets pinned where attention is strongest rather than left to drift mid-window. The same discipline governs agents, whose working memory must track plans, tool results, and intermediate findings across dozens of steps without drowning in their accumulation.
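
A minimal sketch of that curation, under stated assumptions: `CuratedWindow` and `digest` are hypothetical names, and `digest` simply truncates a turn where a real system would likely ask a model for a summary. Pinned state is placed first here on the assumption that the start of the window is a high-attention position.

```python
from dataclasses import dataclass, field
from typing import List


def digest(turn: str, max_words: int = 6) -> str:
    # Hypothetical stand-in for a model-written summary: keep the first few words.
    words = turn.split()
    return " ".join(words[:max_words]) + (" ..." if len(words) > max_words else "")


@dataclass
class CuratedWindow:
    pinned: List[str]            # task definition, constraints, standing decisions
    keep_recent: int = 2         # how many turns stay verbatim
    summary: List[str] = field(default_factory=list)
    recent: List[str] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        # Aging turns leave the verbatim tail and survive only as digests.
        while len(self.recent) > self.keep_recent:
            self.summary.append(digest(self.recent.pop(0)))

    def render(self) -> List[str]:
        window = list(self.pinned)
        if self.summary:
            window.append("Earlier in this session: " + "; ".join(self.summary))
        window.extend(self.recent)
        return window


w = CuratedWindow(pinned=["TASK: draft the launch plan", "CONSTRAINT: budget is fixed"])
for turn in [
    "user: start with the timeline please and keep it short",
    "assistant: here is a first timeline",
    "user: move the beta earlier",
    "assistant: beta moved to week two",
]:
    w.add_turn(turn)
window = w.render()
```

After four turns the rendered window holds the two pinned lines, one summary line covering the first two turns, and the last two turns verbatim.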

The architectural pairing is with long-term memory: the window as working present, durable storage as accumulated past, retrieval as the bridge. Systems that blur the tiers fail in both directions — sessions bloated with history that storage should hold, or assistants amnesiac about facts the last session established. The design question for every piece of information is which tier it belongs to right now: in the window earning its tokens, in storage awaiting relevance, or nowhere at all.
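
The tiering question can be written down as a small decision rule. This is only an illustration: the `Tier` enum, the `choose_tier` function, and its two boolean inputs are assumptions, and a real system would derive those signals from relevance scoring and retention policy rather than receive them as flags.

```python
from enum import Enum


class Tier(Enum):
    WINDOW = "window"    # in the context window, earning its tokens
    STORAGE = "storage"  # in durable storage, awaiting relevance
    DISCARD = "discard"  # nowhere at all


def choose_tier(needed_now: bool, useful_later: bool) -> Tier:
    """Answer 'where does this piece of information live right now?'"""
    if needed_now:
        return Tier.WINDOW
    if useful_later:
        return Tier.STORAGE
    return Tier.DISCARD


placements = {
    "current task constraints": choose_tier(needed_now=True, useful_later=True),
    "user's preferred name": choose_tier(needed_now=False, useful_later=True),
    "small talk from turn 3": choose_tier(needed_now=False, useful_later=False),
}
```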

## How It Works: Managing the working present

Short-term memory is curated in real time — what enters, what stays, what compresses, and what falls out as the window fills.

1. **Session Start** — The window initializes — system instructions, pinned state, and retrieved long-term memory composing the opening present.
2. **Turn Accumulation** — Each exchange appends to the working state — the present growing, and the budget depleting, turn by turn.
3. **Relevance Injection** — Retrieved documents and tool results enter as tasks need them — working memory fed just-in-time, not pre-loaded.
4. **Compression Pass** — Aging history condenses — rolling summaries and settled-task collapse reclaiming tokens from the past for the present.
5. **Eviction & Pinning** — Spent content leaves and critical state is pinned: curation decides what the model keeps seeing as pressure on the window builds.
6. **Session End** — The working present vanishes — anything worth keeping must consolidate to long-term storage before the lights go out.
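
The six steps above can be sketched as a single session object. Everything here is an assumption made for illustration: the `Session` class, the plain dict standing in for long-term storage, word counts standing in for token counts, and a compression pass that simply drops the oldest turns where a real system would more likely summarize them.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    return len(text.split())


class Session:
    def __init__(self, instructions: str, store: dict, budget: int):
        self.store = store    # hypothetical long-term store (a plain dict here)
        self.budget = budget
        # 1. Session start: instructions plus facts retrieved from long-term memory.
        self.pinned = [instructions] + [f"{k}: {v}" for k, v in store.items()]
        self.history: list = []
        self.documents: dict = {}
        self.decisions: dict = {}

    def used(self) -> int:
        parts = self.pinned + self.history + list(self.documents.values())
        return sum(count_tokens(p) for p in parts)

    def add_turn(self, text: str) -> None:
        # 2. Turn accumulation, followed by a budget check.
        self.history.append(text)
        self._relieve_pressure()

    def inject(self, doc_id: str, text: str) -> None:
        # 3. Relevance injection: a document enters when a task needs it.
        self.documents[doc_id] = text
        self._relieve_pressure()

    def _relieve_pressure(self) -> None:
        # 4. Stand-in for a compression pass: drop oldest turns until the budget fits.
        while self.used() > self.budget and len(self.history) > 1:
            self.history.pop(0)

    def release(self, doc_id: str) -> None:
        # 5. Eviction: a spent document leaves the window.
        self.documents.pop(doc_id, None)

    def decide(self, key: str, value: str) -> None:
        # 5. Pinning: a standing decision joins the protected state.
        self.decisions[key] = value
        self.pinned.append(f"{key}: {value}")
        self._relieve_pressure()

    def end(self) -> None:
        # 6. Session end: consolidate decisions, then clear the working present.
        self.store.update(self.decisions)
        self.pinned, self.history, self.documents = [], [], {}


store = {"user_name": "Ada"}
s = Session("You are a planning assistant.", store, budget=30)
s.add_turn("user: we must ship by Friday and stay under budget")
s.decide("deadline", "Friday")
s.inject("doc1", "pricing sheet with ten rows of numbers and notes")
s.add_turn("assistant: here is a draft plan covering three milestones")
tokens_under_pressure = s.used()
s.release("doc1")
s.end()
```

In this run the first user turn is dropped when the window overflows, but the deadline survives because it was pinned as a decision, and it reaches the store when the session ends.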

## Anatomy: The Components Teams Must Understand

- **Window Budget** (The bounding box): Fixed token capacity shared by instructions, history, retrievals, and the response — every occupant displacing another.
- **Attention Gradient** (Uneven presence): Start and end of context recalled best, the middle least — position within the window as a reliability variable.
- **Rolling Summary** (The compressed past): Aging turns condensed to decisions and facts — session history surviving as digest rather than transcript.
- **Pinned State** (The non-negotiables): Task definitions, constraints, and standing decisions held in high-attention positions — protected from drift and eviction.
- **Agent Scratchpad** (Working state for workers): Plans, tool results, and intermediate findings tracked across steps — the agent variant of session memory, under the same budget.
- **Consolidation Exit** (The handoff to durable): End-of-session extraction to long-term storage — what survives the reset, chosen before the reset happens.
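
Two of these components lend themselves to a short sketch. The first function treats the window budget as simple subtraction. The second orders items so the highest-priority ones sit at the edges of the context and the lowest in the middle, which only applies to content whose order is free to change, such as retrieved documents rather than conversation turns. Both function names and all the numbers are illustrative assumptions.

```python
from typing import List, Tuple


def response_headroom(capacity: int, instructions: int, history: int,
                      retrievals: int) -> int:
    """Tokens left for the response after the other occupants take their share."""
    return capacity - (instructions + history + retrievals)


def edge_first_order(items: List[Tuple[str, float]]) -> List[str]:
    """Place the highest-priority items at the edges and the lowest in the middle."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    front: List[str] = []
    back: List[str] = []
    # Alternate ranked items between the front and the back of the window.
    for index, (text, _priority) in enumerate(ranked):
        (front if index % 2 == 0 else back).append(text)
    return front + back[::-1]


headroom = response_headroom(capacity=8000, instructions=600,
                             history=4200, retrievals=2400)
ordered = edge_first_order([("A", 5), ("B", 4), ("C", 3), ("D", 2), ("E", 1)])
# ordered == ["A", "C", "E", "D", "B"]
```

With these inputs only 800 tokens remain for the response, and the two highest-priority items, A and B, land at the start and end of the ordering while E, the lowest, sits in the middle.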

## Strategic Implications

- **Session degradation is memory mismanagement** (01 · Quality): Assistants that forget their instructions mid-conversation, lose early decisions, or drift off constraints are exhibiting window overflow and attention dilution — engineering failures with engineering fixes. Context curation, not model upgrades, is usually the remedy for long-session unreliability.
- **The working present is re-billed every turn** (02 · Economics): Each request retransmits the accumulated window, so the cumulative cost of an unmanaged session grows roughly quadratically with its length. Summarization, eviction, and prompt caching are direct cost controls on every conversational and agentic product at volume.
- **Tier the memory deliberately** (03 · Architecture): Window for the working present, durable storage for the accumulated past, retrieval as the bridge — systems that blur the tiers pay in bloat or amnesia. Every piece of state deserves an explicit answer to where it lives right now.

## Common Misconceptions

- **Myth:** “The assistant forgot — the model must be bad.”  
  **Reality:** Mid-session forgetting is usually window overflow or mid-context attention dilution — the information evicted or buried, not mis-reasoned. Diagnose the context management before indicting the model.
- **Myth:** “Filling the window with everything relevant is safest.”  
  **Reality:** Bloated context dilutes attention, buries the critical mid-window, and multiplies cost — curation outperforms accumulation. The strongest sessions carry the least dead weight.
- **Myth:** “Session memory and long-term memory are one feature.”  
  **Reality:** They are different tiers with different lifespans, costs, and governance — working present versus accumulated past. Conflating them produces systems that are simultaneously bloated and forgetful.

## Related Terms

- [LLM — Large Language Model](https://www.andekian.com/ai-lexicon/llm)
- [Context Window — Operational Memory Limit](https://www.andekian.com/ai-lexicon/context-window)
- [Prompt Engineering — Instruction Optimization](https://www.andekian.com/ai-lexicon/prompt-engineering)
- [Context Injection — Dynamic Information Insertion](https://www.andekian.com/ai-lexicon/context-injection)
- [Context Compression — Smaller Context Footprint](https://www.andekian.com/ai-lexicon/context-compression)
- [Memory Persistence — Retained AI State](https://www.andekian.com/ai-lexicon/memory-persistence)
- [Long-Term Memory — Persistent Contextual Storage](https://www.andekian.com/ai-lexicon/long-term-memory)
- [AI Agent — Autonomous AI Operator](https://www.andekian.com/ai-lexicon/ai-agent)

## Explore the Full Lexicon

All 100 terms: https://www.andekian.com/ai-lexicon
