Layered Architecture
Four Layers of Agent Memory
Each layer serves a different purpose, operates at a different time scale, and competes for space in the same finite context window.
System Prompt
Identity, guidelines, tools
Conversation History
Turn-by-turn session continuity
Retrieved Knowledge
Query-relevant RAG documents
Persistent Memory
Cross-session entity knowledge
"Who am I and what are my rules?"
Foundation layer. Contains agent identity, behavioral guidelines, tool descriptions, and standing instructions. Changes rarely, only when the agent is reconfigured.
"What did we just discuss?"
Most dynamic layer. Grows with each turn and may need summarization or truncation as it approaches window limits. Provides turn-by-turn continuity within the current session.
"What does the organization know about this?"
Domain-specific information drawn from knowledge bases. Populated dynamically based on the current query. Changes with each turn as the topic shifts.
"What do we know about this customer?"
Cross-session context about entities, relationships, and learned preferences. Updates only when the agent encounters information worth retaining.
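The four layers share one window, so something has to decide how much space each one gets. A minimal sketch of that assembly step follows, assuming a crude word-count tokenizer and per-layer budgets. The names `Layer`, `fit`, and `assemble_context`, and every number used with them, are illustrative and not taken from any particular framework.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class Layer:
    name: str
    content: str
    budget: int  # maximum tokens this layer may occupy (hypothetical value)


def fit(text: str, budget: int) -> str:
    """Keep only the last `budget` words (a simplification: newest material survives)."""
    words = text.split()
    return " ".join(words[-budget:]) if budget > 0 else ""


def assemble_context(layers: list[Layer], window: int) -> str:
    """Concatenate layers in order without letting their content exceed `window`.

    The bracketed layer headers are not counted against the window in this sketch.
    """
    parts, remaining = [], window
    for layer in layers:
        allowed = min(layer.budget, remaining)
        text = fit(layer.content, allowed)
        remaining -= count_tokens(text)
        if text:
            parts.append(f"[{layer.name}]\n{text}")
    return "\n\n".join(parts)
```

Layers listed first are served first, so when the window is tight the later layers are trimmed or dropped. A production system would more likely reorder or reweight the budgets per task than truncate blindly.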
Fundamental Property
Every Call Starts from Zero
The model's weights do not change between calls, and no internal state carries over from one call to the next. The illusion of memory is constructed entirely by the application layer.
API Call #1
Processes claim, learns client is risk-averse
Context loaded
API Call #2
Starts fresh, knowing nothing from Call #1
Blank slate
With Memory System
External store re-injects "risk-averse" into context
Context restored
The model never learns in real time
Its weights do not update. Its capabilities do not improve through use. Every perceived adaptation is the product of external systems feeding better information into the same stateless engine.
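The three calls above can be mimicked with a toy stand-in for the model. This is a sketch under stated assumptions: `stateless_model` is a hypothetical pure function rather than a real LLM API, and `MemoryStore` is an illustrative in-process dictionary rather than a real persistence layer.

```python
def stateless_model(context: str) -> str:
    """Hypothetical stand-in for an LLM call: the output depends only on `context`."""
    if "risk-averse" in context:
        return "Recommend the conservative option."
    return "No client profile available."


class MemoryStore:
    """Illustrative external store that lives outside the model."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def remember(self, entity: str, fact: str) -> None:
        self._facts.setdefault(entity, []).append(fact)

    def recall(self, entity: str) -> list[str]:
        return list(self._facts.get(entity, []))


def call_with_memory(store: MemoryStore, entity: str, user_message: str) -> str:
    """Re-inject stored facts into the context before every call."""
    facts = [f"Known about {entity}: {fact}" for fact in store.recall(entity)]
    return stateless_model("\n".join(facts + [user_message]))
```

Calling `stateless_model` twice with the same message always gives the same answer. Only `call_with_memory` behaves differently after a fact has been stored, and all of that difference comes from the text it places in the context.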
Approaches
Three Memory Strategies
Production agents combine these approaches into a layered architecture, allocating context window budget dynamically based on the current task.
Conversation History
Prepend prior exchanges to each request. Works well for short, contained sessions. Breaks down as conversations grow and the history crowds out the rest of the context window.
Scope
Within session
Scale
Summarize at limits
Tradeoff
Continuity vs. granularity
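One way to picture "summarize at limits" is to keep the newest turns verbatim and collapse everything older into a single summary line. The sketch below assumes a word-count tokenizer and a placeholder `summarize` function. In practice the summary would probably come from another model call, which is not shown here.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def summarize(turns: list[str]) -> str:
    """Placeholder summarizer (hypothetical): records only how many turns were collapsed."""
    return f"[summary of {len(turns)} earlier turns]"


def build_history(turns: list[str], limit: int) -> list[str]:
    """Keep the newest turns that fit in `limit` tokens; summarize the rest.

    The summary line itself is not counted against `limit` in this sketch.
    """
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > limit:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    dropped = turns[: len(turns) - len(kept)]
    return ([summarize(dropped)] if dropped else []) + kept
```

The tradeoff named above is visible here: the session stays continuous, but the collapsed turns lose their detail.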
RAG
Retrieve relevant chunks from external knowledge bases and inject them into context. Connects agents to organizational knowledge they were never trained on.
Scope
Organizational knowledge
Scale
Scales via embedding search
Tradeoff
Precision vs. noise
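The retrieval step can be sketched with a toy embedding. The example below uses word counts and cosine similarity in place of a learned embedding model, which is an assumption made to keep it self-contained. The parameters `k` and `min_score` are hypothetical knobs for the precision-versus-noise tradeoff.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2, min_score: float = 0.1) -> list[str]:
    """Return up to `k` chunks most similar to the query, dropping weak matches."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:k] if score >= min_score]
```

Raising `min_score` or lowering `k` keeps the injected context cleaner at the risk of missing something relevant. Loosening them does the opposite.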
External Memory
Persist agent-learned facts to structured stores. Cross-session knowledge about entities, preferences, and learned behaviors.
Scope
Cross-session
Scale
Selective persistence
Tradeoff
Retention vs. noise
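Selective persistence can be sketched as a write path with a gate in front of it. The example below uses Python's standard `sqlite3` module as the structured store. The table layout, the `confidence` score, and the 0.8 threshold are all assumptions chosen for illustration, not a recommended schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small fact table keyed by entity and attribute."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "entity TEXT, attribute TEXT, value TEXT, "
        "PRIMARY KEY (entity, attribute))"
    )
    return conn


def worth_retaining(confidence: float, threshold: float = 0.8) -> bool:
    """Hypothetical gate: persist only facts the agent is fairly sure about."""
    return confidence >= threshold


def remember(conn: sqlite3.Connection, entity: str, attribute: str,
             value: str, confidence: float) -> bool:
    """Store a fact if it passes the gate; newer values overwrite older ones."""
    if not worth_retaining(confidence):
        return False
    conn.execute(
        "INSERT OR REPLACE INTO facts (entity, attribute, value) VALUES (?, ?, ?)",
        (entity, attribute, value),
    )
    conn.commit()
    return True


def recall(conn: sqlite3.Connection, entity: str) -> dict[str, str]:
    """Load everything known about one entity, ready to inject into context."""
    rows = conn.execute(
        "SELECT attribute, value FROM facts WHERE entity = ?", (entity,)
    ).fetchall()
    return dict(rows)
```

The gate is where the retention-versus-noise tradeoff lives: a low threshold fills the store with passing remarks, while a high one risks forgetting preferences that mattered.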