Module 0 · The AI Substrate

Memory Architecture Stack

LLMs are stateless by default; every call starts from zero. Everything an agent "remembers" is the result of deliberate engineering: information explicitly placed into the context window at inference time.
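A minimal sketch of what statelessness means in practice. The call_model() stub below is a stand-in for any chat-style completion endpoint; the only facts available on a given call are the ones re-sent in that call's messages.

```python
def call_model(messages: list[dict]) -> str:
    """Stand-in for any chat-completion endpoint (hosted or local); illustrative only."""
    raise NotImplementedError

turn_1 = [{"role": "user", "content": "My client is risk-averse."}]
turn_2 = [{"role": "user", "content": "Recommend a portfolio mix."}]

# The two calls are independent. Nothing from turn_1 reaches the second call
# unless the application re-sends it:
#   call_model(turn_2)            # the model has never seen "risk-averse"
#   call_model(turn_1 + turn_2)   # "memory" is literally re-sent text
```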

Layered Architecture
Four Layers of Agent Memory
Each layer serves a different purpose, operates at a different time scale, and competes for space in the same finite context window.
Context Window · 128K tokens
System Prompt: identity, guidelines, tools
Conversation History: turn-by-turn session continuity
Retrieved Knowledge: query-relevant RAG documents
Persistent Memory: cross-session entity knowledge
System Prompt
Static
"Who am I and what are my rules?"
Foundation layer. Contains agent identity, behavioral guidelines, tool descriptions, and standing instructions. Changes rarely, only when the agent is reconfigured.
Conversation History
Dynamic
"What did we just discuss?"
Most dynamic layer. Grows with each turn and may need summarization or truncation as it approaches the window limit. Provides turn-by-turn continuity within the current session.
Retrieved Knowledge (RAG)
Query-dependent
"What does the organization know about this?"
Domain-specific information drawn from knowledge bases. Populated dynamically based on the current query. Changes with each turn as the topic shifts.
Persistent Memory
Slow-moving
"What do we know about this customer?"
Cross-session context about entities, relationships, and learned preferences. Updates only when the agent encounters information worth retaining.
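A sketch of how the four layers might be assembled into a single request at inference time. The function name and message layout here are illustrative, not part of the module:

```python
def build_context(system_prompt: str,
                  history: list[dict],
                  retrieved_docs: list[str],
                  memory_facts: list[str]) -> list[dict]:
    """Combine the four memory layers into one request's message list.

    All four layers compete for the same finite window, so the caller is
    assumed to have already trimmed or summarized each one to budget.
    """
    messages = [{"role": "system", "content": system_prompt}]         # layer 1: static identity and rules
    if memory_facts:                                                  # layer 4: cross-session knowledge
        messages.append({"role": "system",
                         "content": "Known about this customer:\n" + "\n".join(memory_facts)})
    if retrieved_docs:                                                # layer 3: query-dependent RAG results
        messages.append({"role": "system",
                         "content": "Relevant documents:\n\n" + "\n\n".join(retrieved_docs)})
    messages.extend(history)                                          # layer 2: the current session's turns
    return messages
```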
Fundamental Property
Every Call Starts from Zero
The model's weights do not change between calls. No internal register updates. The illusion of memory is entirely constructed by the application layer.
API Call #1: processes the claim, learns the client is risk-averse (context loaded).
State reset.
API Call #2: starts fresh and knows nothing from Call #1 (blank slate).
With a memory system: the external store re-injects "risk-averse" into the context (context restored).

The model never learns in real time

Its weights do not update. Its capabilities do not improve through use. Every perceived adaptation is the product of external systems feeding better information into the same stateless engine.
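A sketch of the sequence above. The in-memory dictionary is a stand-in for whatever external store the application uses; the model itself never holds the fact.

```python
memory_store: dict[str, list[str]] = {}          # stand-in for an external store, keyed by client ID

def remember(client_id: str, fact: str) -> None:
    memory_store.setdefault(client_id, []).append(fact)

def recall(client_id: str) -> list[str]:
    return memory_store.get(client_id, [])

# API Call #1: the application, not the model, decides the fact is worth keeping.
remember("client-42", "risk-averse")

# State reset: the model retains nothing between calls.

# API Call #2: the store re-injects the fact into the fresh context.
note = "Known client facts: " + "; ".join(recall("client-42"))
# `note` is then prepended to the prompt for the second call.
```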

Approaches
Three Memory Strategies
Production agents combine these approaches into a layered architecture, allocating context window budget dynamically based on the current task.
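A sketch of dynamic budget allocation across the layers. The proportions and task labels are invented for illustration; the point is only that the split shifts with the task rather than being fixed.

```python
def allocate_budget(task: str, window_tokens: int = 128_000) -> dict[str, int]:
    """Split the context window across the four layers based on the task at hand."""
    if task == "knowledge_lookup":      # retrieval-heavy: most of the window goes to RAG
        shares = {"system": 0.05, "history": 0.15, "retrieved": 0.60, "memory": 0.20}
    else:                               # long dialogue: conversation history dominates
        shares = {"system": 0.05, "history": 0.65, "retrieved": 0.20, "memory": 0.10}
    return {layer: int(window_tokens * share) for layer, share in shares.items()}
```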

Conversation History

Prepend prior exchanges to each request. Works well for short, contained sessions. Breaks down as conversations grow longer.

Scope: within session
Scale: summarize at limits
Tradeoff: continuity vs. granularity
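A sketch of the summarize-at-limits behavior. The token counter and summarizer are placeholders; in practice you would use the target model's tokenizer and an LLM call for the summary.

```python
def trim_history(history: list[dict], max_tokens: int = 8_000) -> list[dict]:
    """Keep recent turns verbatim; fold older turns into a single summary message."""
    def count_tokens(msgs: list[dict]) -> int:
        return sum(len(m["content"]) // 4 for m in msgs)     # rough 4-chars-per-token estimate

    def summarize(msgs: list[dict]) -> str:                   # placeholder for an LLM summarization call
        return "Earlier in this session: " + " | ".join(m["content"] for m in msgs)[:1000]

    if count_tokens(history) <= max_tokens:
        return history                                        # full granularity, full continuity
    recent = history[-6:]                                     # the tradeoff: older turns lose granularity
    summary = {"role": "system", "content": summarize(history[:-6])}
    return [summary] + recent
```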

RAG

Retrieve relevant chunks from external knowledge bases and inject them into the context. Connects agents to organizational knowledge they were never trained on.

Scope: organizational knowledge
Scale: scales with embeddings
Tradeoff: precision vs. noise
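A sketch of the retrieve-and-inject step, assuming the knowledge-base chunks have already been embedded and the query vector comes from the same embedding model:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float],
             chunks: list[tuple[str, list[float]]],
             k: int = 3) -> list[str]:
    """Rank pre-embedded chunks against the query and return the top-k texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# The top-k chunks are then injected into the context, e.g. as an extra system
# message: "Relevant documents:\n\n" + "\n\n".join(retrieve(q_vec, knowledge_base))
```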

External Memory

Persist agent-learned facts to structured stores. Cross-session knowledge about entities, preferences, and learned behaviors.

Scope: cross-session
Scale: selective persistence
Tradeoff: retention vs. noise
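A sketch of selective persistence to a structured store. The JSON file and the crude salience flag stand in for whatever database and retention policy a real system would use.

```python
import json
from pathlib import Path

MEMORY_PATH = Path("agent_memory.json")        # illustrative; production systems use a real datastore

def load_memory() -> dict[str, list[str]]:
    return json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else {}

def save_fact(entity: str, fact: str, salient: bool) -> None:
    """Persist a fact only if it was judged worth retaining (retention vs. noise)."""
    if not salient:                            # placeholder for a real salience/novelty check
        return
    memory = load_memory()
    facts = memory.setdefault(entity, [])
    if fact not in facts:
        facts.append(fact)
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

# Next session, load_memory().get("client-42", []) is re-injected into the agent's context.
```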