Context Window Allocation
Where Do the Tokens Go?
A 128K-token context window seems enormous — until you account for everything an agent needs to carry.
128,000-Token Context Window
35,000 tokens are consumed before the user's message arrives. After reserving ~4,000 tokens for the model's response, 89,000 tokens of available capacity remain.
System Prompt (identity, guidelines, constraints): 2,000 tokens · 1.6%
Tool Definitions (names, descriptions, schemas): 3,000 tokens · 2.3%
Conversation History (prior turns in the session): 10,000 tokens · 7.8%
Retrieved Context (RAG documents, knowledge): 20,000 tokens · 15.6%
Model Response (generated output tokens): ~4,000 tokens · 3.1%
Available Capacity (remaining headroom): 89,000 tokens · 69.5%
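Each percentage is simply the component's size divided by the window. Here is a minimal Python sketch of that bookkeeping, using the sizes from the breakdown. The `allocation` helper and the choice to reserve the response budget up front are illustrative assumptions, not a particular framework's API:

```python
WINDOW = 128_000  # total context window, in tokens

# Token budget per component, taken from the breakdown above.
COMPONENTS = {
    "System Prompt": 2_000,
    "Tool Definitions": 3_000,
    "Conversation History": 10_000,
    "Retrieved Context": 20_000,
    "Model Response": 4_000,  # reserved for generated output (assumption)
}

def allocation(window: int, components: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Return (tokens, percent of window) per component, plus what is left over."""
    used = sum(components.values())
    if used > window:
        raise ValueError(f"budget exceeds window by {used - window} tokens")
    rows = {name: (tokens, 100 * tokens / window) for name, tokens in components.items()}
    rows["Available Capacity"] = (window - used, 100 * (window - used) / window)
    return rows

for name, (tokens, pct) in allocation(WINDOW, COMPONENTS).items():
    print(f"{name:<22}{tokens:>8,} tokens  {pct:5.1f}%")
```

Raising an error when the components exceed the window mirrors what happens in practice: a request that does not fit is rejected or truncated, so the check belongs before the call rather than after it.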
Three Dimensions
Cost
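Every token has a price, and output tokens cost 3–5× more than input tokens. Costs compound with each turn: typical chat APIs are stateless, so the full conversation history is sent, and billed, as input on every call.
Input: $3–30 per million tokens
Output: $15–100 per million tokens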
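To make the compounding concrete, here is a minimal cost model. It assumes a stateless API where every call re-sends all prior turns and no prompt caching applies. The `per_turn_costs` function and the workload (a 25,000-token fixed prefix of system prompt, tools, and retrieved context; 500-token user messages; 4,000-token responses; $3/M input and $15/M output, the low end of the ranges above) are hypothetical:

```python
def per_turn_costs(
    turns: int,
    fixed_prefix: int,
    user_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> list[float]:
    """Dollar cost of each turn when the full history is re-sent as input."""
    costs = []
    history = 0  # tokens from earlier turns, re-sent on every call
    for _ in range(turns):
        input_tokens = fixed_prefix + history + user_tokens
        cost = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
        costs.append(cost)
        history += user_tokens + output_tokens  # this turn joins the history
    return costs

# Hypothetical workload at $3/M input and $15/M output.
costs = per_turn_costs(10, 25_000, 500, 4_000, 3.0, 15.0)
print([f"${c:.4f}" for c in costs])
print(f"total: ${sum(costs):.2f}")
```

Each turn costs $0.0135 more than the one before, because the previous turn's 4,500 tokens join the re-sent history. Over 10 turns the total comes to about $1.97, versus about $1.37 if every turn cost the same as the first.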
Latency
Output tokens are generated one at a time, so more output means a longer wait. Decoding a 2,000-token response takes roughly 10× as long as decoding a 200-token one.
First token: 200–500ms
Full response: 2–15 seconds
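A simple way to reason about response time is time to first token (TTFT) plus decode time, with decode time roughly proportional to output length. The sketch below uses hypothetical serving numbers (0.4 s to first token, 200 output tokens per second) and ignores the fact that TTFT itself grows with input length:

```python
def response_latency(output_tokens: int, ttft_s: float, tokens_per_s: float) -> float:
    """Estimated wall-clock seconds: time to first token plus sequential decoding."""
    return ttft_s + output_tokens / tokens_per_s

# Hypothetical serving numbers: 0.4 s to first token, 200 output tokens/s.
short_s = response_latency(200, ttft_s=0.4, tokens_per_s=200)
long_s = response_latency(2_000, ttft_s=0.4, tokens_per_s=200)
print(f"200 tokens: {short_s:.1f} s, 2,000 tokens: {long_s:.1f} s")
# 200 tokens: 1.4 s, 2,000 tokens: 10.4 s
```

With these numbers the decode phase is exactly 10× longer for 2,000 tokens, but the end-to-end wait is only about 7.4× longer, because the time to first token is paid once regardless of response length.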
Attention
Tokens are the unit of attention: every token in the context competes for the model's focus. Reasoning quality can degrade as the token count grows. More tokens ≠ better results.
Quality peaks: first & last
Degraded: middle positions
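One way to build intuition for why more tokens can hurt is a toy model of a single softmax attention head. The weights over all tokens sum to 1, so one relevant token's share shrinks as competing tokens are added. The fixed logit gap of 5.0 and the equal scores for every other token are arbitrary assumptions; real models learn their scores and will not follow this curve exactly:

```python
import math

def relevant_token_weight(context_len: int, logit_gap: float = 5.0) -> float:
    """Softmax weight on one relevant token whose score exceeds every
    other token's score by `logit_gap` (all other tokens score 0)."""
    relevant = math.exp(logit_gap)
    others = context_len - 1  # each other token contributes exp(0) = 1
    return relevant / (relevant + others)

for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> weight {relevant_token_weight(n):.4f}")
```

Even though the relevant token's score never changes, its weight falls from about 0.6 at 100 tokens to well under 0.01 at 100,000, because every added token takes a slice of the same fixed total.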
Attention Pattern
The "Lost in the Middle" Effect
Models attend most strongly to information at the beginning and end of the context window. Information buried in the middle receives significantly less attention — even when it's the most relevant.
Figure: attention by position, from the beginning of the context (left) to the end (right); attention is high at both ends and low in the middle.
Architectural Implication
Place critical information — system instructions, key facts, decision criteria — at the beginning or end of the context. Use retrieval strategies that surface only the most relevant chunks rather than flooding the context with marginally related content.
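A minimal sketch of both recommendations, assuming each retrieved chunk already carries a relevance score and a token count. The `Chunk` type, the helper functions, and the 8,000-token budget are hypothetical. The code keeps only the highest-scoring chunks that fit the budget, then alternates them between the front and back of the retrieved block so the strongest evidence sits at the edges and the weakest lands in the middle. This edge-first reordering is one common mitigation for the lost-in-the-middle effect, not the only one:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float  # relevance from the retriever, higher is better
    tokens: int

def select_within_budget(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the most relevant chunks that fit in the token budget."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + chunk.tokens <= budget:
            selected.append(chunk)
            used += chunk.tokens
    return selected  # most relevant first

def edges_first(ranked: list[Chunk]) -> list[Chunk]:
    """Rank 1 first, rank 2 last, rank 3 second, rank 4 second-to-last, ..."""
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

def build_context(instructions: str, chunks: list[Chunk], question: str,
                  budget: int = 8_000) -> str:
    """Instructions at the start, question at the end, best chunks at the edges."""
    ordered = edges_first(select_within_budget(chunks, budget))
    body = "\n\n".join(c.text for c in ordered)
    return f"{instructions}\n\n{body}\n\n{question}"

chunks = [
    Chunk("A", 0.9, 3_000),
    Chunk("B", 0.8, 3_000),
    Chunk("C", 0.7, 3_000),  # skipped: would push the total past 8,000 tokens
    Chunk("D", 0.2, 1_000),
]
print([c.text for c in edges_first(select_within_budget(chunks, 8_000))])
# ['A', 'D', 'B']
```

The greedy selector skips a chunk that does not fit and keeps scanning, so a small, low-scoring chunk (D) can still use leftover budget; a stricter design might instead stop at the first chunk that overflows, or drop chunks below a score threshold.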
Multi-Agent Impact
Context Handoff: Naive vs. Compressed
Every agent-to-agent handoff serializes context into tokens. The handoff design determines whether costs compound or stay controlled.
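The difference between the two handoff designs can be sketched as a linear chain of agents, each adding some context of its own before handing off. The chain below (a 50K-token Retrieval Agent followed by hypothetical analysis, review, and writing agents), the per-agent sizes, and the 3,000-token summary cap are assumptions for illustration; `handoff_sizes` is a hypothetical helper and does not model the summarization call itself:

```python
from typing import Optional

def handoff_sizes(agent_outputs: list[int], summary_cap: Optional[int] = None) -> list[int]:
    """Tokens passed at each handoff along a linear chain of agents.

    agent_outputs[i] is the context agent i produces itself. With
    summary_cap=None (naive), each handoff forwards everything accumulated
    so far; otherwise it forwards a summary of at most summary_cap tokens.
    """
    sizes = []
    carried = 0  # context the current agent received from upstream
    for produced in agent_outputs[:-1]:  # the last agent hands off to no one
        full = carried + produced
        sent = full if summary_cap is None else min(full, summary_cap)
        sizes.append(sent)
        carried = sent
    return sizes

chain = [50_000, 8_000, 8_000, 4_000]  # retrieval, analysis, review, writer
naive = handoff_sizes(chain)                          # [50000, 58000, 66000]
compressed = handoff_sizes(chain, summary_cap=3_000)  # [3000, 3000, 3000]
print(sum(naive), sum(compressed))  # 174000 9000
```

Under the naive design every downstream agent re-reads everything upstream, so the handoff grows with each hop; with a cap it stays flat. Compression is not free, though: producing each summary takes a model call over the full context it condenses.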
Naive
Retrieval Agent
50K tokens
Compressed
Retrieval Agent
50K tokens