Module 0 · The AI Substrate

Token Budget & Context Window

Tokens are the atomic unit of cost, latency, and attention in every LLM interaction. The context window is the finite working memory that must accommodate everything the agent needs to know, reason about, and produce.
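
A quick way to make tokens concrete is to count them. A minimal sketch using the tiktoken library; the cl100k_base encoding is an assumption, so match it to whatever model you actually call:

```python
# Count tokens with tiktoken. "cl100k_base" is an assumed encoding;
# pick the one that matches the model you actually call.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Tokens are the atomic unit of cost, latency, and attention."
tokens = encoding.encode(text)

print(f"{len(tokens)} tokens for {len(text)} characters")
# Rule of thumb for English text: roughly 4 characters per token.
```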

Context Window Allocation
Where Do the Tokens Go?
A 128K-token context window seems enormous — until you account for everything an agent needs to carry.
Of the 128,000-token context window, 35,000 tokens are consumed before the user's message even arrives:

System Prompt (identity, guidelines, constraints): 2,000 tokens · 1.6%
Tool Definitions (names, descriptions, schemas): 3,000 tokens · 2.3%
Conversation History (prior turns in the session): 10,000 tokens · 7.8%
Retrieved Context (RAG documents, knowledge): 20,000 tokens · 15.6%
Model Response (generated output tokens): ~4,000 tokens · 3.1%
Available Capacity (remaining headroom): 89,000 tokens · 69.5%
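
The arithmetic behind this breakdown is worth automating, since every component you add eats headroom. A minimal sketch using the illustrative figures above (they are examples, not universal constants):

```python
# Context-budget accounting. All figures are the illustrative numbers
# from the table above, not universal constants.
CONTEXT_WINDOW = 128_000

budget = {
    "system_prompt": 2_000,
    "tool_definitions": 3_000,
    "conversation_history": 10_000,
    "retrieved_context": 20_000,
    "model_response": 4_000,
}

available = CONTEXT_WINDOW - sum(budget.values())

for name, tokens in budget.items():
    print(f"{name:21s} {tokens:>7,} tokens  {tokens / CONTEXT_WINDOW:5.1%}")
print(f"{'available_capacity':21s} {available:>7,} tokens  {available / CONTEXT_WINDOW:5.1%}")
```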
Three Dimensions

Cost

Every token has a price. Output tokens cost 3–5× as much as input tokens, and costs compound across turns because the full conversation history is re-sent as input on every call.

Input: $3–30/M tokens
Output: $15–100/M tokens
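
A minimal sketch of the compounding, with hypothetical prices from the low end of the ranges above and assumed per-turn message sizes:

```python
# Multi-turn cost estimate. Prices are hypothetical, taken from the
# low end of the ranges above: $3/M input, $15/M output.
INPUT_PRICE = 3 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 15 / 1_000_000  # dollars per output token

def conversation_cost(turns: int, message_tokens: int = 2_000,
                      reply_tokens: int = 500) -> float:
    """Each turn re-sends the growing history as input, so input volume
    rises linearly per turn and total cost grows roughly quadratically."""
    history = 0
    total = 0.0
    for _ in range(turns):
        input_tokens = history + message_tokens
        total += input_tokens * INPUT_PRICE + reply_tokens * OUTPUT_PRICE
        history = input_tokens + reply_tokens  # history carries forward
    return total

print(f" 5 turns: ${conversation_cost(5):.4f}")
print(f"20 turns: ${conversation_cost(20):.4f}")
```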

Latency

Output tokens are generated sequentially, so more tokens means a longer wait: a 2,000-token response takes roughly 10× longer to decode than a 200-token one.

First token: 200–500ms
Full response: 2–15 seconds
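
A back-of-envelope model of those numbers: total latency is time-to-first-token plus sequential decoding. The 150 tokens/second decode rate below is a hypothetical figure; measure your own.

```python
# Back-of-envelope latency: time to first token plus sequential decode.
# The 150 tokens/second rate is a hypothetical figure, not a benchmark.
FIRST_TOKEN_SECONDS = 0.35   # midpoint of the 200-500 ms range above
TOKENS_PER_SECOND = 150.0

def response_latency(output_tokens: int) -> float:
    return FIRST_TOKEN_SECONDS + output_tokens / TOKENS_PER_SECOND

print(f"  200 tokens: {response_latency(200):4.1f} s")
print(f"2,000 tokens: {response_latency(2_000):4.1f} s")  # decode time alone is 10x
```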

Attention

Tokens are the unit of attention. Reasoning quality degrades as token count grows. More tokens ≠ better results.

Quality peaks: first & last
Degraded: middle positions
Attention Pattern
The "Lost in the Middle" Effect
Models attend most strongly to information at the beginning and end of the context window. Information buried in the middle receives significantly less attention — even when it's the most relevant.
[Figure: attention across context positions, from beginning to end of context. Attention is high at both edges and low in the middle.]

Architectural Implication

Place critical information — system instructions, key facts, decision criteria — at the beginning or end of the context. Use retrieval strategies that surface only the most relevant chunks rather than flooding the context with marginally related content.
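
A minimal sketch of that placement strategy; the section names and ordering are illustrative, not a prescribed API:

```python
# Assemble a prompt so critical content sits at the edges, where
# attention is strongest; bulky retrieved material goes in the middle.
# Section names and ordering are illustrative, not a prescribed API.

def assemble_prompt(system: str, retrieved_chunks: list[str],
                    key_criteria: str, question: str) -> str:
    parts = [
        system,                          # beginning: identity and rules
        "\n\n".join(retrieved_chunks),   # middle: bulk context
        key_criteria,                    # end: restate what matters most
        question,                        # end: the actual task
    ]
    return "\n\n---\n\n".join(parts)

prompt = assemble_prompt(
    system="You are a contracts analyst. Answer only from the documents.",
    retrieved_chunks=["<doc 1>", "<doc 2>", "<doc 3>"],  # top-k only, not everything
    key_criteria="Decision criterion: flag any clause with unlimited liability.",
    question="Which clauses expose us to unlimited liability?",
)
print(prompt)
```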

Multi-Agent Impact
Context Handoff: Naive vs. Compressed
Every agent-to-agent handoff serializes context into tokens. The handoff design determines whether costs compound or stay controlled.
Naive handoff (each agent forwards its full context):
Retrieval Agent: 50K tokens
Analysis Agent: 50K inherited + 30K new = 80K tokens
Summary Agent: 80K inherited + 20K new = 100K tokens
Total processed: 230K tokens

Compressed handoff (each context is summarized before forwarding):
Retrieval Agent: 50K tokens
Analysis Agent: 10K summary + 30K new = 40K tokens
Summary Agent: 8K summary + 20K new = 28K tokens
Total processed: 118K tokens
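
A minimal sketch reproducing both totals; summarize() is a hypothetical stand-in for whatever compression step (an LLM summarization call, structured extraction) the handoff actually uses:

```python
# Reproduce both totals from the comparison above. summarize() is a
# hypothetical stand-in for the real compression step; here it simply
# caps the carried token count at a target size.

NEW_WORK = [("retrieval", 50_000), ("analysis", 30_000), ("summary", 20_000)]

def summarize(context_tokens: int, target: int) -> int:
    return min(context_tokens, target)

def pipeline_total(compress_to: dict[str, int] | None = None) -> int:
    carried, total = 0, 0
    for agent, new_tokens in NEW_WORK:
        context = carried + new_tokens  # what this agent must process
        total += context
        if compress_to and agent in compress_to:
            carried = summarize(context, compress_to[agent])
        else:
            carried = context           # naive: forward everything
    return total

print(f"naive:      {pipeline_total():,} tokens")   # 230,000
print(f"compressed: {pipeline_total({'retrieval': 10_000, 'analysis': 8_000}):,} tokens")  # 118,000
```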