---
title: The Context Window Problem Is the Client's Fault
date: 2026-04-15
topics: protocols-standards
tags: mcp, context-window, tool-use, agent-design
author: markus-muller
type: Opinion/Op-Ed
reading_time: 7 minutes
word_count: 1484
url: https://agentic-academy.ai/posts/mcp-context-window-client-problem/
---


## Summary

Blaming MCP for context window pollution targets the wrong layer. Smart host applications can solve this — but only if the spec gives them enough metadata to work with.



Every few weeks the same complaint resurfaces in the MCP community: tool descriptions are filling up context windows, token budgets are exploding, agents with large tool sets become slow and expensive. The proposed fix is almost always the same — keep the spec lean, resist adding metadata, don't let MCP grow into something verbose.

I think this is the wrong prescription for the wrong diagnosis. And I want to make the case that it's actively counterproductive.

## Where the tokens are actually coming from

A brief word on terminology, because MCP's own spec is precise about this. The **host** is the LLM application — Claude Desktop, your custom agent framework, the IDE integration. The **client** is a connector component embedded inside the host that manages a 1:1 connection to one MCP server. I'm going to use "host application" for the entity that makes decisions, because that's what the spec calls it — and the distinction matters for the argument.

The MCP protocol does not specify what a host application sends to an LLM. It specifies how MCP clients communicate with tool servers — the JSON-RPC handshake, the `tools/list` response, the `tools/call` mechanics. What the host does with that information afterward is entirely outside the protocol's scope. The spec says this directly: MCP "does not dictate how AI applications use LLMs or manage the provided context."

The context window problem is not that MCP returns tool descriptions. It is that naive host applications take every tool description they receive from every connected server and inject all of it, verbatim, into the LLM's system prompt on every request. An application whose connected servers expose fifty tools dumps all fifty definitions into context before the user even finishes typing. The LLM pays the token cost whether it needs any of those tools or not.

That is a host application implementation decision, not a protocol constraint. MCP has nothing to say about it. The spec could be twice as rich and this problem would be exactly as bad, or exactly as solvable, depending entirely on how the host application is built.

## What a smart host application can do

The host application sits between the MCP servers and the LLM. It holds the complete picture: every tool that's available, every description, every input schema, every annotation. The LLM only sees what the host chooses to send.

This asymmetry is the opportunity. A host can implement progressive discovery rather than upfront flooding — and the gap between a naive host and a smart one is enormous.

Consider the simplest version: instead of injecting all tool descriptions at once, the host could inject only the names and one-line summaries, let the model identify which tools are relevant to the current task, then fetch and inject full definitions on demand for the tools that will actually be called. Token usage for irrelevant tools: zero.
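That two-phase flow can be sketched in a few lines. Everything here is illustrative: `ToolRecord` and `ToolRegistry` are hypothetical names, not MCP SDK types, and a real host would populate the registry from each client's `tools/list` response.

```python
# Sketch of two-phase tool discovery in a host application.
# ToolRecord and ToolRegistry are hypothetical; a real host would
# build these records from each MCP client's tools/list response.
from dataclasses import dataclass

@dataclass
class ToolRecord:
    name: str
    summary: str       # one-line description, injected upfront
    definition: dict   # full schema, loaded only on demand

class ToolRegistry:
    def __init__(self, tools):
        self._tools = {t.name: t for t in tools}

    def summaries(self):
        """Phase 1: cheap name-plus-summary listing for the system prompt."""
        return [f"{t.name}: {t.summary}" for t in self._tools.values()]

    def load(self, names):
        """Phase 2: full definitions, only for tools the model selected."""
        return [self._tools[n].definition for n in names if n in self._tools]
```

The host injects `summaries()` once, then calls `load()` only after the model names the tools it intends to use; tools it never names cost nothing.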

Go one step further and implement semantic pre-filtering. Before touching the LLM context at all, the host embeds the user's request and scores it against pre-computed embeddings of tool descriptions. Only tools above a similarity threshold get surfaced. The model never pays to reason about tools that don't match the task.
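A minimal sketch of that pre-filter, with a deliberately toy bag-of-words "embedding" standing in for a real embedding model. The `prefilter` function and the threshold value are assumptions for illustration, not anyone's shipped implementation; only the filtering logic is the point.

```python
# Sketch of semantic pre-filtering in the host. The bag-of-words
# "embedding" is a toy stand-in for a real embedding model; in
# production the tool vectors would be pre-computed once per session.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def prefilter(request, tool_descriptions, threshold=0.2):
    """Surface only tools whose description scores above the threshold."""
    q = embed(request)
    scored = {name: cosine(q, embed(desc))
              for name, desc in tool_descriptions.items()}
    return [name for name, score in scored.items() if score >= threshold]
```

Tools below the threshold never reach the model's context, which is where the token saving comes from; the trade-off lives entirely in where the threshold sits.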

Or structure discovery hierarchically: expose tool categories to the model first, let it select a category, then load the tools within it. An agent working on an order management task never needs to see the definitions of the document processing tools, the analytics tools, or the authentication utilities. They exist in the host's registry. They don't need to exist in the LLM's context.
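Sketched as code, with invented category and tool names (nothing here comes from a real server's catalog):

```python
# Sketch of hierarchical discovery: the model first sees only category
# names, then tools load for the category it selects. The catalog
# contents are invented for illustration.
CATALOG = {
    "order-management": ["create_order", "refund_order", "track_shipment"],
    "document-processing": ["render_pdf", "extract_text"],
    "analytics": ["run_report"],
}

def list_categories():
    """Step 1: surface only category names to the model."""
    return sorted(CATALOG)

def tools_in(category):
    """Step 2: load tool names for the chosen category only."""
    return CATALOG.get(category, [])
```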

This is not a thought experiment. Anthropic has already shipped exactly this pattern. Claude Code enables deferred tool loading for MCP connections by default: tool definitions are not injected into context upfront. When Claude needs a tool, the host searches the tool catalog — using either regex or BM25 natural language matching against tool names, descriptions, and argument descriptions — and loads only the matching definitions into context. Anthropic reports roughly an 85% reduction in token usage for tool definitions as a result: from around 77,000 tokens down to around 8,700 for a large tool set. The same mechanism is available in the API via the Tool Search Tool and a `defer_loading: true` flag on individual tool definitions. The problem has a host-layer solution, and one of the most widely used MCP clients is shipping it.
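For a sense of how little machinery this takes, here is a toy BM25 ranker over the same fields Anthropic says its search indexes: tool names, descriptions, and argument descriptions. The scorer and its parameters are textbook defaults, not Anthropic's implementation.

```python
# Toy BM25 ranker over concatenated tool metadata (name + description
# + argument descriptions). K1 and B are common textbook defaults;
# this is not Anthropic's actual implementation.
import math
from collections import Counter

K1, B = 1.5, 0.75

def tokenize(text):
    return text.lower().replace("_", " ").split()

def bm25_rank(query, tools):
    """tools: {name: concatenated searchable text}. Returns names, best first."""
    docs = {name: Counter(tokenize(text)) for name, text in tools.items()}
    n_docs = len(docs)
    avgdl = sum(sum(d.values()) for d in docs.values()) / n_docs
    scores = {}
    for name, d in docs.items():
        dl = sum(d.values())
        score = 0.0
        for term in tokenize(query):
            f = d[term]
            if not f:
                continue
            n_t = sum(1 for doc in docs.values() if doc[term])
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            score += idf * f * (K1 + 1) / (f + K1 * (1 - B + B * dl / avgdl))
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Note what the ranker depends on: every field it tokenizes is descriptive text supplied by the tool server. That dependency is the crux of the next section.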

{{< interactive src="/visualizations/mcp-client-context.html" title="Naive vs. Smart MCP Client" caption="How the same set of MCP tools can either flood the context window or be filtered intelligently depending on client design." height="auto" fullwidth="true" >}}

Each of these strategies is a host application problem with a host application solution. None of them require changes to the MCP protocol. But all of them require the host to have information it can act on.

## The metadata problem

Here's where the "keep it minimal" instinct breaks down.

Progressive discovery requires grouping signals. If the spec doesn't define a way for tool servers to express categories, domains, or capability tags, the host has to infer them from free-text descriptions using heuristics that will be wrong in exactly the cases that matter most.

Semantic filtering requires rich descriptions. Look at what Anthropic's BM25 tool search actually scores against: tool names, descriptions, argument names, and argument descriptions. Strip those fields down to terse, minimal text and you degrade the signal the search algorithm runs on. The scores that should route an order-management query to the right three tools out of fifty become noisier. The host compensates by either lowering the threshold (more false positives, more tokens) or raising it (missed tools, broken tasks). The 85% token reduction Anthropic achieved depends entirely on the BM25 index having enough descriptive content to match queries accurately. Minimise the spec and you erode the very thing that makes deferred loading work.

Risk-based filtering can use the tool annotations that already exist in the MCP spec — `readOnlyHint`, `destructiveHint`, `idempotentHint`. The spec is explicit that these are hints provided by the server, not guarantees, and that host applications should treat them with appropriate scepticism from untrusted servers. But for tools within your own infrastructure, or from vendors you've vetted, a smart host can act on them: surface read-only tools broadly, require explicit task context before loading destructive ones. These annotations are exactly the kind of signal that makes intelligent filtering possible. And they're already in the spec — which is precisely the argument for having put them there.
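A policy like that can be expressed directly against the hint fields. Only the annotation keys come from the MCP spec; the tier names and the `server_vetted` flag are assumptions invented for this sketch.

```python
# Sketch of annotation-aware exposure policy. readOnlyHint and
# destructiveHint are real MCP tool annotation fields; the tiers and
# the server_vetted flag are illustrative. Hints are server-provided
# and untrusted, so vetting gates whether they are honoured at all.
def exposure_tier(annotations, server_vetted):
    """Decide how eagerly a tool definition may be surfaced to the LLM."""
    ann = annotations or {}
    if not server_vetted:
        return "on-demand"          # ignore hints from untrusted servers
    if ann.get("readOnlyHint"):
        return "eager"              # safe to surface broadly
    if ann.get("destructiveHint"):
        return "confirmed-only"     # load only with explicit task context
    return "on-demand"
```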

The pattern is consistent. Every host-side intelligence strategy depends on the host knowing more about tools than just their name and a terse description. Categories, annotations, semantic richness, usage hints, relationships between tools — this is the information that allows a host to make decisions on behalf of the LLM rather than outsourcing all the reasoning to the model at token cost.

When the spec is stripped in the name of minimalism, the host goes blind. It can't filter what it can't classify. It can't route what it can't characterise. The fallback is to dump everything and let the model figure it out — which is exactly the behaviour the "keep it minimal" camp is trying to prevent.

## The distinction that actually matters

There are two very different questions being conflated in this debate. The first is what information should flow from a tool server to an MCP client. The second is what a host application does with that information before any of it reaches the LLM.

The answer to the second question should be: as little as possible, chosen intelligently. The answer to the first question should be: as much as is useful, because a host can't make intelligent choices from a position of ignorance.

A richer MCP spec doesn't mean richer LLM contexts. It means host applications have more signal to work with when deciding what to include. The transport between server and client is cheap — it happens once at session start or on a `tools/list` call, not on every inference. The context window is expensive — every token costs at every call.
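Some back-of-envelope arithmetic makes that asymmetry concrete. The token figures are the ones Anthropic reported; the session length and per-token price are invented purely for the comparison.

```python
# Illustrative cost arithmetic. 77,000 vs 8,700 are Anthropic's
# reported tool-definition token figures; the 40-turn session and
# the per-token price are made-up assumptions for comparison only.
NAIVE_TOKENS, DEFERRED_TOKENS = 77_000, 8_700
TURNS = 40                 # assumed LLM calls in one agent session
PRICE_PER_MTOK = 3.00      # assumed dollars per million input tokens

def session_cost(tokens_per_turn):
    """Context cost recurs on every turn; the tools/list transfer does not."""
    return tokens_per_turn * TURNS * PRICE_PER_MTOK / 1_000_000

naive_cost = session_cost(NAIVE_TOKENS)        # paid per session, every session
deferred_cost = session_cost(DEFERRED_TOKENS)  # same tools, curated context
```

Under these assumed numbers the naive host pays several times more per session for identical tool access, and the gap compounds with every additional turn.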

The way to keep LLM context lean is to build host applications that are smart enough to be selective. The way to build host applications that are smart enough to be selective is to give them rich, structured, reliable metadata to reason from. The way to ensure that metadata exists is to have a spec that defines and encourages it — not one that treats every field as a liability.

Trying to solve a host application problem by impoverishing the protocol is working backwards. The spec should give host applications everything they need to do their job well. What they do with that information — how carefully they curate what the model sees — is where the real work is.

---

*Have a different take on how MCP host applications should handle tool discovery? The debate is worth having publicly — the implementation choices made in the next eighteen months will shape how agentic systems scale.*


---

*This content is from Agentic Academy (https://agentic-academy.ai/)*
*Published: April 15, 2026*
