The decision to build a multi-agent system is usually made too early. Someone sketches a system diagram with three or four agents, each labeled with a specialization — “Research Agent,” “Analysis Agent,” “Writing Agent” — and the architecture looks clean and modular on the whiteboard. What the diagram does not show is the coordination layer that will consume most of your engineering effort: how the agents discover each other, how they negotiate when sub-tasks overlap, what happens when one agent fails mid-process and the others have already committed side effects, and how you debug a problem that spans three separate context windows and fifty tool calls.
None of this means multi-agent systems are wrong. There are problems that genuinely require them. But the coordination overhead is real, it is substantial, and it should be a deliberate choice rather than a default assumption. This post covers when multi-agent coordination is actually justified, what coordination challenges you will face, and which patterns address them.
When Multi-Agent Systems Are Justified
There are four legitimate reasons to split work across multiple agents. If your use case does not fit one of these, you probably want a single agent with well-designed tools.
Context window exhaustion. When the accumulated context — instructions, conversation history, tool responses, intermediate reasoning — exceeds what a single agent can hold without degrading output quality, you have a structural reason to split. A research agent can process a hundred-page document and produce a focused summary that fits in a downstream analysis agent’s context. Two agents with smaller, focused contexts will outperform a single agent drowning in irrelevant information. This is the most common and most defensible reason for multi-agent design.
Genuine expertise boundaries. Some tasks require capabilities that conflict within a single instruction set. An agent instructed to be an aggressive security auditor will approach a code review differently than an agent instructed to optimize for developer productivity. Asking a single agent to be both simultaneously produces muddy, hedged output that serves neither goal. When the task requires perspectives that would create instruction-level conflicts, separate agents with separate agent instructions produce better results.
Parallel execution requirements. When latency constraints demand that independent sub-tasks run simultaneously, you need separate agents. A single agent's reasoning loop is effectively sequential: even when it issues several tool calls in one turn, it must wait for their results before reasoning further. Multiple agents can operate in parallel, each with its own reasoning loop and tool access. The previous post on orchestration patterns covers this in detail.
Isolation requirements. When different parts of a process have different security contexts — different data access permissions, different trust boundaries, different compliance requirements — separating them into distinct agents provides architectural isolation. An agent with access to customer PII should not be the same agent that writes content for public-facing channels, even if a single agent could technically handle both tasks. The isolation is not about capability; it is about blast radius.
The Discovery Problem
Before agents can coordinate, they need to find each other. In simple systems with a fixed number of agents, discovery is trivial — the agents are hardcoded and always available. But as systems scale, or when agents need to be added and removed without redeploying the entire system, discovery becomes an architectural concern.
The connection primitives from Module 1 map directly to this problem. Point-to-point connections hardcode the topology: Agent A knows Agent B’s address. Simple, fast, and completely rigid. Dynamic connections introduce a registry or catalog where agents publish their capabilities and other agents query for them at runtime. This is the model behind the MCP Registry and A2A Agent Cards — standardized ways for agents to describe what they can do so that other agents or orchestrators can discover and invoke them.
The practical challenge with dynamic discovery is match quality. When an orchestrating agent queries a registry for “an agent that can analyze financial documents,” the registry might return several matches with different capability profiles. Selecting the right one requires either explicit capability matching (specific schemas, versioned contracts) or semantic matching (LLM-based reasoning about capability descriptions). The former is reliable but rigid. The latter is flexible but introduces another probabilistic decision point into an already probabilistic system.
For most production systems, the pragmatic approach is a middle ground: a curated registry of pre-approved agents with explicit capability contracts, discovered dynamically but constrained to a known set. This is loose coupling where it matters (adding new agents does not require redeploying existing ones) without the open-ended risk of fully dynamic discovery in untrusted environments.
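A minimal sketch of this middle ground, assuming nothing about any particular registry product: agent names, capability tags, and versions here are illustrative. The registry is queried dynamically with explicit capability contracts, but lookup is constrained to a pre-approved set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentCard:
    """Explicit capability contract for one pre-approved agent (fields are illustrative)."""
    name: str
    capabilities: frozenset  # explicit capability tags, not free-text descriptions
    version: str

class CuratedRegistry:
    """Dynamic lookup over a fixed, curated set of agents."""
    def __init__(self):
        self._agents = {}

    def register(self, card: AgentCard):
        # Registration happens through a review process, not at runtime by arbitrary agents.
        self._agents[card.name] = card

    def find(self, required: set) -> list:
        """Return agents whose contract covers every required capability."""
        return [card for card in self._agents.values() if required <= card.capabilities]

registry = CuratedRegistry()
registry.register(AgentCard("fin-analyst", frozenset({"analyze", "finance"}), "1.2"))
registry.register(AgentCard("summarizer", frozenset({"summarize"}), "2.0"))

matches = registry.find({"analyze", "finance"})
```

Because matching is set containment over explicit tags rather than LLM-based semantic matching, the selection step is deterministic; the trade-off is that new capabilities require updating the contracts.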
Negotiation and Task Handoff
Once agents can find each other, the next challenge is how work moves between them. This is where the interaction primitives become critical, and where choosing the wrong interaction pattern can undermine the entire system.
Delegation is the most common pattern for multi-agent coordination: one agent delegates a sub-task to another with explicit parameters and context. The quality of the delegation determines the quality of the result. A delegation that says “analyze this data” gives the receiving agent too little context. A delegation that includes the full upstream conversation history wastes context window space on irrelevant information. The art is providing the right context — the specific data, the analysis criteria, the output format, and any constraints — without overloading the receiving agent’s capacity.
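One way to enforce "the right context" is to make the delegation a structured payload rather than a free-text message. The schema below is a sketch, not a standard; field names and the example data reference are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Delegation:
    """Structured handoff: specific data, criteria, format, constraints; no upstream history."""
    task: str
    data_ref: str        # pointer to the specific data, not the full conversation
    criteria: list
    output_format: str
    constraints: list

def render_prompt(d: Delegation) -> str:
    """Render the delegation as the receiving agent's task prompt."""
    return (
        f"Task: {d.task}\n"
        f"Data: {d.data_ref}\n"
        f"Criteria: {'; '.join(d.criteria)}\n"
        f"Output format: {d.output_format}\n"
        f"Constraints: {'; '.join(d.constraints)}"
    )

delegation = Delegation(
    task="Analyze Q3 revenue variance",
    data_ref="reports/q3-revenue.csv",  # illustrative path
    criteria=["flag variances over 5%", "compare against Q2 baseline"],
    output_format="bullet summary, max 10 bullets",
    constraints=["no customer-level detail in output"],
)
prompt = render_prompt(delegation)
```

Making the schema explicit also gives you a natural place to validate delegations before they consume the receiving agent's context window.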
Conversation is the right pattern when the task requires back-and-forth negotiation between agents. Two agents reviewing a document from different perspectives might need to discuss disagreements, propose compromises, and converge on a joint recommendation. This is powerful but expensive — every turn in a multi-agent conversation consumes tokens for both participants, and conversations can meander if not bounded by clear termination criteria. In production, unbounded agent-to-agent conversation is a cost and latency risk. Set explicit turn limits and escalation triggers.
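A bounded conversation loop can be sketched as follows. The agents here are plain callables standing in for real agent invocations, and the agreement convention (an "AGREED:" prefix) is an assumption for illustration; the point is the explicit turn limit and the escalation path when it is hit.

```python
def converse(agent_a, agent_b, opening, max_turns=6):
    """Alternate turns between two agents until agreement or the turn limit.

    agent_a, agent_b: callables taking the last message and returning a reply.
    Returns ("resolved", transcript) or ("escalated", transcript).
    """
    message, transcript = opening, []
    for turn in range(max_turns):
        speaker = agent_a if turn % 2 == 0 else agent_b
        message = speaker(message)
        transcript.append(message)
        if message.startswith("AGREED:"):   # illustrative termination convention
            return "resolved", transcript
    # Turn limit reached without convergence: escalate instead of looping forever.
    return "escalated", transcript
```

Every turn costs tokens for both participants, so in a real system the turn limit should be set from your latency and cost budget, not picked arbitrarily.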
Notification is appropriate when agents operate independently and need to react to events rather than requests. In choreography-based multi-agent systems, agents publish notifications about completed work or state changes, and other agents subscribe to relevant events. This is the most decoupled pattern — agents do not need to know about each other, only about the events they care about — but it trades coordination simplicity for observability complexity. When something goes wrong in an event-driven multi-agent system, reconstructing the causal chain across multiple independent agents is significantly harder than tracing a delegation chain.
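The decoupling can be seen in a minimal in-process event bus, a sketch of the choreography pattern rather than a production message broker. Event names and payload fields are illustrative; the correlation id in the payload is the hook you need later to reconstruct causal chains.

```python
from collections import defaultdict

class EventBus:
    """Minimal pub/sub: agents know about events, not about each other."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Fan out to every subscriber; the publisher never learns who reacted.
        for handler in list(self._subscribers[event_type]):
            handler(payload)

bus = EventBus()
processed = []

# A downstream agent reacts to completed summaries without knowing who produced them.
bus.subscribe("doc.summarized", lambda event: processed.append(event["doc_id"]))

# The publishing agent attaches a correlation id for later trace reconstruction.
bus.publish("doc.summarized", {"doc_id": 7, "correlation_id": "req-001"})
```

A real system would use a durable broker and persist every event with its correlation id; without that, the observability cost described above becomes unmanageable.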
Partial Failure: The Hard Problem
Single-agent failure is conceptually simple: the agent either succeeds or it does not. Multi-agent failure is a combinatorial problem. Agent A succeeds, Agent B fails, Agent C depends on both. Agent A has already committed a side effect (sent an email, updated a record), so rolling back the entire operation is not straightforward.
Partial failure is the problem that most multi-agent designs underestimate, and the one that causes the most production incidents. There are several patterns for handling it:
Compensation over rollback. Instead of trying to undo completed work (which may be impossible for side effects like sent notifications or external API calls), design compensating actions: a follow-up message that corrects the earlier one, a record update that reflects the changed outcome. This requires that each agent’s side effects are designed with compensation in mind — a practice borrowed from saga patterns in distributed systems.
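A stripped-down saga sketch, under the assumption that each step is registered together with its compensating action. The step names are illustrative; the key behavior is that on failure, compensations run in reverse order for the steps that already committed.

```python
def run_saga(steps):
    """steps: list of (action, compensate) pairs.

    On failure, run the compensations for completed steps in reverse order.
    Returns "committed" or "compensated".
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        return "compensated"
    return "committed"

log = []

def failing_step():
    raise RuntimeError("downstream agent failed")

outcome = run_saga([
    # A sent email cannot be unsent; the compensation is a follow-up correction.
    (lambda: log.append("send_email"), lambda: log.append("send_correction")),
    (lambda: log.append("update_record"), lambda: log.append("revert_record")),
    (failing_step, lambda: None),
])
```

The hard part is not the loop; it is designing each compensation so that it is a meaningful correction of the side effect, which has to happen when the agent's actions are designed, not after an incident.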
Idempotent retries. When an agent fails, retrying with the same input should produce the same output without duplicating side effects. This is the idempotency principle from action tools applied at the agent level. If your agents cannot be retried safely, your multi-agent system cannot recover from partial failures without human intervention.
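The agent-level idempotency principle can be sketched with an idempotency key: a retry with the same key replays the stored result instead of re-executing the side effect. The in-memory result store here is an assumption for illustration; production systems would persist it.

```python
class IdempotentAgent:
    """Caches results by idempotency key so retries never duplicate side effects."""
    def __init__(self, side_effect):
        self._results = {}          # key -> stored result (durable storage in production)
        self._side_effect = side_effect

    def run(self, key, payload):
        if key in self._results:
            return self._results[key]   # replay the stored result; skip the side effect
        result = self._side_effect(payload)
        self._results[key] = result
        return result

sent = []  # records how many times the side effect actually fired
agent = IdempotentAgent(lambda payload: (sent.append(payload), f"sent:{payload}")[1])

first = agent.run("task-42", "welcome email")
retry = agent.run("task-42", "welcome email")   # retry after a presumed failure upstream
```

The orchestrator must generate the key from the logical task identity (not, say, a timestamp), or retries will silently get fresh keys and duplicate work anyway.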
Checkpoint and resume. For long-running multi-agent processes, persist the state at each significant boundary. When a failure occurs, resume from the last successful checkpoint rather than restarting from scratch. This requires explicit state management — the system must know which agents have completed, what their outputs were, and what the next step should be. Workflow orchestration provides this naturally. Choreography-based systems need to build it explicitly.
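A sketch of checkpoint-and-resume, with a plain dict standing in for the durable state store a real workflow engine would provide. Step names and the simulated transient failure are illustrative.

```python
def run_with_checkpoints(steps, store):
    """steps: ordered (name, fn) pairs; store maps step name -> output.

    Completed steps are skipped on resume, so a retry picks up after the
    last successful checkpoint instead of restarting from scratch.
    """
    for name, fn in steps:
        if name in store:
            continue  # checkpointed before the crash; do not re-run
        store[name] = fn(dict(store))  # pass prior outputs as read-only context
    return store

calls = []
state = {"failed_once": False}

def enrich(prior):
    calls.append("enrich")
    if not state["failed_once"]:
        state["failed_once"] = True
        raise RuntimeError("transient failure")  # simulated partial failure
    return "enriched"

steps = [
    ("research", lambda prior: (calls.append("research"), "notes")[1]),
    ("enrich", enrich),
]

store = {}
try:
    run_with_checkpoints(steps, store)
except RuntimeError:
    pass  # the crash; checkpointed state survives in `store`

result = run_with_checkpoints(steps, store)  # resume: "research" is not re-run
```

Note that checkpointing composes with the idempotency requirement above: if a step crashed after committing its side effect but before its checkpoint was written, the resume will re-run it, and only idempotency prevents duplication.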
Graceful degradation. Not every sub-task is equally critical. A multi-agent system that generates a comprehensive report might have a mandatory analysis agent and an optional enrichment agent. If the enrichment agent fails, the system should produce the report without enrichment rather than failing entirely. This requires that the aggregation logic distinguishes between required and optional contributions — a design decision that must be made explicitly, not discovered during an incident.
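Making the required/optional distinction explicit in the aggregation logic might look like this sketch, where a failed agent's contribution is represented as None and the set of mandatory agents is an explicit parameter.

```python
def aggregate(contributions, required):
    """contributions: {agent_name: output or None (failed)}.

    required: set of agent names whose output is mandatory.
    Optional failures are omitted; a required failure aborts the report.
    """
    report = {}
    for agent, output in contributions.items():
        if output is None:
            if agent in required:
                raise RuntimeError(f"mandatory agent '{agent}' failed; cannot produce report")
            continue  # optional contribution: degrade gracefully and omit it
        report[agent] = output
    return report

# The enrichment agent failed; the report is produced without it.
report = aggregate(
    {"analysis": "core findings", "enrichment": None},
    required={"analysis"},
)
```

The value of writing it this way is that the required set is a reviewed design artifact rather than an implicit assumption discovered during an incident.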
Conflict Resolution
When multiple agents work on overlapping aspects of a problem, they will occasionally produce contradictory outputs. Agent A recommends one approach; Agent B recommends the opposite. Neither is necessarily wrong — they may be reasoning from different data, different instructions, or different perspectives. The system needs a resolution strategy.
Authority hierarchy. One agent’s output takes precedence over another’s in case of conflict. This is simple and predictable but requires careful design of which agent has authority over which types of decisions. In governance contexts, system instructions might always override agent instructions, and a compliance-checking agent might have veto power over a content-generation agent.
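The precedence rule itself is simple enough to sketch; the agent names and ordering below are illustrative, and the real design work is deciding the ordering per decision type.

```python
# Highest authority first; in this illustrative ordering, compliance can veto content.
AUTHORITY_ORDER = ["compliance", "security", "content"]

def resolve_by_authority(proposals, authority=AUTHORITY_ORDER):
    """proposals: {agent_name: decision}.

    Return (winning_agent, decision) from the highest-authority agent
    that actually produced a proposal.
    """
    for agent in authority:
        if agent in proposals:
            return agent, proposals[agent]
    raise ValueError("no proposals to resolve")

winner, decision = resolve_by_authority({
    "content": "publish now",
    "compliance": "hold for review",
})
```

In practice you would likely keep one authority ordering per decision type, since a compliance agent that outranks everything on every question reintroduces the single-bottleneck problem.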
Consensus mechanisms. Multiple agents vote or score competing options, and the system selects based on the aggregate. This works well for quality assessments (three agents independently evaluate a generated response, and the majority assessment is adopted) but poorly for binary decisions where the stakes are high.
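A minimal majority-vote sketch, with a quorum threshold as the escape hatch for the high-stakes case: if no option reaches quorum, the decision is escalated rather than adopted. The assessment labels are illustrative.

```python
from collections import Counter

def majority_assessment(votes, quorum=2):
    """Adopt the majority assessment; escalate if no option reaches quorum.

    votes: list of assessment labels, one per evaluating agent.
    """
    option, count = Counter(votes).most_common(1)[0]
    return option if count >= quorum else "escalate"

# Three agents independently evaluate a generated response.
verdict = majority_assessment(["acceptable", "acceptable", "needs_revision"])
```

For binary decisions with significant consequences, the quorum should be set so that a split vote escalates to a human rather than resolving on a plurality of one.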
Human escalation. When agents disagree and the consequences of the wrong choice are significant, escalate to a human decision-maker. This is not a failure of the system — it is a feature. Autonomy borders exist precisely for these moments, and the system should be designed to surface the disagreement clearly, with the reasoning from each agent, so the human can make an informed decision.
Coordination Cost: What You Are Actually Paying
Multi-agent coordination is not free. Every inter-agent communication consumes tokens for both the sender (formulating the delegation or message) and the receiver (processing the incoming context). Every coordination decision by an orchestrator consumes tokens for reasoning. Every retry after a partial failure doubles the token cost of the failed operation.
In practice, a well-designed multi-agent system typically consumes three to ten times the tokens of a single-agent approach for the same task. This range is wide because it depends on the depth of coordination, the number of retries, and whether the system uses conversation-based negotiation (expensive) or delegation with structured handoffs (cheaper).
The latency picture is mixed. Parallel execution reduces wall-clock time. But sequential coordination chains, orchestrator reasoning steps, and retry loops add latency that a single agent would not incur. The net effect depends on the specific pattern and workload.
The debugging cost is the most commonly underestimated factor. A multi-agent system requires distributed tracing that correlates actions across agents, links tool calls to the delegations that triggered them, and reconstructs the decision chain that led to a particular output. Without this infrastructure, every production incident becomes an archaeological expedition.
Key Takeaways
Multi-agent systems are justified by four specific constraints — context window exhaustion, expertise boundaries, parallel execution requirements, and security isolation — and if your use case does not clearly fit one of these, a single agent with good tools is almost certainly the better choice.

The coordination challenges that multi-agent systems introduce — discovery, negotiation, partial failure, and conflict resolution — are not theoretical complications but the primary engineering problems you will spend your time solving, and each requires deliberate architectural patterns rather than ad-hoc solutions.

The total cost of multi-agent coordination — measured in tokens, latency, debugging complexity, and engineering effort — is consistently higher than most teams estimate at design time, which is why the decision to go multi-agent should be driven by a specific, articulable limitation of simpler approaches rather than by an assumption that more agents means better architecture.