A language model that can only generate text is a chatbot. A language model that can call functions, produce structured data, and interact with external systems is an agent building block. The distance between those two things is smaller than most people assume, but the implications are enormous. Understanding exactly how a model crosses from free-form text generation to reliable, constrained action is the single most important prerequisite for anyone designing agentic systems. Everything else in agent architecture—coordination patterns, orchestration frameworks, governance models—depends on this foundational capability working correctly and predictably.
The shift didn’t happen overnight. Early integrations between language models and external systems relied on fragile prompt engineering: you’d ask the model to output something that looked like a function call, parse it with regular expressions, and hope for the best. That approach worked in demos. It collapsed in production. What changed was the introduction of native mechanisms—tool use, function calling, and structured output—that give models a disciplined, schema-driven way to interact with the outside world. These mechanisms are what make the jump from “interesting technology” to “enterprise building block” possible.
Tool Use: Extending the Model’s Reach
Tool use is the general concept of giving a language model access to external capabilities it can invoke during a conversation. Instead of generating an answer from its training data alone, the model can decide that a question requires real-time information, a calculation, or an interaction with an external system—and request that the host application execute that action on its behalf.
The mechanics follow a consistent pattern across providers. When you configure an API call to a model, you include a set of tool definitions alongside the conversation messages. Each definition specifies a name, a natural-language description of what the tool does, and a JSON Schema describing its expected inputs. The model reads these definitions as part of its context and, when it determines that a tool would help answer the user’s request, it emits a structured tool-use request instead of (or in addition to) a text response. The host application intercepts that request, executes the actual operation, and feeds the result back to the model, which then incorporates it into its final response.
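To make the shape of a tool definition concrete, here is a minimal sketch in Python. The wrapper fields differ slightly by provider (Anthropic nests the schema under input_schema, OpenAI under parameters), and the tool name and fields here are purely illustrative; the constant parts are the name, the natural-language description, and the JSON Schema for inputs.

```python
# Illustrative tool definition: a name, a description the model reads,
# and a JSON Schema for the expected inputs. Field nesting varies by provider.
search_knowledge_base_tool = {
    "name": "search_knowledge_base",   # hypothetical tool name
    "description": "Search the internal knowledge base and return the most relevant articles.",
    "input_schema": {                  # Anthropic-style; OpenAI uses "parameters"
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language search query.",
            },
            "max_results": {
                "type": "integer",
                "description": "Maximum number of articles to return.",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
```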
This design deliberately separates intent from execution. The model never directly calls an API, queries a database, or writes to a file system. It expresses intent in a structured format, and the host application decides how—and whether—to fulfill that intent. This separation is critical for enterprise adoption. It means you can enforce authentication, apply rate limits, log every interaction, validate inputs before execution, and implement approval workflows—all without modifying the model or its instructions. The model proposes; the infrastructure disposes.
Consider a customer support scenario. An agent receives a question about an order’s delivery status. Without tools, the model can only offer generic advice: “You can check your order status on our website.” With a get_order_status tool defined, the model can emit a structured request containing the order ID, the host application can query the fulfillment system, and the model can respond with the actual delivery date and tracking number. The interaction becomes useful rather than merely conversational.
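A sketch of that round trip, assuming the Anthropic Python SDK and a hypothetical get_order_status backend. The model name is a placeholder, and the sketch assumes the model chooses to call the tool; the point is the propose/execute/respond loop, not the specific SDK.

```python
import json
import anthropic  # assumes the Anthropic Python SDK; other SDKs follow the same loop

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

get_order_status_tool = {
    "name": "get_order_status",
    "description": "Look up the delivery status of an order by its order ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

def get_order_status(order_id: str) -> dict:
    """Hypothetical fulfillment-system lookup; replace with a real query."""
    return {"order_id": order_id, "status": "in_transit", "eta": "2025-06-12"}

messages = [{"role": "user", "content": "Where is my order 58213?"}]

response = client.messages.create(
    model="claude-sonnet-4-5",   # placeholder model name
    max_tokens=1024,
    tools=[get_order_status_tool],
    messages=messages,
)

# The model proposes a tool call; the host application executes it.
tool_use = next(b for b in response.content if b.type == "tool_use")
result = get_order_status(**tool_use.input)

# Feed the result back so the model can produce its final answer.
messages += [
    {"role": "assistant", "content": response.content},
    {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": json.dumps(result),
        }],
    },
]

final = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[get_order_status_tool],
    messages=messages,
)
print(final.content[0].text)  # e.g. an answer citing the actual delivery estimate
```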
The power of tool use scales with the number and quality of tools available. A financial analysis agent might have tools for retrieving market data, executing portfolio calculations, querying risk models, and generating formatted reports. A claims processing agent might access policy databases, fraud detection systems, medical code lookups, and payment authorization services. Each tool extends the model’s effective capabilities without requiring the model itself to contain that domain knowledge. The knowledge lives in the systems; the model provides the reasoning about when and how to use them.
Function Calling: The Contract Between Model and Application
Function calling is the specific implementation mechanism through which tool use operates. If tool use is the concept, function calling is the protocol. It defines the precise format in which models request actions and applications respond with results.
The term originates from OpenAI’s June 2023 API update, but the pattern has since been adopted—with variations—by Anthropic, Google, Mistral, and others. The core idea is the same across all of them: you describe available functions using JSON Schema, the model outputs a structured call when it wants to use one, and your code handles execution and returns results. The details differ in message format, multi-turn handling, and how parallel or sequential tool calls are expressed, but the architectural pattern is stable.
What makes function calling significant for enterprise architects is that it introduces a formal contract between the model and the application layer. The JSON Schema definition isn’t just documentation—it’s a machine-readable specification that both sides of the interaction agree to follow. The model commits to producing outputs that conform to the schema. The application commits to accepting inputs in that format and returning results the model can interpret. This is, in essence, an API contract, and it can be governed with the same disciplines you’d apply to any other API: versioning, backward compatibility, schema validation, and change management.
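One way to make that contract operational is to validate every call the model emits against the same schema that was advertised to it, before anything executes. A minimal sketch using the third-party jsonschema package, with a hypothetical get_order_status backend:

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

# The same JSON Schema advertised to the model is the contract the
# application enforces before executing anything.
GET_ORDER_STATUS_SCHEMA = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}},
    "required": ["order_id"],
    "additionalProperties": False,
}

def get_order_status(order_id: str) -> dict:
    """Stand-in for the real fulfillment-system lookup."""
    return {"order_id": order_id, "status": "in_transit"}

def execute_tool_call(name: str, arguments: dict) -> dict:
    """Validate a model-emitted call against its schema, then execute it."""
    if name != "get_order_status":
        raise ValueError(f"Unknown tool: {name}")
    try:
        validate(instance=arguments, schema=GET_ORDER_STATUS_SCHEMA)
    except ValidationError as err:
        raise ValueError(f"Tool call violates the schema contract: {err.message}") from err
    return get_order_status(**arguments)
```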
This contractual nature is what separates function calling from earlier approaches where models were simply prompted to “output JSON.” A prompted model might produce something that looks like JSON but contains syntax errors, missing fields, or invented parameter names. A model using native function calling is constrained by the schema definition; when the provider enforces strict schema adherence, the output is structurally guaranteed to match the specification. This isn’t a minor improvement. For any system that needs to parse model output programmatically and route it to downstream processes, structural reliability is the difference between a prototype and a production system.
In practice, function calling also enables a pattern that matters deeply for agent design: multi-step reasoning with intermediate actions. A model can call a function, receive its result, reason about that result, and then call another function—chaining together a sequence of operations that would be impossible with a single prompt-response cycle. An insurance underwriting agent, for example, might first retrieve the applicant’s history, then query the risk model with that data, then check regulatory requirements for the applicant’s jurisdiction, and finally generate a recommendation. Each step informs the next, and the model’s reasoning ties them together into a coherent workflow.
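That chaining typically takes the form of a loop in the host application: call the model, execute whatever tool calls it proposes, append the results, and repeat until it answers in plain text. A schematic version, again assuming Anthropic-style response objects and a caller-supplied registry of tool handlers:

```python
import json

def run_agent_turn(client, model, tools, tool_handlers, messages, max_steps=10):
    """Drive a multi-step tool-use exchange until the model stops calling tools."""
    for _ in range(max_steps):
        response = client.messages.create(
            model=model, max_tokens=1024, tools=tools, messages=messages
        )
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            return response  # plain text: the model is done reasoning and acting

        # Execute every proposed call and feed the results back to the model.
        results = []
        for call in tool_uses:
            output = tool_handlers[call.name](**call.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": call.id,
                "content": json.dumps(output),
            })
        messages += [
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": results},
        ]
    raise RuntimeError("Exceeded maximum tool-use steps")
```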
Structured Output: Constraining the Response Space
Tool use and function calling address how models interact with external systems. Structured output addresses a related but distinct problem: how models produce responses that conform to a specific format, regardless of whether an external system is involved.
When a model generates free-form text, the output is unpredictable in shape. It might be a paragraph, a list, a table, or a mix of all three. For human consumption, this flexibility is a feature. For programmatic consumption—where downstream systems need to parse, validate, and act on the output—it’s a liability. Structured output solves this by constraining the model’s response to a defined schema, typically JSON Schema, ensuring that every response has the expected fields, types, and structure.
The applications in enterprise contexts are immediate and practical. A document classification system needs the model to return a category label, a confidence score, and a list of supporting evidence—not a free-text explanation that a human would need to interpret. A data extraction pipeline needs the model to return structured records from unstructured text—names, dates, amounts, reference numbers—in a format that can be loaded directly into a database. A compliance review system needs the model to return a pass/fail determination, a list of flagged clauses, and recommended remediation actions, all in a schema that the workflow engine can process without human intervention.
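A sketch of what that contract might look like for the document-classification case, using Pydantic to declare the target shape. The field names are illustrative, and some provider SDKs accept either the model class directly or the JSON Schema generated from it.

```python
from pydantic import BaseModel, Field

class DocumentClassification(BaseModel):
    """Target shape for the model's response: these fields, nothing more."""
    category: str = Field(description="One of the organization's document categories")
    confidence: float = Field(ge=0.0, le=1.0, description="Model-reported confidence")
    supporting_evidence: list[str] = Field(
        description="Verbatim passages from the document that justify the category"
    )

# The JSON Schema handed to the provider's structured-output interface:
schema = DocumentClassification.model_json_schema()
```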
Structured output and function calling share mechanical similarities—both use JSON Schema to constrain model output—but they serve different architectural roles. Function calling is about the model requesting an action from the host application. Structured output is about the model delivering its own response in a predictable format. In agent architectures, you typically use both: function calling for the model’s interactions with external systems, and structured output for the model’s communications back to the orchestration layer or the user-facing application.
The reliability of structured output has improved substantially as providers have moved from prompt-based approaches (“please respond in JSON”) to grammar-constrained decoding, where the model’s token generation is mechanically restricted to produce valid JSON conforming to the supplied schema. This shifts structured output from “usually works” to “guaranteed to work,” which is the threshold enterprise systems require.
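How the constraint is requested differs by provider. A sketch of an OpenAI-style request, which at the time of writing accepts a JSON Schema and a strict flag via the response_format parameter; the model name and schema are illustrative.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

classification_schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "confidence": {"type": "number"},
        "supporting_evidence": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["category", "confidence", "supporting_evidence"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "Classify this document: ..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "document_classification",
            "schema": classification_schema,
            "strict": True,  # decoding is constrained to valid JSON matching the schema
        },
    },
)
print(response.choices[0].message.content)  # valid JSON matching the schema
```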
MCP: A Standards-Based Approach to Tool Integration
As tool use has matured, a problem has emerged: every agent framework, every model provider, and every enterprise defines tools differently. A Salesforce integration built for one agent framework has to be rebuilt for another. Tool descriptions written for OpenAI’s function calling format need to be translated for Anthropic’s format. The result is fragmentation, duplication, and unnecessary integration cost.
The Model Context Protocol (MCP), introduced by Anthropic and increasingly adopted across the industry, addresses this problem by standardizing how tools are described, discovered, and invoked. MCP defines a communication protocol between AI applications (MCP clients) and external systems (MCP servers), using JSON-RPC 2.0 over standard transports: stdio for local integrations and HTTP with Server-Sent Events for remote ones. An MCP server exposes tools, resources, and prompt templates in a standardized format that any MCP-compatible client can consume.
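The wire format is compact. An illustrative tools/call exchange looks roughly like this, shown here as Python dicts; the tool name and payload are hypothetical, and a real session begins with initialization and capability-negotiation messages omitted here.

```python
# Client -> server: invoke a tool exposed by an MCP server (JSON-RPC 2.0).
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "get_order_status",              # hypothetical tool
        "arguments": {"order_id": "58213"},
    },
}

# Server -> client: the tool result, wrapped in MCP's content format.
response = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {
        "content": [
            {"type": "text", "text": '{"status": "in_transit", "eta": "2025-06-12"}'}
        ],
        "isError": False,
    },
}
```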
For enterprise architects, MCP’s value proposition is straightforward: build your integrations once, use them everywhere. An MCP server that provides access to your CRM system works with any agent built on any MCP-compatible framework. Tool definitions are portable. Capability descriptions are standardized. The protocol handles capability negotiation at connection time, so clients and servers can gracefully adapt to each other’s supported features.
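On the server side, the official Python SDK reduces “build it once” to a small amount of code. A minimal sketch assuming the SDK's FastMCP helper (pip install mcp), with a hypothetical CRM lookup standing in for the real integration:

```python
from mcp.server.fastmcp import FastMCP  # assumes the official MCP Python SDK

mcp = FastMCP("crm-tools")

@mcp.tool()
def get_customer_summary(customer_id: str) -> str:
    """Return a short summary of a customer's account and recent activity."""
    # Stand-in for a real CRM query.
    return f"Customer {customer_id}: active since 2021, 3 open opportunities."

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport for local clients
```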
MCP also introduces a clean separation between knowledge tools (read-only resources that provide information) and action tools (capabilities that modify external state). This distinction matters for governance. Knowledge tools—retrieving a customer’s order history, querying a knowledge base, looking up a product catalog—carry relatively low risk. Action tools—creating a support ticket, processing a refund, updating an account record—require authorization, audit logging, and potentially human approval. MCP’s explicit categorization makes it easier to apply appropriate controls to each type.
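That separation can be enforced mechanically at the dispatch layer. A sketch with hypothetical tool names; the audit and authorization functions are stand-ins for whatever infrastructure the organization already runs.

```python
def audit_log(user: str, tool: str, args: dict) -> None:
    print(f"AUDIT {user} -> {tool}({args})")  # stand-in for a real audit sink

def is_authorized(user: str, tool: str) -> bool:
    return True                               # stand-in for a real policy check

KNOWLEDGE_TOOLS = {"get_order_history", "search_policy_documents"}  # hypothetical names
ACTION_TOOLS = {"create_support_ticket", "process_refund"}

def dispatch(tool_name: str, arguments: dict, user: str, handlers: dict) -> dict:
    """Apply controls matched to the tool's risk category before executing it."""
    audit_log(user, tool_name, arguments)
    if tool_name in KNOWLEDGE_TOOLS:
        return handlers[tool_name](**arguments)   # read-only: log and run
    if tool_name in ACTION_TOOLS:
        if not is_authorized(user, tool_name):
            raise PermissionError(f"{user} may not invoke {tool_name}")
        # A human-approval queue could also be inserted here for high-risk actions.
        return handlers[tool_name](**arguments)
    raise ValueError(f"Unregistered tool: {tool_name}")
```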
It is worth noting what MCP does not specify. Authentication, authorization, service discovery, rate limiting, and operational governance are all left to implementers. MCP standardizes the communication protocol, not the operational infrastructure around it. Organizations adopting MCP still need to build or integrate these capabilities—but they can do so once, at the infrastructure layer, rather than re-implementing them for every agent-to-tool connection.
From Capabilities to Primitives
Tool use, function calling, and structured output are the substrate on which agentic systems are built. Without them, a language model is confined to generating text. With them, it becomes a reasoning engine that can perceive, decide, and act within a defined operational envelope.
This is precisely what connects these foundational capabilities to the higher-level abstractions that matter for agent design. In the agentic primitives framework, tools are one of the six fundamental categories—the interfaces through which agents access external systems and take action. Tools divide into knowledge tools (read-only retrieval from databases, APIs, and knowledge bases) and action tools (operations that modify external state). Function calling is the mechanism that makes tools work. Structured output is the mechanism that makes an agent’s reasoning legible and actionable to the rest of the system. And MCP is the emerging standard that makes tools portable and interoperable across frameworks and providers.
Understanding these mechanisms at a technical level isn’t optional for anyone building enterprise agents. The decisions you make about tool design—what schemas you define, what validation you enforce, what governance you apply—directly determine whether your agents operate reliably or fail unpredictably. The abstractions above this layer (skills, tasks, orchestration patterns) assume that the tool layer works. When it doesn’t, nothing else matters.
Key Takeaways
The transition from chatbot to agent building block rests on three technical capabilities. Tool use gives models the ability to request actions from external systems through a structured propose-and-execute pattern that keeps the application layer in control. Function calling provides the formal, schema-driven contract between model and application that makes tool interactions reliable, composable, and governable. Structured output constrains model responses to predictable formats that downstream systems can parse and act on without human interpretation. MCP adds a standards layer on top, making tool integrations portable across frameworks and providers while cleanly separating knowledge tools from action tools for governance purposes. These are not features to evaluate in isolation—they are the foundational substrate that every other agentic pattern depends on, from the skills that encode domain expertise to the orchestration patterns that coordinate multi-agent workflows. Get the tool layer right, and you have a foundation for agents that scale. Get it wrong, and every abstraction you build above it inherits that fragility.