A large language model can write a convincing claims analysis, draft a customer support reply, or summarize a financial report. Give it a good prompt and the right context, and the output often looks indistinguishable from what a junior analyst might produce. Yet no serious enterprise architect would call that model an agent. The gap between a well-prompted LLM and a functioning agent is not a matter of degree. It is an architectural leap, and understanding exactly where the gap lies is the prerequisite for building systems that operate reliably in production.
Throughout this module we have examined the raw substrate that agents are built on: how tokens shape the cost and fidelity of every interaction, how context windows bound what a model can attend to in a single pass, how tool use extends a model’s reach beyond text generation, where reasoning hits hard limits, how memory systems compensate for the model’s inherent statelessness, and how cost economics constrain every design decision. Each of those topics describes a property of the language model itself. An agent is what emerges when you wrap engineering structure around those properties so that the model can pursue goals across time, observe the results of its actions, and adapt its behavior accordingly.
The Core Distinction: Completion vs. Action
A language model, at its heart, is a completion engine. It receives a sequence of tokens and predicts what comes next. Every capability we associate with LLMs, from translation to code generation to question answering, is a downstream consequence of that prediction mechanism. The model does not “want” to help you. It does not “decide” to look something up. It produces the most plausible continuation of the input it was given, conditioned on its training data and the instructions embedded in the prompt.
An agent, by contrast, is a system that acts. It receives a goal or a task, formulates a plan to achieve it, executes steps in that plan by invoking tools or producing outputs, observes the results, and revises its approach based on what it learns. The language model is the reasoning engine at the center of this loop, but the loop itself, the scaffolding of planning, execution, observation, and adaptation, is what makes the system an agent. Remove the loop and you are back to a single inference call, no matter how sophisticated the prompt.
This distinction matters enormously for enterprise deployments. A prompted LLM that drafts a customer support reply is a text generation service. An agent that receives a support ticket, retrieves the customer’s history from the CRM, checks the order management system for relevant transactions, drafts a response informed by those sources, determines whether the issue requires escalation, and routes accordingly is a fundamentally different architecture. The LLM is present in both cases. But in the second case it is embedded in an execution framework that gives it persistence, access, and the capacity to affect systems beyond the conversation window.
The Agent Loop: Observe, Plan, Act
The defining structural pattern of an agent is the agent loop, sometimes described as an observe-plan-act cycle. This loop is what transforms a stateless inference call into a process that can pursue objectives across multiple steps, handle unexpected outcomes, and converge on a result even when the initial plan proves inadequate.
Observation is the phase in which the agent gathers information about its current state and environment. This might mean reading the latest message from a user, checking the output of a tool it invoked in a previous step, querying a knowledge base for relevant context, or inspecting the result of an API call. Observation is where the context window constraints we examined earlier become operationally significant. The agent must decide what information to attend to, because it cannot attend to everything. Retrieval strategies, summarization, and memory systems all serve this phase by helping the agent construct a working context that fits within the model’s attention limits.
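A minimal sketch of this selection step, written in Python, might look like the following. The candidate sources, the character-per-token heuristic, and the token budget are all illustrative assumptions, not a prescribed implementation.

    def estimate_tokens(text: str) -> int:
        # Rough heuristic: roughly 4 characters per token. A real system
        # would use the model's own tokenizer instead.
        return len(text) // 4

    def build_working_context(candidates: list[str], budget_tokens: int) -> list[str]:
        # Candidates are assumed to arrive ranked by relevance (e.g. from a
        # retrieval step). Keep adding items until the token budget is spent.
        selected, used = [], 0
        for chunk in candidates:
            cost = estimate_tokens(chunk)
            if used + cost > budget_tokens:
                break
            selected.append(chunk)
            used += cost
        return selected

    # Example: assemble an observation context from hypothetical sources.
    observations = build_working_context(
        ["latest user message ...", "tool result from step 3 ...", "CRM notes ..."],
        budget_tokens=2_000,
    )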
Planning is the phase in which the agent determines what to do next. In the simplest agents, planning is implicit: the model’s next-token prediction, guided by a system prompt, effectively selects an action. In more sophisticated systems, planning is explicit: the agent generates a structured plan, decomposes a complex task into subtasks, or evaluates multiple candidate approaches before committing to one. The reasoning capabilities and limitations we discussed earlier directly govern the quality of this phase. An agent cannot plan beyond the reasoning depth of its underlying model, which is why task decomposition, structured prompting, and workflow constraints exist: they keep the planning challenge within the model’s reliable operating range.
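One common way to make planning explicit is to ask the model for a structured plan and validate it before anything is executed. The sketch below is one possible shape for that step: the step cap, the JSON schema, and the call_model callable are assumptions for illustration, not part of any particular framework.

    import json

    MAX_STEPS = 8  # Assumed cap to keep plans within the model's reliable reasoning depth.

    def make_plan(goal: str, call_model) -> list[dict]:
        # `call_model` is any callable that sends a prompt to an LLM and
        # returns the raw text of its reply.
        prompt = (
            f"Decompose the goal into at most {MAX_STEPS} steps. "
            'Respond with a JSON list of objects, each with "action" and "input" fields.\n'
            f"Goal: {goal}"
        )
        steps = json.loads(call_model(prompt))
        # Validate the structure before anything gets executed.
        if not isinstance(steps, list) or len(steps) > MAX_STEPS:
            raise ValueError("Plan rejected: wrong shape or too many steps")
        for step in steps:
            if not isinstance(step, dict) or "action" not in step or "input" not in step:
                raise ValueError("Plan rejected: malformed step")
        return steps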
Action is the phase in which the agent executes its plan. This is where tool use becomes essential. An agent that can only produce text is limited to advisory roles. An agent that can call APIs, execute code, query databases, send notifications, and update records can participate in real business processes. The tool integration patterns covered earlier in this module, the mechanics of function calling, schema definition, and result parsing, are the infrastructure that makes this phase possible. Every action produces an outcome, and that outcome feeds back into the observation phase, closing the loop.
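In code, the action phase often reduces to looking up the tool the model named and feeding the result back as the next observation. The registry and the two tool functions below are hypothetical stand-ins; a production system would add schema validation, authentication, and error handling around each call.

    from typing import Any, Callable

    # Hypothetical tools; in practice these wrap real APIs, databases, or scripts.
    def lookup_order(order_id: str) -> dict:
        return {"order_id": order_id, "status": "shipped"}

    def send_notification(message: str) -> dict:
        return {"delivered": True, "message": message}

    TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
        "lookup_order": lookup_order,
        "send_notification": send_notification,
    }

    def execute_action(action: dict) -> dict:
        # `action` is assumed to be the parsed function-call output of the model,
        # e.g. {"tool": "lookup_order", "arguments": {"order_id": "A-123"}}.
        tool = TOOL_REGISTRY.get(action["tool"])
        if tool is None:
            return {"error": f"unknown tool: {action['tool']}"}
        result = tool(**action["arguments"])
        return {"tool": action["tool"], "result": result}  # Fed back as an observation.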
The loop continues until the agent determines that its goal has been met, that it cannot make further progress, or that it should escalate to a human. This termination logic is itself a critical design decision. An agent that loops indefinitely burns tokens and money. An agent that terminates too early leaves work unfinished. Getting this balance right requires explicit stopping conditions, iteration limits, and confidence thresholds, all of which draw on the cost and reasoning considerations we have discussed.
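Putting the phases together, the loop itself can be as small as the sketch below. The iteration cap, the confidence threshold, and the injected observe, plan, act, and assess callables are illustrative assumptions about one possible implementation, not the only way to structure termination.

    MAX_ITERATIONS = 10         # Hard cap so a stuck agent cannot loop forever.
    CONFIDENCE_THRESHOLD = 0.8  # Below this, hand the case to a human.

    def run_agent(goal, observe, plan, act, assess):
        # observe/plan/act/assess are injected callables; assess returns
        # (done: bool, confidence: float) for the current state.
        state = {"goal": goal, "history": []}
        for _ in range(MAX_ITERATIONS):
            context = observe(state)
            done, confidence = assess(state)
            if done:
                return {"status": "complete", "state": state}
            if confidence < CONFIDENCE_THRESHOLD:
                return {"status": "escalated", "state": state}
            next_step = plan(state, context)
            outcome = act(next_step)
            state["history"].append({"step": next_step, "outcome": outcome})
        return {"status": "iteration_limit_reached", "state": state}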
Orchestration: Managing the Loop at Scale
The agent loop describes what happens inside a single agent. Orchestration describes what happens when that agent operates within a larger system, one that may include other agents, human reviewers, external services, and enterprise workflows.
Orchestration is the layer of engineering that determines how an agent is invoked, what resources it can access, how its outputs are validated, and how its work integrates with the broader process. In a claims processing pipeline, for example, orchestration might mean that a triage agent classifies incoming claims, routes complex cases to a specialist agent, ensures that any payout above a threshold receives human approval, and logs every decision for audit. The individual agents each run their own observe-plan-act loops, but orchestration coordinates those loops into a coherent end-to-end process.
Three orchestration patterns appear repeatedly in enterprise agent systems. Workflow orchestration uses deterministic, predefined sequences: step one always leads to step two, branching only at explicitly defined decision points. This pattern suits regulated processes where auditability and reproducibility are paramount. Agentic orchestration places an agent in the coordinator role, allowing it to dynamically plan which agents to invoke, in what order, and with what inputs, based on the specifics of each case. This pattern suits complex, variable tasks where the optimal path cannot be specified in advance. Choreography removes the central coordinator entirely, relying on event-driven communication between independent agents that react to each other’s outputs. This pattern suits high-scale, loosely coupled systems where resilience matters more than tight coordination.
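As a rough illustration of the first pattern, a workflow-orchestrated version of the claims pipeline described above could be expressed as a fixed sequence with explicit branch points. The agents, the payout threshold, and the audit log here are hypothetical stand-ins, not a prescribed design.

    PAYOUT_APPROVAL_THRESHOLD = 10_000  # Assumed threshold requiring human sign-off.

    def process_claim(claim, triage_agent, specialist_agent, human_review, audit_log):
        # Deterministic sequence: every claim takes the same path, branching
        # only at explicitly defined decision points.
        classification = triage_agent(claim)
        audit_log.append(("triage", claim["id"], classification))

        if classification["complexity"] == "complex":
            decision = specialist_agent(claim)
        else:
            decision = classification["decision"]
        audit_log.append(("decision", claim["id"], decision))

        if decision["payout"] > PAYOUT_APPROVAL_THRESHOLD:
            decision = human_review(claim, decision)
            audit_log.append(("human_approval", claim["id"], decision))

        return decision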
The choice between these patterns is one of the most consequential architectural decisions in agent system design. It determines your system’s debuggability, its failure modes, its cost profile, and the degree of autonomy you are granting the system. These are not theoretical concerns. They are the engineering constraints that separate a demo from a production system.
The Four Ingredients of Agency
Stripped to its essentials, the architectural leap from LLM to agent requires four capabilities that a bare language model does not possess on its own.
Planning is the ability to decompose a goal into a sequence of steps and to revise that sequence as new information arrives. Without planning, the system is reactive: it responds to inputs but does not pursue objectives. Planning is what gives an agent directionality. It is also the capability most constrained by the reasoning limits of the underlying model. Agents that attempt plans beyond their model’s reliable reasoning depth produce confident-sounding but structurally flawed strategies, a failure mode that has sunk numerous enterprise pilots.
Tool access is the ability to interact with systems beyond the model’s own parameters. This includes reading from databases, calling APIs, executing code, and writing to external stores. Without tools, an agent is confined to generating text about what it would do rather than doing it. Tool access is also where security, governance, and cost management converge. Every tool an agent can invoke is an attack surface, a cost center, and a governance obligation. The principle of least privilege, granting only the tools necessary for the task at hand, is not optional in enterprise contexts.
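One simple way to enforce least privilege is to scope the tool registry per task rather than exposing every tool to every agent. The task types and tool names in the sketch below are illustrative assumptions.

    # Hypothetical mapping from task type to the only tools that task may use.
    TOOL_ALLOWLIST = {
        "support_reply": {"crm_lookup", "order_lookup"},
        "refund_processing": {"order_lookup", "issue_refund"},
    }

    def tools_for_task(task_type: str, registry: dict) -> dict:
        # Return a restricted registry: anything not allowlisted is simply
        # absent, so the agent cannot invoke it even if the model asks to.
        allowed = TOOL_ALLOWLIST.get(task_type, set())
        return {name: fn for name, fn in registry.items() if name in allowed}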
Memory is the ability to retain and retrieve information across interactions and beyond the limits of the context window. The context window, as we discussed, is the model’s working memory: large but finite, expensive to fill, and wiped clean between sessions. Agents need something more persistent. Short-term memory systems track the state of a task in progress: what has been tried, what succeeded, what failed. Long-term memory systems store knowledge that accumulates over time: customer preferences, resolved incidents, learned procedures. Without memory, every agent invocation starts from zero, repeating work, losing context, and failing to improve.
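A minimal sketch of the two memory tiers might look like this: a per-task scratchpad that lives only as long as the task, and a persistent store keyed by entity. The JSON file stands in for what would normally be a database or vector index in a real deployment.

    import json
    from pathlib import Path

    class TaskMemory:
        # Short-term: tracks what has been tried within a single task.
        def __init__(self):
            self.attempts = []

        def record(self, step, outcome):
            self.attempts.append({"step": step, "outcome": outcome})

    class LongTermMemory:
        # Long-term: survives across sessions. A JSON file is used here
        # purely for illustration.
        def __init__(self, path="agent_memory.json"):
            self.path = Path(path)
            self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

        def remember(self, key, value):
            self.data[key] = value
            self.path.write_text(json.dumps(self.data))

        def recall(self, key, default=None):
            return self.data.get(key, default)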
Autonomy is the degree to which the agent can act without human intervention. Autonomy is not binary; it is a spectrum that enterprise architects must calibrate carefully for each deployment. An agent that requires human approval for every action is a glorified autocomplete. An agent with unconstrained autonomy is a liability. The practical challenge is determining the right autonomy profile for each use case: which decisions the agent can make independently, which require human review, and which should be escalated unconditionally. This calibration depends on the reversibility of actions, the cost of errors, the maturity of the system, and the regulatory environment in which it operates.
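In practice this calibration often shows up as an explicit policy table mapping action types to an autonomy level, as in the rough sketch below. The categories and assignments are assumptions for illustration; the right mapping depends on reversibility, error cost, and regulation, as described above.

    from enum import Enum

    class Autonomy(Enum):
        AUTONOMOUS = "act_without_review"
        HUMAN_APPROVAL = "act_after_approval"
        ESCALATE = "hand_off_to_human"

    # Hypothetical policy: reversible, low-cost actions run autonomously;
    # irreversible or high-cost actions need approval or escalation.
    AUTONOMY_POLICY = {
        "draft_reply": Autonomy.AUTONOMOUS,
        "update_crm_note": Autonomy.AUTONOMOUS,
        "issue_refund": Autonomy.HUMAN_APPROVAL,
        "close_account": Autonomy.ESCALATE,
    }

    def autonomy_for(action_type: str) -> Autonomy:
        # Unknown action types default to escalation rather than autonomy.
        return AUTONOMY_POLICY.get(action_type, Autonomy.ESCALATE)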
These four ingredients, planning, tools, memory, and autonomy, are individually necessary and collectively sufficient to transform a language model into an agent. Remove any one and the system degrades: without planning it is reactive, without tools it is theoretical, without memory it is amnesiac, without autonomy it is inert.
Why the Substrate Matters
The preceding posts in this module are not prerequisites in a bureaucratic sense. They are the engineering foundation without which agent architecture becomes guesswork.
Understanding tokens explains why agent interactions cost what they cost, and why a poorly designed agent loop can burn through a budget in hours. Understanding context windows explains why agents need retrieval and memory systems: the model literally cannot hold an entire customer history, policy document, and conversation transcript simultaneously without strategic management of what enters the window. Understanding tool use explains the mechanics by which an agent translates intent into action, and why the interface design of those tools directly affects agent reliability. Understanding reasoning limits explains why agents fail on certain classes of problems, why task decomposition is not optional for complex workflows, and why confidence calibration matters. Understanding memory explains how agents maintain continuity across sessions and why stateless architectures produce inconsistent, frustrating user experiences. Understanding cost explains why every architectural decision, from model selection to orchestration pattern to tool granularity, has a direct financial consequence that compounds at enterprise scale.
An architect who skips this substrate and jumps directly to building agents will produce systems that work in demos and fail in production. The failure modes will be mysterious until you trace them back to a misunderstanding of the underlying mechanics: a context window silently exceeded, a reasoning chain that exceeded the model’s reliable depth, a memory system that lost critical state, a tool invocation pattern that generated ten times the expected cost.
The Conceptual Vocabulary Going Forward
This module has introduced a set of terms and concepts that will recur throughout the rest of the curriculum. The agent loop is the observe-plan-act cycle that gives agents their capacity to pursue goals across multiple steps. Orchestration is the engineering layer that coordinates agents, tools, and humans into coherent processes. Planning is the agent’s ability to decompose objectives into executable steps. Tool access is the interface layer through which agents affect external systems. Memory is the mechanism by which agents maintain state beyond the context window. Autonomy is the configurable degree of independence an agent exercises.
These concepts are not abstractions. They are the design parameters you will manipulate in every agent system you build. The next module will formalize them by mapping them onto a concrete framework: the 17 agentic primitives, which are the canonical building blocks of agent architecture, organized into six categories covering actors, tools, instructions, coordination, connections, and interactions. Alongside the primitives, Module 1 introduces the AI Agent Canvas, a structured design tool for translating business requirements into agent architectures. Where this module gave you the physics, Module 1 gives you the engineering discipline.
Key Takeaways
The leap from LLM to agent is not about better prompting, larger context windows, or more capable models. It is an architectural transformation that wraps a language model in a persistent loop of observation, planning, and action, supported by tool integrations that give the model reach into real systems, memory mechanisms that give it continuity across interactions, and calibrated autonomy that determines where human judgment remains in the loop. Every property of the underlying model, its tokenization scheme, its context window limits, its reasoning depth, its cost per inference, shapes the agent built on top of it, which is why understanding the substrate is not optional. With that foundation in place, you are ready for Module 1, where the 17 agentic primitives and the AI Agent Canvas provide the structured vocabulary and design methodology for building agent systems that hold up in production.