What Makes an Agent Work Is Not the Tools, but the Loop

When people talk about agents, the conversation tends to drift very quickly toward MCP, skills, and plugins. Those things matter, of course. But if you really want to understand how an agent works, the most important thing to look at is usually not how many capabilities it has connected. It is whether it has a real loop.

That is the real difference between an agent and an ordinary chat model. The gap is not simply “more tools” or “longer answers.” A chat model usually works like this: you ask, it answers, and that round is over. An agent does not treat one output as the end of the job. One round is just a step. If the task is not done, it goes again.

That is the loop.

In plain language, it looks something like this: look at the goal, look at what is currently known, let the LLM decide the next step, call a tool if needed, take the result back, update the state, and then decide whether another round is necessary.

It does not sound mystical. But most of what makes agents interesting lives right there.

The loop is not just “calling the model again”

The first time people hear the word loop, they often picture repeated LLM calls. That is not entirely wrong, but it only captures the surface.

What makes a loop useful is not that the model gets to speak several times. It is that the system stays connected to itself from one round to the next:

what happened in the last round is still visible in the next one
what a tool returned is still available after the tool call ends
where the task currently stands is not forgotten halfway through

Without that continuity, it is not really a loop. It is just a conversation refreshing itself.

If you strip an agent loop down to its smallest working form, it usually includes the same basic pieces:

a goal
a state
some memory
a list of available capabilities
a set of rules
one LLM step
an executor
and a stopping condition

In other words, the model is one node inside the system, not the whole system.

Written as pseudocode, it looks roughly like this:

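done = false
steps = 0

while (!done) {
  // Build this round's context from the goal, state, memory, and tools.
  context = buildContext(goal, state, memory, availableTools)
  decision = llm(context)
  action = parse(decision)

  if (action.type === "ask_user") {
    // Pause for the user, then fold the reply into state.
    reply = waitForUser(action)
    state = updateState(state, reply)
  } else if (action.type === "call_tool") {
    result = executor(action)
    state = updateState(state, result)
    memory = updateMemory(memory, result)
  } else if (action.type === "respond") {
    output(action)
    if (taskFinished(action, state)) done = true
  } else if (action.type === "replan") {
    state = rewritePlan(state)
  }

  // Guardrails: leave the loop instead of spinning.
  steps = steps + 1
  if (tooManySteps(steps) || highRisk(action) || repeatedFailure(state)) {
    escalateOrStop()
    done = true
  }
}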

The code is not the point. The point is what it reveals: an agent is not just a model out in the wild. It is a model placed inside a system that keeps running.

So what exactly is the LLM doing inside the loop?

People often say “the loop controls the LLM.” That is not totally wrong, but it can sound more abstract than it needs to.

What usually happens in practice is simpler: the runtime sets the boundaries for this round, and then the model makes a decision inside those boundaries.

That means the system first has to decide what this round is for. Is the model supposed to make a plan? Pick the next action? Fill in tool arguments? Summarize a result? Not every round is a “final answer” round. In many systems, each round is intentionally narrow. That helps keep the whole thing stable.

Then the system decides what the model gets to see. It usually does not dump the entire history, every document, and every tool description into every prompt. It selects what is relevant right now: the current goal, the current phase, the last few important steps, the relevant memory, the tools available in this round, and the expected output format. People often call this context building. In ordinary language, it just means giving the model the right cards for this turn, not the whole deck.
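As a rough illustration, that selection step might look like the sketch below. The shapes `Step`, `ToolSpec`, and `RoundContext`, the `important` flag, and the `json_action` format label are all assumptions made up for this example, not taken from any particular framework.

```typescript
// Hypothetical shapes for illustration; real systems carry more fields.
interface Step { summary: string; important: boolean }
interface ToolSpec { toolName: string; phases: string[] }
interface RoundContext {
  goal: string;
  phase: string;
  recentSteps: string[];
  toolNames: string[];
  outputFormat: string;
}

// Pick only what this round needs: the last few important steps and
// the tools registered for the current phase.
function buildContext(
  goal: string,
  phase: string,
  steps: Step[],
  tools: ToolSpec[],
  maxSteps: number = 3
): RoundContext {
  const recentSteps = steps
    .filter((s) => s.important)
    .slice(-maxSteps)
    .map((s) => s.summary);
  const toolNames = tools
    .filter((t) => t.phases.indexOf(phase) !== -1)
    .map((t) => t.toolName);
  return { goal, phase, recentSteps, toolNames, outputFormat: "json_action" };
}

const roundContext = buildContext(
  "write an article about agent loops",
  "planning",
  [
    { summary: "asked about audience", important: true },
    { summary: "logged heartbeat", important: false },
    { summary: "drafted outline", important: true },
  ],
  [
    { toolName: "search_notes", phases: ["planning", "gathering"] },
    { toolName: "publish", phases: ["verifying"] },
  ]
);
```

In this example the heartbeat step and the `publish` tool are left out, because neither matters during planning.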

Then comes output. More mature agent systems do not usually let the LLM spill out a blob of natural language and leave the rest of the software guessing whether it wanted to call a tool. A more stable pattern is structured output. The model says explicitly what it wants to do, or admits it is missing information and needs to ask the user something first. That makes the next step much easier for the rest of the system to handle.
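One way to picture structured output is a small action protocol plus a strict parser. The sketch below is only an assumption about what such a protocol could look like; the `AgentAction` type and its field names are invented for illustration.

```typescript
// Hypothetical action protocol; the field names are illustrative.
type AgentAction =
  | { type: "call_tool"; tool: string; args: Record<string, unknown> }
  | { type: "ask_user"; question: string }
  | { type: "respond"; text: string };

// Parse raw model output into a known action, or return null so the
// runtime can re-prompt instead of guessing what the model meant.
function parseAction(raw: string): AgentAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all, e.g. free-form prose
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, tool, args, question, text } = data as Record<string, unknown>;
  if (type === "call_tool" && typeof tool === "string") {
    const safeArgs =
      typeof args === "object" && args !== null
        ? (args as Record<string, unknown>)
        : {};
    return { type: "call_tool", tool, args: safeArgs };
  }
  if (type === "ask_user" && typeof question === "string") {
    return { type: "ask_user", question };
  }
  if (type === "respond" && typeof text === "string") {
    return { type: "respond", text };
  }
  return null;
}

const parsed = parseAction('{"type":"call_tool","tool":"search","args":{"q":"loop"}}');
const rejected = parseAction("I think I should probably search for something");
```

The useful property is that anything the parser does not recognize becomes an explicit `null`, not a guess.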

And then there is one more step people often skip when explaining this stuff: even if the model says “I want to call this tool,” the system may still say no.

The runtime usually checks again. Is the tool available? Are the arguments valid? Does this action make sense at the current stage? Is it within budget and permission scope? Is it risky enough that a human should confirm it first? Only after those checks does anything actually execute.
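Those checks can be sketched as a single gate that runs before the executor. Everything named here (`ToolPolicy`, `checkToolCall`, the three verdicts) is hypothetical, and a real runtime would likely check more than this.

```typescript
// Hypothetical per-tool policy; names and fields are assumptions.
interface ToolPolicy {
  requiredArgs: string[];
  allowedPhases: string[];
  cost: number;
  needsConfirmation: boolean;
}
type Verdict = "allow" | "confirm" | "deny";

// The model proposed a call; the runtime decides whether it may run.
function checkToolCall(
  policies: Record<string, ToolPolicy>,
  tool: string,
  args: Record<string, unknown>,
  phase: string,
  budgetLeft: number
): { verdict: Verdict; reason: string } {
  if (!Object.prototype.hasOwnProperty.call(policies, tool)) {
    return { verdict: "deny", reason: "unknown tool" };
  }
  const policy = policies[tool];
  const missing = policy.requiredArgs.filter((k) => args[k] === undefined);
  if (missing.length > 0) {
    return { verdict: "deny", reason: "missing args: " + missing.join(", ") };
  }
  if (policy.allowedPhases.indexOf(phase) === -1) {
    return { verdict: "deny", reason: "wrong phase" };
  }
  if (policy.cost > budgetLeft) {
    return { verdict: "deny", reason: "over budget" };
  }
  if (policy.needsConfirmation) {
    return { verdict: "confirm", reason: "risky action" };
  }
  return { verdict: "allow", reason: "ok" };
}

const policies: Record<string, ToolPolicy> = {
  read_file: { requiredArgs: ["path"], allowedPhases: ["gathering"], cost: 1, needsConfirmation: false },
  send_email: { requiredArgs: ["to", "body"], allowedPhases: ["executing"], cost: 2, needsConfirmation: true },
};
const gate = checkToolCall(policies, "send_email", { to: "a@example.com", body: "hi" }, "executing", 5);
```

With these made-up policies, the email call is valid but still comes back as `confirm`, so a human gets the last word.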

So from start to finish, the LLM is better understood as a decision-making node inside the loop. It is not the entire agent, and it does not get to do whatever it wants.

How capabilities actually get used

Once you follow that logic, “capability orchestration” becomes much less mysterious. In practice it usually comes down to three questions:

Should a capability be called at all?
Which one should be called?
How does the result get fed back into the next round?

Answering those questions usually takes four steps. The first is a capability registry. The system needs to know what it has: tool names, descriptions, parameter shapes, permission levels, cost or rate-limit information, and the kinds of situations each one is meant for. MCP, native tools, internal APIs, and plugins all tend to get flattened into “callable capabilities” at this stage. From the perspective of the loop, MCP is more of a connection method than the loop itself.
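A registry like that does not need to be complicated. The sketch below assumes a made-up `Capability` shape and a simple three-level permission scale; neither comes from MCP or any specific product.

```typescript
// Hypothetical registry entry; field names follow the prose above.
interface Capability {
  id: string;
  description: string;
  source: "mcp" | "native" | "internal_api" | "plugin";
  params: Record<string, "string" | "number" | "boolean">;
  permission: "read" | "write" | "admin";
  costPerCall: number;
}

class CapabilityRegistry {
  private entries: Capability[] = [];

  // Where a capability came from stops mattering once it is registered.
  register(cap: Capability): void {
    if (this.entries.some((c) => c.id === cap.id)) {
      throw new Error("duplicate capability: " + cap.id);
    }
    this.entries.push(cap);
  }

  // List what a round may use, given the caller's permission ceiling.
  available(maxPermission: Capability["permission"]): Capability[] {
    const rank = { read: 0, write: 1, admin: 2 };
    return this.entries.filter((c) => rank[c.permission] <= rank[maxPermission]);
  }
}

const registry = new CapabilityRegistry();
registry.register({
  id: "search_docs",
  description: "Full-text search over project docs",
  source: "mcp",
  params: { query: "string" },
  permission: "read",
  costPerCall: 1,
});
registry.register({
  id: "deploy",
  description: "Deploy the current build",
  source: "internal_api",
  params: { env: "string" },
  permission: "admin",
  costPerCall: 10,
});
const readOnlyIds = registry.available("read").map((c) => c.id);
```

Note that the MCP-backed tool and the internal API sit side by side in the same list, which is the flattening described above.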

The second step is action selection. This is often where the LLM is most directly involved. The runtime gives the model the goal, the current state, and the tool list. The model proposes a next step. Then the system checks whether that proposal is sensible. In other words, robust agents are usually built around “the model proposes, the system validates,” not “the model decides everything.”

The third step is actual execution. Once a tool is selected, the call usually goes through an executor. The executor handles argument translation, API calls, timeouts, retries, rate limits, error capture, and result normalization. The outside world is never as tidy as the model would like it to be, so this layer matters a lot.
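To make the executor's job concrete, here is a deliberately small sketch covering bounded retries, error capture, and result normalization. It is synchronous and leaves out timeouts and rate limits to stay short; `runTool` and `ToolResult` are names invented for this example.

```typescript
// Normalized result shape (an assumption): every call ends as ok or error.
type ToolResult =
  | { ok: true; value: string; attempts: number }
  | { ok: false; error: string; attempts: number };

// Run a tool with bounded retries and capture failures as data.
// Timeouts and rate limiting are left out to keep the sketch short.
function runTool(call: () => unknown, maxAttempts: number = 3): ToolResult {
  let lastError = "not attempted";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const raw = call();
      // Normalize whatever the tool returned into a string payload.
      const value = typeof raw === "string" ? raw : (JSON.stringify(raw) ?? "null");
      return { ok: true, value, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { ok: false, error: lastError, attempts: maxAttempts };
}

// A pretend tool that fails twice before succeeding.
let calls = 0;
const flaky = () => {
  calls += 1;
  if (calls < 3) throw new Error("upstream 503");
  return { hits: 2 };
};
const outcome = runTool(flaky);
```

The model never sees the two failed attempts as exceptions. It sees one tidy result, which is the point of this layer.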

The final step is feeding the result back into the loop. A tool call is not useful if the result does not re-enter the system. Usually the runtime updates task state, updates working memory, records logs, and passes the most relevant result back into the next LLM step. Without that, one round cannot really connect to the next.
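The feedback step can be pictured as a small function that folds one result into the next round's state. The `LoopState` fields below are assumptions chosen for the example, not a standard shape.

```typescript
// Hypothetical loop state; real systems track much more.
interface LoopState {
  phase: string;
  failures: number;
  lastResult: string | null;
  log: string[];
}

// Fold one tool result into a new state without mutating the old one.
function applyResult(
  prev: LoopState,
  tool: string,
  result: { ok: boolean; summary: string }
): LoopState {
  return {
    phase: prev.phase,
    // A success resets the failure streak; a failure extends it.
    failures: result.ok ? 0 : prev.failures + 1,
    // Only successful results are handed to the next LLM step.
    lastResult: result.ok ? result.summary : prev.lastResult,
    log: prev.log.concat(tool + ": " + (result.ok ? "ok" : "failed")),
  };
}

const initial: LoopState = { phase: "executing", failures: 0, lastResult: null, log: [] };
const afterFailure = applyResult(initial, "search_notes", { ok: false, summary: "timeout" });
const afterSuccess = applyResult(afterFailure, "search_notes", { ok: true, summary: "3 notes found" });
```

After these two calls the log holds both attempts, the failure streak is back to zero, and the next round can see "3 notes found".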

Why tasks fall apart halfway through

One reason some agents feel promising at first and then start drifting is that the task itself was too loose from the beginning.

The usual problems are familiar: the goal is large but has no stages, there is no clear state for progress, there is no shared definition of completion, and there is no agreement on what counts as failure. Once that happens, every new round starts to feel like a new beginning. Of course the loop gets messy.

A task that can actually survive a loop usually needs at least four things spelled out.

The first is the task object itself: what is this task, what constraints matter, what state is it in, how urgent is it, and is there a deadline or budget?

The second is intermediate state. For complex tasks, “not done” and “done” are usually not enough. It helps a lot to have middle states such as gathering context, planning, executing, waiting for tool results, waiting for user input, verifying, completed, or failed. The clearer those states are, the less likely the loop is to get lost.
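Those middle states are easy to write down as a tiny state machine. The states below come from the list above; the table of allowed transitions is an illustrative assumption and would differ from system to system.

```typescript
type Phase =
  | "gathering_context" | "planning" | "executing" | "waiting_for_tool"
  | "waiting_for_user" | "verifying" | "completed" | "failed";

// Allowed transitions (an illustrative assumption, not a standard).
const transitions: Record<Phase, Phase[]> = {
  gathering_context: ["planning", "waiting_for_user", "failed"],
  planning: ["executing", "gathering_context", "failed"],
  executing: ["waiting_for_tool", "waiting_for_user", "verifying", "failed"],
  waiting_for_tool: ["executing", "failed"],
  waiting_for_user: ["gathering_context", "executing", "failed"],
  verifying: ["completed", "executing", "failed"],
  completed: [],
  failed: [],
};

// Move to the next phase, refusing jumps the table does not allow.
function advance(current: Phase, next: Phase): Phase {
  if (transitions[current].indexOf(next) === -1) {
    throw new Error("illegal transition: " + current + " -> " + next);
  }
  return next;
}

let phase: Phase = "gathering_context";
phase = advance(phase, "planning");
phase = advance(phase, "executing");
phase = advance(phase, "verifying");
```

With a table like this, a loop that tries to jump from `planning` straight to `completed` fails loudly instead of quietly drifting.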

The third is success criteria. This is one of the most commonly skipped pieces. Without it, the loop tends to fail in one of two ways: it either keeps going after the job is basically complete, or it declares success too early. The system needs to know what deliverable counts as done, what checks must pass, and which situations require human review before the task can close.
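Success criteria become much harder to skip once they are written as checks the loop can run. The sketch below assumes a made-up `Deliverable` shape for the article-writing kind of task; the two criteria are examples, not a recommended set.

```typescript
// Hypothetical deliverable and criterion shapes.
interface Deliverable { text: string; sectionsCovered: string[] }
interface Criterion { label: string; passes: (d: Deliverable) => boolean }

// Returns the labels of failed checks; an empty list means "done".
function unmetCriteria(d: Deliverable, criteria: Criterion[]): string[] {
  return criteria.filter((c) => !c.passes(d)).map((c) => c.label);
}

const criteria: Criterion[] = [
  {
    label: "covers loop, memory, tasks",
    passes: (d) =>
      ["loop", "memory", "tasks"].every((s) => d.sectionsCovered.indexOf(s) !== -1),
  },
  {
    label: "at least 3 words",
    passes: (d) => d.text.split(/\s+/).filter((w) => w.length > 0).length >= 3,
  },
];

const draft: Deliverable = {
  text: "The loop keeps state.",
  sectionsCovered: ["loop", "memory"],
};
const stillMissing = unmetCriteria(draft, criteria);
```

Here the draft is long enough but has no section on tasks, so the loop knows exactly why it cannot close yet.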

The fourth is failure conditions. What happens if a tool call fails three times in a row? What if crucial input never arrives? What if the task exceeds budget? What if quality checks keep failing? This is not about pessimism. It is just how you stop a loop from spinning uselessly.
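Failure conditions can be sketched the same way: a function that either returns a reason to stop or lets the loop continue. The counters, limits, and reason strings here are assumptions for illustration.

```typescript
// Hypothetical counters the runtime keeps while the loop runs.
interface RunStats {
  consecutiveToolFailures: number;
  waitingRounds: number;
  spent: number;
  failedQualityChecks: number;
}
interface Limits {
  maxToolFailures: number;
  maxWaitRounds: number;
  budget: number;
  maxQualityFailures: number;
}

// Return a reason to stop, or null to keep looping.
function stopReason(stats: RunStats, limits: Limits): string | null {
  if (stats.consecutiveToolFailures >= limits.maxToolFailures) return "tool_failing";
  if (stats.waitingRounds >= limits.maxWaitRounds) return "missing_input";
  if (stats.spent > limits.budget) return "over_budget";
  if (stats.failedQualityChecks >= limits.maxQualityFailures) return "quality_stuck";
  return null;
}

const limits: Limits = { maxToolFailures: 3, maxWaitRounds: 5, budget: 100, maxQualityFailures: 2 };
const verdict = stopReason(
  { consecutiveToolFailures: 3, waitingRounds: 0, spent: 40, failedQualityChecks: 0 },
  limits
);
```

Three tool failures in a row hits the first limit, so `verdict` is a reason to stop and the loop can escalate with it.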

Memory is less mystical than it sounds

When people hear “agent memory,” they often jump straight to vector databases. Vector retrieval matters, but if memory is reduced to “store things in a database,” you miss what memory is doing during actual runtime.

For a loop, memory is not mainly about storing more. It is about making sure the next round gets the right amount of information.

In practical systems, memory often falls into three layers.

The first is working memory. This is the layer closest to the current task. It usually contains the current goal, the current phase, the last few steps, key tool results, pending subtasks, and any new constraints the user just added. It does not try to be complete. It tries to be immediately useful.

The second is episodic memory, meaning what has already happened in this task. Which paths were tried before, which tools failed, which options the user rejected, where the task last stopped. This is not mainly about building knowledge. It is about not making the same mistake over and over.

The third is long-term memory. This is where you put more durable things: user preferences, project history, team-specific terminology, business rules, and similar material. It usually does not get shoved into every round. It gets retrieved when needed.
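The three layers can be sketched with nothing more than arrays and a lookup table. This is a toy under stated assumptions: `MemoryLayers`, `remember`, and `recall` are invented names, and moving overflow from working memory into episodic memory is one possible policy among several.

```typescript
interface MemoryLayers {
  working: string[];                // goal, phase, last few steps
  episodic: string[];               // what already happened in this task
  longTerm: Record<string, string>; // durable facts, keyed by topic
}

// Keep working memory small; anything that overflows moves to
// episodic memory so it is not lost (an assumed policy).
function remember(mem: MemoryLayers, note: string, workingLimit: number = 3): MemoryLayers {
  const working = mem.working.concat(note);
  const overflow = working.slice(0, Math.max(0, working.length - workingLimit));
  return {
    working: working.slice(overflow.length),
    episodic: mem.episodic.concat(overflow),
    longTerm: mem.longTerm,
  };
}

// Long-term entries are retrieved on demand, not pushed into every round.
function recall(mem: MemoryLayers, topics: string[]): string[] {
  return topics
    .filter((t) => Object.prototype.hasOwnProperty.call(mem.longTerm, t))
    .map((t) => mem.longTerm[t]);
}

let layers: MemoryLayers = {
  working: [],
  episodic: [],
  longTerm: { tone: "plain language" },
};
for (const note of ["goal set", "outline drafted", "search failed", "user rejected title"]) {
  layers = remember(layers, note);
}
const retrieved = recall(layers, ["tone", "unknown_topic"]);
```

After four notes with a limit of three, the oldest note has moved to the episodic layer, and only the one matching long-term entry is retrieved.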

The mistakes here are fairly ordinary. Some systems store everything and end up with dirty context. Some compress nothing and watch cost rise while clarity falls. Some fail to separate long-lived facts from short-lived state, so temporary things linger too long and durable things disappear too early.

That is why the real design questions are usually not “does the agent have memory?” but things like: which layer does this information belong to, how long should it remain, does the next round need the raw source or just a summary, and when should this memory be dropped?
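Those questions map fairly directly onto fields on a memory item. The sketch below is one hypothetical way to encode them; `ttlRounds`, `rawWindow`, and the idea of measuring age in loop rounds are assumptions, not an established convention.

```typescript
// Hypothetical memory item carrying its own retention hints.
interface MemoryItem {
  text: string;             // raw source
  summary: string;          // compressed form
  layer: "working" | "episodic" | "long_term";
  createdRound: number;
  ttlRounds: number | null; // null = keep until explicitly removed
}

// Drop items whose time-to-live has run out.
function prune(items: MemoryItem[], currentRound: number): MemoryItem[] {
  return items.filter(
    (m) => m.ttlRounds === null || currentRound - m.createdRound < m.ttlRounds
  );
}

// Fresh items are passed on raw; older ones only as a summary.
function render(item: MemoryItem, currentRound: number, rawWindow: number = 2): string {
  return currentRound - item.createdRound < rawWindow ? item.text : item.summary;
}

const items: MemoryItem[] = [
  { text: "search_notes returned three long excerpts", summary: "3 notes found", layer: "working", createdRound: 4, ttlRounds: 3 },
  { text: "user prefers plain language", summary: "plain language", layer: "long_term", createdRound: 0, ttlRounds: null },
  { text: "fetch_page timed out after 30s", summary: "fetch_page failed once", layer: "episodic", createdRound: 1, ttlRounds: 3 },
];
const kept = prune(items, 5);
const shown = kept.map((m) => render(m, 5));
```

At round 5 the old timeout note is dropped, the fresh search result is shown raw, and the long-lived preference survives but only as its summary.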

What the runtime usually looks like

If you stop treating an agent as a chat window and start treating it as a system, the layers become easier to see.

There is usually something like the following:

a goal layer that defines what the task is trying to achieve
a state layer that keeps track of where the task stands
a planner that breaks the goal into stages or subtasks
an LLM decision layer that lets the model produce the next step given the current context
a capability layer that connects tools, APIs, plugins, and MCP services
an executor that actually performs outside actions and handles errors
a memory layer that stores, compresses, retrieves, and evicts information
a guardrail or evaluation layer that handles checks, risk control, retries, halts, rollbacks, and human confirmation

The code does not always label them that neatly. But most agents that work for more than a demo tend to contain something close to that stack.

A simple example: writing an article

Take a fairly ordinary task: write an article for general readers explaining how an agent loop controls the LLM, orchestrates capabilities, and manages memory and tasks.

A more believable agent does not usually jump straight into drafting the full article. It will first establish task state: the goal, the audience, the constraints, and whether any key input is missing. If something is missing, it asks.

Then it plans the structure. This is not the moment to write paragraphs yet. It is the moment to figure out which questions must be answered, what order they should appear in, where examples help, and where pseudocode might help. After that, the system can check whether the structure actually covers the task.

Only then does it gather more context. If the part about controlling the LLM is still weak, it may retrieve notes, read local docs, or inspect code comments. The result comes back into working memory.

Then comes drafting, usually in pieces rather than one long spill: first explain what the loop is, then what role the LLM plays, then capability orchestration, then task design, then memory design. Each part can be checked as it lands.

And finally there is a validation pass. Did the article drift? Is the control relationship clear? Are task design, memory, and orchestration actually explained? Is any section still talking in vague abstractions? If something is off, the system revisits that section instead of throwing everything away.

At that point you can see the important part pretty clearly: the thing that keeps turning is not “the model by itself.” It is the runtime loop. The LLM is one part of it.

Where skill, MCP, and plugin fit

Since this piece is not really about them, they only need a short version here.

A skill is best thought of as a way of doing a class of tasks. It changes how the loop behaves, but it is not the loop itself.

MCP is better understood as a standardized connection method. It solves “how capabilities get connected,” not “how the task keeps moving.”

A plugin is closer to an extra capability module. It gives the system more things it can do, but it does not decide how the loop should turn.

So if you really want to compress them into one sentence, it might be this: the loop is the circuit, the LLM is the decision node, skill is method, MCP is connection, and plugin is added capability.

Why some “agents” still feel like chatbots

In the end, the distinction is not especially mystical.

If a system mainly gives the model a list of tools, lets it pick one, and keeps talking after the result comes back, it may already be more capable than plain chat, but it often still feels closer to a tool-augmented dialogue system than a full agent.

A more complete agent usually adds several things on top: explicit task state, clear success and failure conditions, structured action protocols, controlled capability calls, layered memory, and the machinery for checking, retrying, rolling back, halting, or asking for human confirmation. Most of all, it keeps moving across multiple rounds instead of treating one answer as the whole job.

So the question is still the same: does it have a stable loop?

In the end, it really does come back to the loop

Once you strip away the buzzwords, the loop is not an especially exotic idea. It is just the system reading the goal, building context, letting the LLM make a decision, turning that decision into action, taking the result back, updating state, and starting the next round.

Without that loop, even a powerful model is still mostly a model that talks well. Once the loop exists, and once task design, memory, orchestration, execution, and checking are actually wired into it, the system starts to feel like something that can do work.

So the next time someone explains agents with metaphors about brains and limbs, that is fine as a rough picture. But the real question is usually simpler than the metaphor: is there a loop, and does it actually hold together?