Designing Governable Agents 

Jake Mannix, Walmart U.S. Tech

This is a productive moment for software development.  


In minutes, teams can assemble systems that listen, speak, reason, and act. Models, tools, and APIs can be composed quickly, often by a single developer. The pace of experimentation is lightning fast, and the results are often impressive. 


It is also an unmistakably turbulent moment.  


Frameworks are evolving quickly, best practices are still forming and teams are learning what works by building, shipping and revising in rapid succession. That turbulence is not a failure; it is a signal. What works at demo scale quietly breaks at organizational scale, and most teams will not encounter that problem until it is already creating friction. 


When Autonomy Becomes a Systems Problem 


At small scale, agentic systems feel manageable. At organizational scale, they become something else entirely.  


When thousands of developers can build agents, those agents inevitably interact across teams, whether intended or not. At that point, “single-agent systems” stop being a meaningful concept, and quality can no longer be assured by reasoning about one agent at a time. Everything becomes multi-agent by default, and local decisions accumulate into global effects. 


As that shift happens, the constraints change. Without strong protocols, agents do not simply become flexible – they become uncontrollable. 


Autonomy creates leverage only when it is paired with constraints that are explicit, inspectable and enforceable. If an organization cannot trace how a decision was made, which agents were involved, which tools were called and what data flowed through the system, it cannot audit behavior, investigate incidents or establish trust. At that point, autonomy becomes operational debt. 
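To make the traceability requirement concrete, here is a minimal sketch of a decision trace that records which agents were involved, which tools were called and what data labels flowed through. The names `TraceEvent`, `DecisionTrace`, the agent and tool identifiers and the labels are all hypothetical, chosen for illustration only; a real system would likely persist these records rather than hold them in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TraceEvent:
    """One step in a decision: which agent called which tool on what data."""
    agent: str
    tool: str
    data_labels: tuple


@dataclass
class DecisionTrace:
    """Hypothetical append-only record for a single decision."""
    decision_id: str
    events: list = field(default_factory=list)

    def record(self, agent: str, tool: str, data_labels: tuple) -> None:
        self.events.append(TraceEvent(agent, tool, data_labels))

    def agents_involved(self) -> list:
        # Preserve first-seen order while dropping duplicates.
        return list(dict.fromkeys(event.agent for event in self.events))

    def tools_called(self) -> list:
        return [event.tool for event in self.events]

    def touched(self, label: str) -> bool:
        # True if any step in the decision handled data with this label.
        return any(label in event.data_labels for event in self.events)


# Illustrative data only: a refund decision spanning two agents.
trace = DecisionTrace(decision_id="refund-1042")
trace.record("support-agent", "orders.lookup", ("internal",))
trace.record("refund-agent", "payments.refund", ("internal", "pii"))
trace.record("support-agent", "email.send", ("pii",))
```

With a record like this, an incident investigation can ask `trace.agents_involved()` or `trace.touched("pii")` instead of reconstructing the path from logs and prompts.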


Programming in Language, Not Interfaces 


Building agents today means working in a fundamentally new programming model. Developers write prompts and shape context; they enumerate tools. Models are asked to select actions from a predefined set and execute them against the world. Those actions may hit datasets, call APIs or invoke other agents that repeat the process. 


In effect, we are programming with language plus protocols. 


In traditional systems, contracts were explicit. Remote Procedure Call (RPC) contracts were typed. gRPC frameworks made assumptions visible. Engineers could reason about what a system was allowed to do because the constraints were part of the interface. 


Natural language hides invariants, and agent systems eliminate much of the structure that once made behavior predictable. Everything becomes strings and probabilities. An engineer can review a prompt and still be unable to predict what the system will do. At scale, that ambiguity undermines security, compliance and reliability. This isn’t a flaw in language models; it’s a consequence of removing structure without replacing it. 


Once language becomes the control plane, structure must move up the stack. 


Trust Has To Be Carried, Not Assumed 


If agents are going to interact safely, constraints have to travel with them.  


Clear rules are needed to define what each agent can do, what data it can touch and who is responsible. Those rules cannot only live in documentation and institutional knowledge. They must be machine-readable, enforceable and auditable. 


That means carrying schemas, labels, data classification, provenance and capabilities with every agent and tool interaction, and linking them through a centralized registry so the organization can reason about them. Teams can then trust an agent’s declared profile rather than inferring behavior from a prompt. 
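One way to picture a declared profile is the sketch below. It assumes a hypothetical `AgentProfile` record and an in-memory `Registry` standing in for a centralized service; the field names, agent name and labels are illustrative assumptions, not a description of any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical machine-readable profile that travels with an agent."""
    name: str
    owner: str                            # team accountable for the agent
    capabilities: frozenset[str]          # actions the agent may perform
    data_classifications: frozenset[str]  # data labels it may touch


class Registry:
    """Hypothetical in-memory stand-in for a centralized registry."""

    def __init__(self) -> None:
        self._profiles: dict[str, AgentProfile] = {}

    def register(self, profile: AgentProfile) -> None:
        self._profiles[profile.name] = profile

    def is_allowed(self, agent: str, capability: str, data_label: str) -> bool:
        profile = self._profiles.get(agent)
        if profile is None:
            return False  # unknown agents are denied by default
        return (capability in profile.capabilities
                and data_label in profile.data_classifications)


# Illustrative data only.
registry = Registry()
registry.register(AgentProfile(
    name="order-status-agent",
    owner="fulfillment-team",
    capabilities=frozenset({"orders.read"}),
    data_classifications=frozenset({"internal"}),
))
```

A caller can then check `registry.is_allowed("order-status-agent", "orders.read", "internal")` before routing a request, which is a decision based on the declared profile rather than on a reading of the prompt.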


Where MCP Helps and Where it Binds Too Tightly 


Model Context Protocol (MCP) represents a meaningful step forward. It standardizes how agents discover and invoke tools, and it introduces schemas and metadata. 


MCP also operates very close to execution, where semantics are implied rather than abstracted. Tool descriptions and examples are injected directly into an agent’s context. When teams build directly against MCP tools, they bind themselves not just to the protocol, but to its current assumptions about context, control, and orchestration. In a fast-evolving ecosystem, that tight coupling limits adaptability. 


Separating Meaning From Execution 


That tension points to the need for a stabilizing layer above MCP servers: a virtual agentic interface that defines meaning independently of implementation. This layer does not replace MCP; it sits above it. 


At this level, agents are described in terms of: 

  • declared capabilities and skills 
  • typed inputs and outputs 
  • data classifications and constraints 
  • approval, retry, and compensation semantics 
  • identity, authority and purpose 

The virtual layer defines what an agent is allowed to do and what it promises, without prescribing how those promises are fulfilled. MCP becomes one execution substrate among others.  
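As a rough sketch of that separation, the code below keeps the contract (what is promised) apart from the executor (how it runs). `CapabilityContract`, `Executor`, `InProcessExecutor` and `invoke` are hypothetical names introduced for illustration; the in-process backend is a stand-in, and an MCP-backed executor is assumed to be able to expose the same `run` method.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class CapabilityContract:
    """What a capability promises, independent of how it is executed."""
    name: str
    input_fields: tuple[str, ...]
    output_fields: tuple[str, ...]
    data_classification: str
    requires_approval: bool = False


class Executor(Protocol):
    """Any execution substrate that can run a named capability."""
    def run(self, capability: str, payload: dict) -> dict: ...


class InProcessExecutor:
    """Stand-in backend that dispatches to local Python callables."""

    def __init__(self, handlers: dict[str, Callable[[dict], dict]]) -> None:
        self._handlers = handlers

    def run(self, capability: str, payload: dict) -> dict:
        return self._handlers[capability](payload)


def invoke(contract: CapabilityContract, executor: Executor,
           payload: dict, approved: bool = False) -> dict:
    """Enforce the contract around whichever executor is plugged in."""
    if contract.requires_approval and not approved:
        raise PermissionError(f"{contract.name} requires approval")
    missing = [f for f in contract.input_fields if f not in payload]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    result = executor.run(contract.name, payload)
    absent = [f for f in contract.output_fields if f not in result]
    if absent:
        raise ValueError(f"missing outputs: {absent}")
    return result


# Illustrative contract and backend.
refund = CapabilityContract(
    name="refund_order",
    input_fields=("order_id", "amount"),
    output_fields=("refund_id",),
    data_classification="internal",
    requires_approval=True,
)
executor = InProcessExecutor(
    {"refund_order": lambda p: {"refund_id": f"r-{p['order_id']}"}}
)
```

Swapping `InProcessExecutor` for another backend would leave `refund` and `invoke` untouched, which is the point of defining meaning above the execution layer.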


This separation allows teams to swap MCP implementations, orchestration frameworks or execution engines without redefining agent intent. 


From Declared Intent to Enforced Behavior 


Once tools and agents carry rich metadata, agentic linters become possible.  


With declared inputs, outputs, data labels, and capabilities, linters can check systems against policy before they run. They can flag tools that accept sensitive data and post externally, agents that claim human-in-the-loop behavior without enforcing it, missing approvals for destructive actions or unsafe combinations of user-generated content and privileged tools. At scale, mechanical enforcement is the only way to balance autonomy with safety without centralizing control. 
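Two of those checks can be sketched in a few lines. The `ToolDecl` record, its field names and the `"sensitive"` label are assumptions made for this example; a real linter would read declarations from a registry and apply an organization's own policy rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDecl:
    """Hypothetical declared metadata for a tool."""
    name: str
    accepts_labels: frozenset      # data labels the tool may receive
    posts_externally: bool = False
    destructive: bool = False
    requires_approval: bool = False


def lint(tools) -> list:
    """Return policy findings without executing any tool."""
    findings = []
    for tool in tools:
        if "sensitive" in tool.accepts_labels and tool.posts_externally:
            findings.append(
                f"{tool.name}: accepts sensitive data and posts externally")
        if tool.destructive and not tool.requires_approval:
            findings.append(
                f"{tool.name}: destructive action without approval")
    return findings


# Illustrative declarations only.
tools = [
    ToolDecl("share_report", frozenset({"sensitive"}), posts_externally=True),
    ToolDecl("delete_account", frozenset({"internal"}), destructive=True),
    ToolDecl("lookup_order", frozenset({"internal"})),
]
findings = lint(tools)
```

Here `findings` holds one entry for `share_report` and one for `delete_account`, while `lookup_order` passes. Because the checks read only declarations, they can run at build time.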


Agent cards become contracts, not documentation. Registries become sources of truth. Linters can reason about combinations of capabilities and data flows, allowing risk to be detected at build time rather than after incidents occur. 


As agent interactions grow more complex and more distributed, governance can no longer rely on shared context or informal review. 


Designing For Movement Along The Spectrum 


Not all actions require the same level of determinism. 


Some benefit from flexible, probabilistic reasoning. Others demand guarantees such as idempotency, retries, human approval or explicit consent. A virtual layer allows teams to declare where each capability belongs on that spectrum. 


Execution engines (whether MCP-backed tools, workflow systems, or future orchestration models) can then enforce those guarantees consistently. The result is more intentional autonomy. 
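A minimal sketch of declaring and enforcing such guarantees might look like the following. `Guarantees` and `GuardedRunner` are hypothetical names; the sketch assumes transient failures surface as `RuntimeError` and keeps idempotency results in memory, both simplifications for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Guarantees:
    """Where a capability sits on the determinism spectrum (hypothetical)."""
    idempotent: bool = False
    max_retries: int = 0
    needs_human_approval: bool = False


class GuardedRunner:
    """Enforces declared guarantees around any callable action."""

    def __init__(self) -> None:
        self._results = {}  # idempotency key -> stored result

    def run(self, action: Callable[[], object], guarantees: Guarantees,
            idempotency_key: Optional[str] = None,
            approved: bool = False) -> object:
        if guarantees.needs_human_approval and not approved:
            raise PermissionError("human approval required")
        if guarantees.idempotent:
            if idempotency_key is None:
                raise ValueError("idempotent actions need an idempotency key")
            if idempotency_key in self._results:
                # Replay the stored result instead of acting twice.
                return self._results[idempotency_key]
        last_error = None
        for _ in range(max(0, guarantees.max_retries) + 1):
            try:
                result = action()
                break
            except RuntimeError as err:  # assumed to mean "transient"
                last_error = err
        else:
            # Every attempt failed; surface the last error.
            raise last_error
        if guarantees.idempotent:
            self._results[idempotency_key] = result
        return result


# Illustrative action that fails twice before succeeding.
calls = {"count": 0}


def flaky_charge() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "charged"


runner = GuardedRunner()
payment = Guarantees(idempotent=True, max_retries=2, needs_human_approval=True)
first = runner.run(flaky_charge, payment, idempotency_key="order-42", approved=True)
second = runner.run(flaky_charge, payment, idempotency_key="order-42", approved=True)
```

The second call returns the stored result without invoking `flaky_charge` again. A capability declared with the default `Guarantees()` would run once with no approval gate, which suits flexible, low-risk actions.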


As agent frameworks evolve, MCP matures, and new orchestration models emerge, organizations should not be forced to re-litigate what their agents mean. They should be able to evolve how agents run without rewriting what agents are. This is how early turbulence turns into durable systems. 


Stability Without Stagnation 


Software development feels unsettled right now because it is still forming. That is a condition to design around, not a problem to eliminate. 


Agentic systems will succeed in large organizations as teams move beyond early framework choices and separate stable interfaces from unstable implementations. A virtual layer above MCP servers makes that separation possible. 


Agentic interfaces can mature even as everything underneath them continues to change.
