
Verifiable Governance for Agentic AI: From Advisory Principles to Runtime Watchdogs

By Marc Molas · March 23, 2026 · 11 min read

The governance gap in agentic AI is structural, not philosophical. Most AI governance — principles, codes of ethics, model cards, advisory frameworks — describes how AI should behave. None of it stops the AI from doing something else when nobody is watching. For predictive models with no real-world side effects, that gap is tolerable. For agents that act through tool calls — sending emails, executing trades, modifying production data, spending money — it isn't.

The recent paper Verifiable Governance Architecture (VGA) for Organisations and Teams with Human and AI Employees (Fradelos, January 2026) names this gap directly: "many governance principles are advisory, while modern agents act through tool calls with real-world consequences." It then proposes an engineering pattern for closing it: a runtime Watchdog that mediates tool calls with fail-close (default-deny) semantics, governance encoded as policy-as-code (OPA/Rego), and an immutable evidence store that prevents the AI from hallucinating its own compliance.

This is the design pattern the field has needed for a while. It's worth understanding in detail because the choices are non-obvious and the failure modes of weaker alternatives are real.

The Core Insight: Action Boundaries, Not Average Behavior

Three governance approaches dominate current practice:

Prompt guardrails: add safety instructions to the system prompt.
Reward-model oversight: train models to refuse certain actions.
Process supervision: insert human reviewers at decision points.

All three improve average behavior. None of them, on its own, provides action-boundary guarantees for irreversible tools.

This is the insight that makes the rest of the pattern follow. An agent that has been trained to "not exfiltrate customer data" will refrain from exfiltrating it in the typical case. It may still do so under adversarial conditions, on unusual prompt distributions, in tool-call sequences that nobody anticipated, or simply because the training distribution didn't cover the specific scenario. Average-case improvements aren't safety guarantees for irreversible actions.

The VGA pattern starts from the opposite stance: don't try to make the agent reliably good. Make the actions the agent can take constrained by something the agent cannot bypass.

The Watchdog

The Watchdog is the runtime layer that mediates every tool call before it reaches the tool. Every action the agent wants to take goes through it. The Watchdog has three properties that distinguish it from looser alternatives:

Fail-close (default-deny)

If the Watchdog cannot positively verify that an action is permitted, the action is denied. This is the opposite of most production guardrail patterns, which are fail-open by default — if the rule doesn't match, the action proceeds.

Fail-close is non-negotiable for agentic AI specifically because the failure mode of fail-open is "the agent did something nobody authorized when the policy didn't anticipate the case." Fail-close means the failure mode is "the agent stopped and asked," which is recoverable.
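A minimal sketch of fail-close mediation, written in Python for illustration rather than Rego. The names `Watchdog`, `ActionDenied`, and the two example tools are hypothetical and do not come from the paper; the point is only that a missing rule, a rule that returns anything other than `True`, and a rule that crashes all end in a denial.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical policy type: a function that returns True only for explicitly allowed calls.
Policy = Callable[[Dict[str, Any]], bool]


class ActionDenied(Exception):
    """Raised whenever the watchdog cannot positively verify a tool call."""


@dataclass
class Watchdog:
    tools: Dict[str, Callable[..., Any]]
    policies: Dict[str, Policy]

    def call(self, tool_name: str, **args: Any) -> Any:
        tool = self.tools.get(tool_name)
        policy = self.policies.get(tool_name)
        if tool is None or policy is None:
            # Unknown tool or no matching rule: deny instead of passing through.
            raise ActionDenied(f"no policy covers tool {tool_name!r}")
        try:
            allowed = policy(args) is True
        except Exception as exc:
            # A policy that crashes counts as a denial, never as an allow.
            raise ActionDenied(f"policy error for {tool_name!r}") from exc
        if not allowed:
            raise ActionDenied(f"policy denied {tool_name!r}")
        return tool(**args)


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


def delete_records(table: str) -> str:
    return f"deleted {table}"


watchdog = Watchdog(
    tools={"send_email": send_email, "delete_records": delete_records},
    # Only send_email has a rule; delete_records is registered but uncovered.
    policies={"send_email": lambda args: args["to"].endswith("@example.com")},
)
```

With this setup, `watchdog.call("delete_records", table="users")` raises `ActionDenied` even though the tool exists, because nobody wrote a rule that allows it. A fail-open mediator would have let that call through.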

Mediates the tool surface, not the model surface

The Watchdog doesn't sit between the user and the model. It sits between the model and the tools. This matters because the model's reasoning, prompt content, and intermediate outputs are all in scope for content moderation, but they're not the action boundary. The action boundary is the tool call. Putting the Watchdog at the tool surface means the policy enforced is the policy that actually matters.

Multi-party authorization for sensitive actions

Some actions require more than one approver. The Watchdog can express this naturally: a tool call to "transfer >$10k" requires a verification packet signed by both the agent and a human approver before the call proceeds. A tool call to "modify production database schema" requires evidence that a CI/CD pipeline has run the migration safety checks. The architecture is the same; the policy varies by sensitivity.
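The transfer example can be sketched as a signature check over a canonical encoding of the action. Everything named here is an assumption for illustration: the signer IDs, the demo keys, the `amount_usd` field, and the use of HMAC with shared secrets (a production system would more plausibly use asymmetric signatures with keys held in a KMS or HSM).

```python
import hashlib
import hmac
import json
from typing import Dict, Set

# Hypothetical demo keys; never hard-code real secrets like this.
SIGNER_KEYS: Dict[str, bytes] = {
    "agent-7": b"agent-demo-key",
    "alice": b"human-demo-key",
}
HUMAN_SIGNERS = {"alice"}
THRESHOLD_USD = 10_000


def canonical(action: dict) -> bytes:
    # Stable encoding so every signer signs exactly the same bytes.
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode()


def sign(signer: str, action: dict) -> str:
    return hmac.new(SIGNER_KEYS[signer], canonical(action), hashlib.sha256).hexdigest()


def valid_signers(action: dict, signatures: Dict[str, str]) -> Set[str]:
    """Return the signers whose signature matches this exact action."""
    ok = set()
    for signer, signature in signatures.items():
        key = SIGNER_KEYS.get(signer)
        if key is None:
            continue  # unknown signer: ignored, never trusted
        expected = hmac.new(key, canonical(action), hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, signature):
            ok.add(signer)
    return ok


def transfer_permitted(action: dict, signatures: Dict[str, str]) -> bool:
    amount = action.get("amount_usd")
    if not isinstance(amount, (int, float)):
        return False  # malformed request: fail-close
    signers = valid_signers(action, signatures)
    if action.get("agent") not in signers:
        return False  # the requesting agent must always sign
    if amount > THRESHOLD_USD:
        return bool(signers & HUMAN_SIGNERS)  # large transfers also need a human
    return True
```

Because the signatures cover the whole action, a packet approved for one amount does not authorize a different amount: changing any field invalidates every signature.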

The MVV Matrix: Binding Laws to Verifiable Artifacts

The second core idea is the Minimal Viable Verification (MVV) matrix. It binds each governance rule to a specific, verifiable artifact and a specific cadence at which that artifact must be produced.

Without MVV, governance rules become aspirational. "The system must be auditable" sounds good and verifies nothing. With MVV, "the system must be auditable" gets translated into specific artifacts:

An immutable log of every tool call with policy decision attached.
A signed evidence packet binding intent → tool call → outcome.
A weekly attestation report from a designated auditor.
A quarterly external review with red-team scenarios.

Each artifact has a defined producer, format, and cadence. The verification of compliance reduces to "did the artifact get produced on time, in the right format, with valid signatures?" — which is something a CI/CD pipeline can check.
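One way to picture the matrix is as a small table of rows plus a check that a scheduled pipeline job could run. This is a sketch under assumptions: the row names, producers, and the maximum ages for the log and the evidence packet are invented for illustration, and signature validation is reduced to a boolean that some other component is assumed to have computed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class MvvRow:
    rule: str            # the governance rule being verified
    artifact: str        # the artifact that evidences it
    producer: str        # who or what must produce the artifact
    max_age: timedelta   # cadence, expressed as the oldest acceptable artifact


# Hypothetical rows for "the system must be auditable".
MVV_MATRIX = [
    MvvRow("auditable", "tool_call_log", "watchdog", timedelta(hours=1)),
    MvvRow("auditable", "evidence_packet", "watchdog", timedelta(hours=1)),
    MvvRow("auditable", "attestation_report", "auditor", timedelta(days=7)),
    MvvRow("auditable", "external_review", "red_team", timedelta(days=92)),
]


@dataclass(frozen=True)
class ArtifactRecord:
    artifact: str
    produced_at: datetime
    signature_valid: bool  # assumed to be checked elsewhere


def compliance_gaps(
    matrix: List[MvvRow], latest: Dict[str, ArtifactRecord], now: datetime
) -> List[str]:
    """List every artifact that is missing, badly signed, or overdue."""
    gaps = []
    for row in matrix:
        record = latest.get(row.artifact)
        if record is None:
            gaps.append(f"{row.artifact}: missing")
        elif not record.signature_valid:
            gaps.append(f"{row.artifact}: invalid signature")
        elif now - record.produced_at > row.max_age:
            gaps.append(f"{row.artifact}: overdue")
    return gaps
```

A pipeline step would fail the build whenever `compliance_gaps` returns a non-empty list, which turns "is the system auditable?" into a check with a yes-or-no answer.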

This is what "governance-as-code" actually means in practice. It's not just policies in Rego. It's the verifiable evidence of policy enforcement, produced by the runtime, on a schedule, in formats a regulator could audit.

Why the Evidence Store Has to Be Immutable

The most subtle design choice is the immutable evidence store. The Watchdog produces a record of every decision; that record goes into a store the agent itself cannot modify or backdate.

The reason is what the paper calls governance hallucinations: the failure mode where the agent produces a plausible-looking compliance report that doesn't match what actually happened. If the evidence store is mutable by anything in the agent's reach, the agent (deliberately or as an emergent behavior) can produce compliance evidence that doesn't reflect the actions taken.

The fix is structural: compliance depends on the evidence store, the evidence store is append-only and writable only by the Watchdog, and the Watchdog sits outside the agent's tool surface. Even if the agent produces a perfect-looking compliance summary, an audit can compare it to the immutable store and detect the divergence.
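A hash-chained, append-only log is one common way to make such a store tamper-evident. The sketch below assumes that technique; the paper may specify a different mechanism, and the class and function names are hypothetical. An in-process Python object cannot enforce write isolation by itself, so in practice the store would live in a separate service or storage layer that only the Watchdog can write to.

```python
import hashlib
import json
from typing import List


class EvidenceStore:
    """Append-only log in which each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: List[dict] = []

    @staticmethod
    def _digest(prev_hash: str, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

    def append(self, record: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry_hash = self._digest(prev_hash, record)
        self._entries.append({"record": dict(record), "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True

    def records(self) -> List[dict]:
        return [dict(entry["record"]) for entry in self._entries]


def summary_matches_store(claimed: List[dict], store: EvidenceStore) -> bool:
    """Audit check: the agent's own summary must equal what the store recorded."""
    return store.verify() and claimed == store.records()
```

An auditor who trusts only the store can call `summary_matches_store` on whatever report the agent produced. A summary that omits a denied call, or that rewrites a decision, fails the comparison.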

This is the same principle as separating database write authority from application code in any serious production system. The novelty is applying it to AI governance evidence.

Seniority-Dependent Decision Rights

The fourth idea is operationally important: agents have seniority. A "junior" agent has narrow tool access and requires multi-party authorization for most non-trivial actions. A "professional" agent has broader access. A "senior" agent can authorize narrower-scope actions on behalf of others.

This sounds like enterprise access control because it is. The point is to apply it to AI agents specifically, with the same rigor and same auditability as human role-based access control. In practice this means:

New agents start as junior with constrained tool access. They earn (or are configured into) broader scope only after passing specific verification.
Tool access is the boundary, not "the model's training" or "the system prompt." Two agents using the same model can have very different decision rights based on their access policies.
Promotions are explicit and audited. When an agent moves from professional to senior scope, the change is recorded, the evidence is retained, and rollback is straightforward.
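The three points above can be sketched as a small registry that maps agents to tiers and tiers to tools. The tier names follow the article; the tool names, the `AgentRegistry` class, and the audit record fields are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Set


class Tier(IntEnum):
    JUNIOR = 1
    PROFESSIONAL = 2
    SENIOR = 3


# Hypothetical tool grants per tier.
TIER_TOOLS: Dict[Tier, Set[str]] = {
    Tier.JUNIOR: {"read_ticket"},
    Tier.PROFESSIONAL: {"read_ticket", "send_email"},
    Tier.SENIOR: {"read_ticket", "send_email", "issue_refund"},
}


@dataclass
class AgentRegistry:
    tiers: Dict[str, Tier] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def register(self, agent_id: str) -> None:
        # Every new agent starts with the narrowest scope.
        self.tiers[agent_id] = Tier.JUNIOR
        self.audit_log.append(
            {"agent": agent_id, "from": None, "to": "JUNIOR",
             "approved_by": "system", "evidence": "registration"}
        )

    def set_tier(self, agent_id: str, new_tier: Tier, approved_by: str, evidence: str) -> None:
        # Promotions and rollbacks use the same audited path.
        old_tier = self.tiers[agent_id]  # KeyError for unregistered agents
        self.tiers[agent_id] = new_tier
        self.audit_log.append(
            {"agent": agent_id, "from": old_tier.name, "to": new_tier.name,
             "approved_by": approved_by, "evidence": evidence}
        )

    def may_use(self, agent_id: str, tool: str) -> bool:
        tier = self.tiers.get(agent_id)
        # Unknown agents get nothing, which keeps the check fail-close.
        return tier is not None and tool in TIER_TOOLS[tier]
```

Note that the model never appears in this code. Two agents backed by the same model get different answers from `may_use` purely because of their registry entries, and a rollback is just another `set_tier` call that leaves its own audit record.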

This is the part most production agentic systems in 2026 still get wrong. They have one agent role with all the tools, and the boundary is a system prompt. The seniority pattern is a more honest representation of what's actually needed.

Mapping to Real Compliance Regimes

The pattern is explicitly designed to map onto EU AI Act record-keeping and robustness obligations. The evidence store addresses record-keeping. The fail-close Watchdog addresses robustness. The MVV matrix addresses auditability. Multi-party authorization addresses the human-oversight requirements for high-risk systems.

This isn't accidental. The architecture is designed so that compliance becomes a property of the artifacts produced, not a question of "did the agent behave well." This is the only durable way to comply with regulations that require evidence rather than trust.

What This Means If You're Building Agentic Systems Now

Practical actions for any team shipping agentic AI in 2026:

Move policy enforcement to the tool surface. If your guardrails live in the system prompt, you have advisory governance. Put a fail-close mediator between the model and the tools.
Adopt policy-as-code. OPA/Rego is the most mature choice; the specific tool matters less than the discipline. Policies in code can be reviewed, versioned, tested in CI, and audited. Policies in prompts cannot.
Build the evidence store before you scale. An immutable, signed log of agent actions is much harder to retrofit than to design in. Even if you don't yet need the audit, the operational debugging value alone is enormous.
Apply seniority to agents. New agents get narrow scope. Scope expansion is explicit, audited, and reversible. Don't run all your agents at the same authorization level.
Run multi-party authorization on irreversible actions. Anything financial, anything that touches customer data, anything that modifies production. The performance cost of multi-party authorization is much smaller than the cost of one bad action.

What VGA Doesn't Do

Two honest limits worth naming.

It doesn't make the model better. VGA bounds what the agent can do; it doesn't change how well the agent reasons within those bounds. Improving model behavior is still important — but it's now an optimization problem inside known safety bounds, not the safety mechanism itself.

It costs latency. Every tool call goes through policy evaluation. With well-tuned OPA bundles this is milliseconds, but it's not zero. For latency-sensitive paths, you'll need to engineer carefully — typically with cached decisions for hot paths and per-request evaluation for sensitive ones.
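The split between cached and per-request decisions might look like the sketch below. It is an illustration only: `evaluate_policy` stands in for a round trip to a policy engine, the tool names and the sensitive/allowed sets are hypothetical, and the cache here is Python's standard `functools.lru_cache` rather than anything OPA-specific.

```python
from functools import lru_cache
from typing import Tuple

Args = Tuple[Tuple[str, str], ...]

# Hypothetical classification: tools too sensitive for cached decisions.
SENSITIVE_TOOLS = frozenset({"transfer_funds", "modify_schema"})
# Hypothetical allow-list standing in for real policy rules.
ALLOWED_TOOLS = frozenset({"read_ticket", "transfer_funds"})

# Counts how often the (slow) policy engine is actually consulted.
stats = {"evaluations": 0}


def evaluate_policy(tool: str, args: Args) -> bool:
    """Stand-in for a call to the policy engine; default-deny for unknown tools."""
    stats["evaluations"] += 1
    return tool in ALLOWED_TOOLS


@lru_cache(maxsize=1024)
def cached_decision(tool: str, args: Args) -> bool:
    return evaluate_policy(tool, args)


def decide(tool: str, **kwargs: str) -> bool:
    # Sort the arguments so equal calls produce equal, hashable cache keys.
    args = tuple(sorted(kwargs.items()))
    if tool in SENSITIVE_TOOLS:
        return evaluate_policy(tool, args)  # always evaluated fresh
    return cached_decision(tool, args)      # hot path: reuse earlier decision


def on_policy_update() -> None:
    # Cached decisions are only valid for the policy version that produced them.
    cached_decision.cache_clear()
```

The design choice worth noticing is the invalidation hook: a cached allow that outlives the policy that granted it is a fail-open bug, so any policy change has to clear the cache before new calls are served.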

The cost is real. The cost of not having it is much higher, and it shows up as headlines.

The shift from advisory to verifiable governance for agentic AI is happening; the only question is whether your organization is ahead of or behind the curve. The architecture pattern is here. Adopting it isn't a research project anymore.

Source: Fradelos, G. Verifiable Governance Architecture (VGA) for Organisations and Teams with Human and AI Employees (Geneva, January 9, 2026). SSRN 6306840.
