
Reference Guide

AI Agent Governance: A Field Guide to Control, Proof, and Safe Adoption

The governance problem becomes real when a team moves from "people are trying AI tools" to "agents and workflows can touch repos, CI/CD, MCP tools, credentials, and release paths." From that point, the useful question is whether the organization can approve and bound the work before it runs, observe it while it runs, and reconstruct it after it changes state.

CAISI treats the mature form of this governance as AI Software Delivery Control, but most teams should start with the team-language question in front of them: what can this workflow touch, what needs approval, and what proof remains?

In this guide

What it means
Start with the team language
The 10-minute accountability test
The core control layers
Where teams go wrong
Start by role
Where to go next


Start with the team language

Do not force the organization to learn a category name before it has named the operating problem. Start with the sentence people are already saying, for example:

We are rolling out coding agents and security is nervous.
We need audit evidence for AI-assisted SDLC.
We do not know what agents, MCP tools, or CI jobs can reach.
We need to approve risky actions, not every prompt.
We do not want long-lived credentials sitting inside agent workflows.

Then translate that sentence into an action path: actor, owner, repo, workflow, credential, action, target, approval rule, and proof. That translation is where governance becomes operational.
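One way to make that translation concrete is to write the action path down as a record and see which parts nobody can fill in yet. The sketch below is illustrative only: the `ActionPath` class, its field names, the `unanswered` helper, and the sample values are all assumptions made for this example, not a CAISI schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ActionPath:
    """One action path. Field names mirror the list in the text (hypothetical schema)."""
    actor: str
    owner: Optional[str]
    repo: str
    workflow: str
    credential: Optional[str]
    action: str
    target: str
    approval_rule: Optional[str]
    proof: Optional[str]


def unanswered(path: ActionPath) -> list[str]:
    """Return the names of the fields the team could not fill in."""
    return [f.name for f in fields(path) if getattr(path, f.name) in (None, "")]


# Sample values are invented for illustration.
path = ActionPath(
    actor="coding-agent",
    owner=None,
    repo="example/service",
    workflow="ci-release",
    credential="deploy-token",
    action="publish package",
    target="package registry",
    approval_rule=None,
    proof=None,
)

print(unanswered(path))  # ['owner', 'approval_rule', 'proof']
```

The empty fields are the useful output: each one is a governance question the team has not yet answered for that path.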

What this is

A practical reference page

This guide pulls together the main ideas that recur across CAISI: control, evidence, repo contracts, isolated execution, and proof of work for AI-generated change.

What this is not

Not generic AI governance

This page stays focused on software-delivery systems, coding agents, MCP and tool use, CI workflows, and approval and proof at the execution boundary.

Who this is for

AppSec, CISO, platform, and engineering leaders

Use this page when you need one shared language for control quality, evidence quality, and safe adoption decisions.

What AI agent governance means

AI agent governance begins the moment a system can change something real. Before that, you are mostly dealing with assistance. After that, you are dealing with permissioned autonomy across deterministic systems.

The practical question is simple: if the system took an action that mattered, could your team explain what it knew, what it tried to do, what was allowed, what executed, and what proof remains? If the answer is no, the organization does not have a serious governance layer yet.

The 10-minute accountability test

A useful way to judge maturity is the 10-minute accountability test. During an incident or audit, can your team answer these questions quickly without relying on screenshots or memory?

What did the system know when it decided?
Which policy applied and what verdict did it return?
Was the action allowed before execution?
What actually executed or stayed non-executable?
What proof exists and would it survive outside the product UI?

If those answers take hours to rebuild, the control surface is weaker than it looks in a demo.
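The test can be rehearsed against whatever record a run leaves behind. The following is a minimal sketch, assuming each run is stored as a dictionary; the key names, the `accountability_gaps` helper, and the sample record are hypothetical and chosen only to line up with the five questions above.

```python
# Hypothetical mapping from stored record keys to the five questions.
QUESTIONS = {
    "context": "What did the system know when it decided?",
    "policy_verdict": "Which policy applied and what verdict did it return?",
    "allowed_before_execution": "Was the action allowed before execution?",
    "executed": "What actually executed or stayed non-executable?",
    "proof": "What proof exists and would it survive outside the product UI?",
}


def accountability_gaps(record: dict) -> list[str]:
    """Return the questions the stored record cannot answer.

    A value of False still counts as an answer; only a missing or None value is a gap.
    """
    return [question for key, question in QUESTIONS.items() if record.get(key) is None]


# Invented example: the run was blocked, but context and proof were never captured.
record = {
    "policy_verdict": {"policy": "release-gate", "verdict": "deny"},
    "allowed_before_execution": False,
    "executed": [],
}

for question in accountability_gaps(record):
    print("cannot answer:", question)
```

If this kind of check needs a human to open a product UI or search chat history before it can run, that is itself a finding.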

The core control layers

Discovery

Start with the Agent Action BOM

Inventory the tools security does not yet know about, along with MCP configs, repo entrypoints, CI workflows, credentials, reachable actions, target systems, owners, approval rules, and proof coverage.
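A first pass at that inventory can be as simple as walking a checked-out repo for files that define workflows or tool access. The sketch below is an assumption-heavy starting point, not a description of how CAISI builds an Agent Action BOM: the `discover` function, the row shape, and the file patterns are illustrative. GitHub Actions workflows live under `.github/workflows/`; the MCP config locations listed are common conventions for some clients and should be replaced with whatever the team actually uses.

```python
from pathlib import Path

# Assumed file patterns; extend or replace for the tools actually in use.
PATTERNS = {
    "ci_workflow": [".github/workflows/*.yml", ".github/workflows/*.yaml"],
    "mcp_config": [".mcp.json", ".cursor/mcp.json", ".vscode/mcp.json"],
}


def discover(repo_root: str) -> list[dict]:
    """List candidate BOM rows found under repo_root.

    Owner and approval rule start as None: discovery finds the surface,
    people still have to claim it.
    """
    root = Path(repo_root)
    rows = []
    for kind, patterns in PATTERNS.items():
        for pattern in patterns:
            for path in sorted(root.glob(pattern)):
                rows.append({
                    "kind": kind,
                    "path": path.relative_to(root).as_posix(),
                    "owner": None,
                    "approval_rule": None,
                })
    return rows


if __name__ == "__main__":
    for row in discover("."):
        print(row)
```

File discovery covers only part of the list above. Credentials, reachable actions, and target systems usually need data from the CI platform and secret store as well.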

Boundary enforcement

Put the gate before the action

Controls matter when they can change runtime behavior before the tool call crosses the execution boundary.
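To illustrate what "before the action" means in code, here is a minimal sketch of a gate that evaluates policy and only then invokes the tool. The `POLICY` table, the verdict names, `gated_call`, and `ActionDenied` are hypothetical; a real deployment would load policy from a governed source rather than a module-level dictionary.

```python
from typing import Callable


class ActionDenied(Exception):
    """Raised when the gate refuses to run a tool call."""


# Hypothetical policy table: action name -> verdict.
POLICY = {
    "read_file": "allow",
    "merge_pr": "require_approval",
}


def gated_call(action: str, tool: Callable[[], str], approved: bool = False) -> str:
    """Evaluate policy first; invoke the tool only if the verdict permits it."""
    verdict = POLICY.get(action, "deny")  # unknown actions are denied by default
    if verdict == "deny":
        raise ActionDenied(f"{action}: denied by policy")
    if verdict == "require_approval" and not approved:
        raise ActionDenied(f"{action}: approval required before execution")
    return tool()


def merge() -> str:
    return "merged"


try:
    gated_call("merge_pr", merge)
except ActionDenied as exc:
    print(exc)

print(gated_call("merge_pr", merge, approved=True))
```

The property worth checking is that a denied call never reaches `tool()`. A control that only logs the verdict after the call has run does not meet that bar.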

Deterministic workflow

Let code own deterministic steps

Planning can stay flexible. Validation, shipping, and merge mechanics should be in code, not left to model improvisation.
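A small sketch of that split, under stated assumptions: the model contributes only the proposed change (a plain string here), while the order of checks and the merge decision are fixed in code. The check functions, the `STEPS` list, and `deliver` are placeholders invented for this example; real checks would run tests, linters, or scanners.

```python
from typing import Callable

Check = Callable[[str], bool]


# Placeholder checks, for illustration only.
def has_content(change: str) -> bool:
    return bool(change.strip())


def no_secret_marker(change: str) -> bool:
    return "SECRET" not in change


# The sequence is data owned by the codebase, not something the model chooses.
STEPS: list[tuple[str, Check]] = [
    ("non_empty", has_content),
    ("no_secret_marker", no_secret_marker),
]


def deliver(change: str) -> list[str]:
    """Run every check in fixed order; merge only if all of them pass."""
    log = []
    for name, check in STEPS:
        if not check(change):
            log.append(f"{name}: fail")
            log.append("merge: skipped")
            return log
        log.append(f"{name}: pass")
    log.append("merge: done")
    return log


print(deliver("fix typo in README"))
print(deliver("SECRET=abc123"))
```

Because the same input always produces the same step log, the log can later serve as evidence of what ran.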

Proof

The approval is not the proof

Every meaningful run should leave a packet that a reviewer, auditor, or incident responder can understand cold.
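As a rough illustration of such a packet, the sketch below serializes a run record to canonical JSON and attaches a SHA-256 digest so a reader can detect later edits. The packet layout, the field names in the sample run, and the `build_packet` and `verify_packet` helpers are assumptions for this example. A digest stored next to the data only provides tamper evidence if the digest is also kept somewhere the editor of the record cannot reach.

```python
import hashlib
import json


def _canonical(run: dict) -> bytes:
    """Serialize with sorted keys so the same record always yields the same bytes."""
    return json.dumps(run, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_packet(run: dict) -> dict:
    """Wrap a run record with a digest of its canonical form."""
    return {"run": run, "sha256": hashlib.sha256(_canonical(run)).hexdigest()}


def verify_packet(packet: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    return hashlib.sha256(_canonical(packet["run"])).hexdigest() == packet["sha256"]


# Invented sample run.
packet = build_packet({
    "actor": "coding-agent",
    "action": "merge_pr",
    "policy_verdict": "require_approval",
    "approved_by": "release-owner",
    "executed": True,
})

print(verify_packet(packet))  # True
```

Note that the approval appears as one field inside the packet. It is part of the evidence, not a substitute for it.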

Where teams go wrong

Most organizations do not fail because they forgot the word governance. They fail because they solve the wrong layer first. They choose a tool before they define evaluation language. They write prompts before they design the runtime contract. They talk about approvals before they check whether approval actually changes execution state.

The safer path is not to slow everything down. It is to sequence the work correctly: know what exists, define what can execute, make deterministic steps explicit, and generate proof by construction.

Start by role

AppSec

Start with control that fails or holds

The OpenClaw report and the benchmark series are the fastest route if you need runtime control evidence and a sharper evaluation language.

OpenClaw report | Control benchmarks

CISO / Security leadership

Start with approval, ownership, and proof posture

The sprawl report and benchmark series are the best entry point if you need adoption visibility, evidence posture, and pilot language.

Sprawl report | Control benchmarks

Platform

Start with the operating model

The framework series and the Gait and Wrkr collections show how discovery, enforcement, orchestration, and proof fit together.

Operating Notes | Policy Before Action

Where to go next

If you want the measured artifact, start with the research hub. If you want the framework, start with the Operating Notes. If you want a shared vocabulary first, open the glossary.