
AI Engineering Operating Notes / Post 9 of 10

Proof of Work for AI-Generated Changes

The proof problem usually shows up after the change looks done. The PR merged, CI is green, and then a reviewer asks what actually happened. A green badge answers almost none of that. Autonomous changes need a compact proof packet that explains what changed, what executed, what was blocked, what remains uncertain, and why promotion happened. The CI/CD control guide covers the delivery path underneath that proof.

In this piece

Where the pressure shows up
The failure mode
The better pattern
What a reviewer should never need to do
Why security cares
Why platform and engineering care
Concrete example: what a proof packet should contain
What to do next


The short version

The rule

CI passing is necessary, not sufficient

Autonomous change needs a portable packet that explains scope, policy, execution, validation, and residual risk.

Why it matters

Review quality and audit quality should converge

The same proof packet should help the PR reviewer, incident responder, and auditor understand the run without reconstruction.

Best next step

Score your last autonomous change

Check whether the latest packet explains what changed, what executed, what was blocked, and what remains uncertain.

Where the pressure shows up

Reviewers already struggle to understand complex human-generated pull requests. Autonomous changes raise the bar because the reviewer did not watch the work happen. They inherit a patch, some logs, and whatever summary the system chose to write. If the workflow does not package the run coherently, the reviewer is forced to do incident reconstruction in the code review tool.

That is not sustainable for security, platform, or engineering teams. Security teams need traceable evidence. Platform teams need machine-readable outputs they can debug. Engineers need enough context to review quickly without losing confidence in the merge path.

If review time rises as autonomous volume rises, proof packaging is usually the missing layer.

The failure mode

The anti-pattern is evidence minimalism: treating the diff and the CI status as if they were enough. That leaves crucial questions open. Why did the system touch these files? Which commands ran? Which tests ran? What did the policy layer allow or block? What remains uncertain? If a reviewer cannot answer those questions without chasing logs, the proof layer is missing.

Teams sometimes call that overhead. It is not overhead. It is the thing that makes autonomous change reviewable at all.

The hidden cost of skipping proof is predictable: slower approvals, more side-channel questions, and weaker incident reconstruction later.

The better pattern

The better pattern is a proof packet attached to every autonomous change. The packet should connect the trigger, plan, patch summary, validation results, policy verdicts, residual risk statement, and promotion decision. The trigger, scope, action, and target fields should also map back to an Agent Action BOM. A reviewer should be able to read it in minutes and understand why the change exists, what the system did, and where human judgment is still required.

This is not only about human comfort. It is what lets the system explain itself later during a dispute, an audit, or a debugging session. Proof packets are the trust layer between "the run completed" and "the organization can defend the outcome."

The rule is straightforward: every autonomous change should carry its own explanation and evidence, not depend on reviewers to assemble them.

What a reviewer should never need to do

A reviewer should never need to reconstruct the workflow from scattered logs, dig through unrelated files just to infer intent, or ask in a side channel which tests actually ran. If the review requires that level of archaeology, the system has shifted the cost of autonomy onto the people meant to trust it.

That is the practical purpose of proof of work. It compresses the run into a defensible unit. The packet does not need to be verbose. It does need to be complete enough that a third party can understand what happened without privileged context sitting in someone's head.

This is also where review quality and audit quality should converge. The best proof packets work for the PR reviewer, the incident responder, and the auditor at the same time. If each audience still needs a different reconstruction, the packet is not finished yet.

Why security cares

Security teams need evidence chains, not just approvals. They need to see what policy applied, what the system attempted, what it was allowed to do, what actually executed, and where residual risk remains. That is how an autonomous change becomes governable rather than merely observable.

This is exactly where the distinction between receipt and proof matters. A receipt records that a change was authorized; proof records what actually happened. A workflow can produce the first and still be unable to produce the second. High-trust autonomy needs both records, linked.
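
One way to picture the link is a shared run identifier plus a check that everything executed falls within what was authorized. The sketch below reads the receipt as the authorization record and the proof as the execution record. The `Receipt` and `Proof` types, their field names, the action strings, and the `run_id` join key are all hypothetical, chosen for illustration rather than taken from any defined format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Authorization record: what the run was allowed to do (hypothetical shape)."""
    run_id: str
    approved_by: str
    allowed_actions: frozenset[str]


@dataclass(frozen=True)
class Proof:
    """Execution record: what the run actually did (hypothetical shape)."""
    run_id: str
    executed_actions: tuple[str, ...]
    blocked_actions: tuple[str, ...] = ()


def unexplained_actions(receipt: Receipt, proof: Proof) -> list[str]:
    """Return executed actions that the receipt does not cover.

    Raises ValueError when the two records describe different runs,
    because an unlinked pair says nothing about each other.
    """
    if receipt.run_id != proof.run_id:
        raise ValueError("receipt and proof are not linked to the same run")
    return [a for a in proof.executed_actions if a not in receipt.allowed_actions]


# Illustrative values only.
receipt = Receipt("run-001", "policy-engine", frozenset({"edit:src/", "run:pytest"}))
proof = Proof("run-001", ("edit:src/", "run:pytest", "run:curl"), ("edit:infra/",))
print(unexplained_actions(receipt, proof))
```

In this example the run executed one action the receipt never authorized, which is the kind of gap that neither record reveals on its own.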

Why platform and engineering care

Engineering and platform leaders care because proof packets reduce debugging time and review friction. If the patch summary, validation output, and residual risk statement are structured, a failed run is easier to diagnose and a successful run is easier to merge. The packet becomes a reusable unit of operational knowledge.

It also lets engineering teams move faster with less politics. The reviewer is not being asked to trust the agent. They are being given a compact artifact that makes the review decision legible.

Strong packets also enable better service-level expectations for review. Teams can set faster review targets when the input quality is consistent and complete.

Concrete example: what a proof packet should contain

The exact format can vary. The fields should not.

What changed

Run manifest, scope, affected paths, concise patch summary, and the reasoning or plan that justified the edit.

What ran

Commands, validations, hidden evaluation outcomes, policy verdicts, and proof that the change reached the stated result.

What remains risky

Residual risk, reviewer asks, rollback notes, and the specific reason the workflow promoted or paused the change.
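
As a rough illustration, the three groups above could be carried as one JSON document with a fixed set of required keys. Every key name and value below is a hypothetical placeholder; only the grouping and the list of contents come from the description above.

```python
import json

# Hypothetical key names derived from the three groups; not a published schema.
REQUIRED_FIELDS = {
    "what_changed": ("run_manifest", "scope", "affected_paths", "patch_summary", "plan"),
    "what_ran": ("commands", "validations", "hidden_evaluation_outcomes",
                 "policy_verdicts", "result_evidence"),
    "what_remains_risky": ("residual_risk", "reviewer_asks", "rollback_notes",
                           "promotion_reason"),
}


def missing_fields(packet: dict) -> dict[str, list[str]]:
    """Return, per section, the required keys that are absent.

    Presence is checked rather than truthiness, so an empty list
    (for example, no hidden evaluations ran) still counts as reported.
    """
    gaps: dict[str, list[str]] = {}
    for section, keys in REQUIRED_FIELDS.items():
        present = packet.get(section, {})
        absent = [key for key in keys if key not in present]
        if absent:
            gaps[section] = absent
    return gaps


# Illustrative packet; all values are made up for the example.
packet = {
    "what_changed": {
        "run_manifest": "run-001",
        "scope": "dependency bump",
        "affected_paths": ["requirements.txt"],
        "patch_summary": "Pin one library to a patched release.",
        "plan": "Upgrade the flagged dependency and rerun the test suite.",
    },
    "what_ran": {
        "commands": ["pip install -r requirements.txt", "pytest -q"],
        "validations": [{"name": "unit tests", "passed": True}],
        "hidden_evaluation_outcomes": [],
        "policy_verdicts": [{"rule": "no-infra-edits", "verdict": "allow"}],
        "result_evidence": "Test suite passed on the patched dependency.",
    },
    "what_remains_risky": {
        "residual_risk": "Integration tests were not run.",
        "reviewer_asks": ["Confirm the staging smoke test before release."],
        "rollback_notes": "Revert the single commit.",
        "promotion_reason": "Promoted to human review: all required validations passed.",
    },
}

print(json.dumps(packet, indent=2))
print(missing_fields(packet))
```

The same dictionary could just as easily be rendered as YAML or a PR comment; the point of the check is that the field set stays constant whichever format is used.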

What to do next

Take the last autonomous or semi-autonomous PR your team produced and ask whether a third party could review it cold.

Could they tell what triggered the work?
Could they see what commands and validations ran?
Could they identify the policy posture, approval requirements, and CI/CD control point?
Could they see residual risk without joining a meeting?

If the answer is no, start packaging that data now. Proof should be attached to the change at creation time, not reconstructed after something goes wrong.
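
The four questions above can be kept as a small checklist that a reviewer fills in while reading the PR cold. This is a minimal sketch: the function name and the convention that an unanswered question counts as "no" are assumptions made for the example.

```python
COLD_REVIEW_QUESTIONS = (
    "Could they tell what triggered the work?",
    "Could they see what commands and validations ran?",
    "Could they identify the policy posture, approval requirements, and CI/CD control point?",
    "Could they see residual risk without joining a meeting?",
)


def cold_review_gaps(answers: dict[str, bool]) -> list[str]:
    """Return the questions answered 'no' or left unanswered."""
    return [q for q in COLD_REVIEW_QUESTIONS if not answers.get(q, False)]


# Illustrative answers for one PR: the first two questions pass,
# the rest were not answered and therefore count as gaps.
answers = {
    COLD_REVIEW_QUESTIONS[0]: True,
    COLD_REVIEW_QUESTIONS[1]: True,
}
for question in cold_review_gaps(answers):
    print("GAP:", question)
```

Each remaining gap points at data the packet should carry from creation time.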

The final post steps back from the mechanisms and puts them on a roadmap. Teams move through recognizable stages, and each stage requires different controls, tooling, and organizational posture.