Independent research and operating notes on AI agent governance.
AI Engineering Operating Notes / Post 5 of 10
The failure does not look dramatic at first. More agent runs start, more work items enter the queue, and status gets harder to explain. Then reviewers cannot tell what is blocked, what was retried, what died halfway through, or what is ready for decision. That is the real scaling wall. The issue is not model quality. It is whether the organization has a work system.
Most current agent usage still depends on direct supervision. A person starts the session, steers the work, reruns failed commands, and manually decides when the output is good enough to open a PR. That can work for experiments. It breaks down when one team tries to scale output across dozens of repos, queues, and issue types.
The hard part is not agent intelligence. It is work management. Which tasks are ready? Which ones depend on other work? Which paths can run in parallel? When should the system retry, pause, or route to human review?
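Those questions reduce to a small amount of explicit metadata on each task. A minimal sketch, assuming tasks carry dependency and path-claim fields (all names here are hypothetical, not part of any specific tool):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    depends_on: set = field(default_factory=set)   # ids this task waits on
    path_claims: set = field(default_factory=set)  # repo paths it will touch

def ready_tasks(tasks, completed):
    """A task is ready when every dependency has completed."""
    return [t for t in tasks if t.depends_on <= completed]

def can_run_in_parallel(a, b):
    """Two runs are parallel-safe only if their path claims are disjoint."""
    return not (a.path_claims & b.path_claims)
```

With fields like these, "which tasks are ready" and "which paths can run in parallel" become queries instead of judgment calls held in someone's head.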
If nobody owns those questions centrally, "agentic scale" quickly becomes queue theater. Lots of runs move. Very little is legible.
Executives usually feel this as queue uncertainty: status meetings get longer while delivery predictability gets worse. If the system cannot answer where work is stuck and why, the organization is not managing autonomous work, it is observing it.
The anti-pattern is agent supervision as a scaling strategy. Teams imagine they can get more leverage by adding more agent sessions while leaving work intake, state management, retries, approvals, and reconciliation implicit. That usually creates invisible queues, hidden failures, duplicate effort, and broken accountability.
You can spot the pattern quickly. Nobody can answer how many runs are active, which tasks were retried, what was cancelled when issue state changed, or what evidence should accompany a promoted change. The work exists, but the operating system does not.
The better pattern is the dark factory: an orchestrated system that turns work items into bounded execution runs with explicit states. A task becomes a manifest. The manifest allocates a workspace, selects a blueprint, captures dependencies, defines path claims, runs validation, packages evidence, and decides whether to open a PR or wait for human review.
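A manifest in this sense is just a structured record. As a sketch of what one might contain (field names are illustrative assumptions, not a real schema):

```python
from dataclasses import dataclass
from enum import Enum

class Approval(Enum):
    AUTO_PR = "auto_pr"          # open a PR when validation passes
    HOLD_FOR_REVIEW = "hold"     # park the run for a human decision

@dataclass
class RunManifest:
    issue_id: str
    scope: str                   # what the run is allowed to change
    blueprint: str               # which execution recipe to use
    path_claims: frozenset       # paths locked for the run's duration
    depends_on: frozenset = frozenset()
    approval: Approval = Approval.HOLD_FOR_REVIEW

# Example: an agent-ready issue becomes a bounded run.
m = RunManifest(
    issue_id="ISSUE-412",
    scope="fix flaky retry test",
    blueprint="python-service",
    path_claims=frozenset({"svc/retry.py", "tests/test_retry.py"}),
)
```

Defaulting the approval posture to a human hold keeps autonomy opt-in per run rather than assumed.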
In a mature setup, the orchestrator is not a convenience layer. It is the control surface. It owns retries, approvals, backoff, conflict detection, and restart recovery. Humans still matter, but they review the work at the right points instead of babysitting every intermediate step.
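The retry-and-escalate loop is the simplest piece of that control surface. A sketch under the assumption that the orchestrator distinguishes transient failures and escalates rather than retrying forever (the names are hypothetical):

```python
import time

class TransientError(Exception):
    """A failure the orchestrator may retry (e.g. a network flake)."""

def run_with_backoff(step, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry a step with exponential backoff; on exhaustion, route the
    run to a needs-decision hold instead of looping indefinitely."""
    for attempt in range(1, max_attempts + 1):
        try:
            return "done", step()
        except TransientError:
            if attempt == max_attempts:
                return "needs_decision", None  # escalate to a human
            sleep(base_delay * 2 ** (attempt - 1))
```

The point of returning an explicit state rather than raising is that every outcome, including exhaustion, lands somewhere visible in the work system.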
The key tradeoff is explicit. You invest more in run-state design upfront so you spend less on coordination, exception churn, and manual triage later.
The dark factory does not remove humans from software delivery. It changes the shape of their work. Reviewers stop acting as traffic coordinators and start acting as decision makers at the right control points.
Instead of supervising every command, they evaluate a coherent review packet: what the run tried to do, what the policy layer allowed, what validations passed, and where residual risk remains.
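A review packet can be a plain record with one gating rule. A minimal sketch, assuming the four elements above are captured per run (field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class ReviewPacket:
    intent: str          # what the run tried to do
    policy_log: list     # what the policy layer allowed or denied
    validations: dict    # check name -> passed?
    residual_risk: str   # what still needs human judgment

    def ready_for_decision(self):
        # Reviewable only if at least one validation ran and all passed.
        return bool(self.validations) and all(self.validations.values())
```

The gate keeps half-validated runs out of reviewers' queues, which is most of what separates a decision point from a babysitting session.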
That matters because it preserves engineering judgment where it is most valuable. Humans are still best at exception handling, tradeoffs, ambiguous product decisions, and final accountability. The factory removes the low-value mechanical supervision that burns time and hides the real decision moments.
It also forces better exception design. A serious dark factory needs a visible needs-decision state, a narrow break-glass path, and a clear owner for blocked or stalled runs. Without those states, teams recreate the old supervision model in side channels and call it autonomy anyway.
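Making those exception states first-class is a matter of naming them in the run-state model. A sketch of one possible transition table (the state names follow this post's vocabulary; the specific set is an assumption, not a standard):

```python
# Legal run-state transitions, with the exception states made explicit.
TRANSITIONS = {
    "queued":         {"running", "cancelled"},
    "running":        {"validated", "failed", "needs_decision"},
    "failed":         {"queued", "needs_decision"},   # retry or escalate
    "needs_decision": {"queued", "cancelled", "break_glass"},
    "break_glass":    {"validated"},                  # narrow, audited override
    "validated":      {"pr_opened"},
}

def advance(state, target):
    """Refuse any transition the model does not name, so side-channel
    state changes fail loudly instead of silently."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target
```

Break-glass here is deliberately a one-way, narrow edge out of needs-decision, not a shortcut reachable from anywhere.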
This is where CISO and VP Platform interests align. Both need fewer ambiguous handoffs. Both need decision points that are explicit, time bounded, and attributable.
The dark factory creates a governable approval surface. Security teams do not need to supervise every keystroke if the system can prove what entered the queue, which policies applied, what the workspace could access, what validations ran, and what state the run reached before a PR opened. That is a much stronger posture than ad hoc agent sessions distributed across laptops and shared branches.
It also improves rollback and dispute handling. If a run has a stable manifest, workspace, and artifact packet, the organization can stop arguing about what "probably" happened. It can inspect the run state.
Platform leaders care because this is how one team scales output without linearly scaling human supervision. The orchestrator absorbs coordination work that would otherwise live in engineers' heads. It knows how to resume a task after a transient failure, block unsafe concurrency, or reuse a warm workspace on retry.
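Blocking unsafe concurrency, for instance, falls out of tracking path claims centrally. A minimal registry sketch (hypothetical names; a real orchestrator would persist this and handle crashes):

```python
class ClaimRegistry:
    """Tracks which paths active runs have claimed, so the orchestrator
    can refuse unsafe concurrency up front rather than discovering
    merge conflicts after two runs have both done the work."""

    def __init__(self):
        self._claims = {}  # path -> run_id

    def try_claim(self, run_id, paths):
        """Claim all paths atomically, or none of them."""
        conflicts = {p for p in paths
                     if self._claims.get(p) not in (None, run_id)}
        if conflicts:
            return False
        for p in paths:
            self._claims[p] = run_id
        return True

    def release(self, run_id):
        """Free every path held by a finished or cancelled run."""
        self._claims = {p: r for p, r in self._claims.items() if r != run_id}
```

A second run touching the same files queues behind the claim instead of racing it, which is exactly the coordination work that would otherwise live in an engineer's head.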
This is also where Wrkr can be a useful implementation example. The value is not "an agent that codes." The value is a system that can turn work into managed runs, isolate execution, and hand humans a reviewable result instead of a mystery patch.
Capacity planning also improves. Once work states are explicit, engineering leaders can forecast review load, blocked runs, and retry pressure instead of managing by intuition.
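Once states are explicit, that forecast is an aggregation, not a survey. A sketch assuming the state names used in this post:

```python
from collections import Counter

def queue_pressure(runs):
    """Roll run states up into the numbers a capacity review needs:
    pending human review, blocked work, and retry churn."""
    states = Counter(state for _, state in runs)
    return {
        "review_load": states["needs_decision"] + states["validated"],
        "blocked": states["blocked"],
        "retry_pressure": states["failed"],
    }
```

The same rollup, trended week over week, is what replaces managing by intuition.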
The dark factory is easier to understand when the work item becomes the unit of motion, following the flow from agent-ready issue to review handoff.
1. An issue marked agent-ready becomes a run manifest with scope, path claims, blueprint, and approval posture.
2. The orchestrator allocates a workspace, runs the change, records validation output, and manages retries or human holds.
3. A PR opens with a patch summary, evidence packet, residual risk, and reviewer guidance instead of a vague "please inspect."
Look at your current issue-to-PR flow and ask whether the system or the humans own the work states.
The goal is not "fully autonomous everything." The goal is a managed, observable work system that can absorb more autonomy without becoming opaque.
The next post looks at the infrastructure this factory needs under the hood: isolated, warm sandboxes. Without them, orchestration turns into contention and cleanup pain.