Independent research and operating notes on AI agent governance.
AI Engineering Operating Notes / Post 5 of 10
The failure does not look dramatic at first. More agent runs start, more work items enter the queue, and status gets harder to explain. Then reviewers cannot tell what is blocked, what was retried, what died halfway through, or what is ready for decision. That is the real scaling wall. The issue is not model quality. It is whether the organization has a work system.
Most current agent usage still depends on direct supervision. A person starts the session, steers the work, reruns failed commands, and manually decides when the output is good enough to open a PR. That can work for experiments. It breaks down when one team tries to scale output across dozens of repos, queues, and issue types.
The hard part is not agent intelligence. It is work management. Which tasks are ready? Which ones depend on other work? Which paths can run in parallel? When should the system retry, pause, or route to human review?
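Those questions reduce to a small amount of explicit metadata on each task. A minimal sketch, assuming tasks carry dependency and path-claim fields (all names here are hypothetical, not part of any specific tool):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    depends_on: set = field(default_factory=set)   # ids this task waits on
    path_claims: set = field(default_factory=set)  # repo paths it will touch

def ready_tasks(tasks, completed):
    """A task is ready when every dependency has completed."""
    return [t for t in tasks if t.depends_on <= completed]

def can_run_in_parallel(a, b):
    """Two runs are parallel-safe only if their path claims are disjoint."""
    return not (a.path_claims & b.path_claims)
```

With fields like these, "which tasks are ready" and "which paths can run in parallel" become queries instead of judgment calls held in someone's head.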
If nobody owns those questions centrally, "agentic scale" quickly becomes queue theater. Lots of runs move. Very little is legible.
Executives usually feel this as queue uncertainty: status meetings get longer while delivery predictability gets worse. If the system cannot answer where work is stuck and why, the organization is not managing autonomous work, it is observing it.
The anti-pattern is agent supervision as a scaling strategy. Teams imagine they can get more leverage by adding more agent sessions while leaving work intake, state management, retries, approvals, and reconciliation implicit. That usually creates invisible queues, hidden failures, duplicate effort, and broken accountability.
You can spot the pattern quickly. Nobody can answer how many runs are active, which tasks were retried, what was cancelled when issue state changed, or what evidence should accompany a promoted change. The work exists, but the operating system does not.
The better pattern is the dark factory: an orchestrated system that turns work items into bounded execution runs with explicit states. A task becomes a manifest. The manifest allocates a workspace, selects a blueprint, captures dependencies, defines path claims, runs validation, packages evidence, and decides whether to open a PR or wait for human review.
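A manifest in this sense is just a structured record. As a sketch of what one might contain (field names are illustrative assumptions, not a real schema):

```python
from dataclasses import dataclass
from enum import Enum

class Approval(Enum):
    AUTO_PR = "auto_pr"          # open a PR when validation passes
    HOLD_FOR_REVIEW = "hold"     # park the run for a human decision

@dataclass
class RunManifest:
    issue_id: str
    scope: str                   # what the run is allowed to change
    blueprint: str               # which execution recipe to use
    path_claims: frozenset       # paths locked for the run's duration
    depends_on: frozenset = frozenset()
    approval: Approval = Approval.HOLD_FOR_REVIEW

# Example: an agent-ready issue becomes a bounded run.
m = RunManifest(
    issue_id="ISSUE-412",
    scope="fix flaky retry test",
    blueprint="python-service",
    path_claims=frozenset({"svc/retry.py", "tests/test_retry.py"}),
)
```

Defaulting the approval posture to a human hold keeps autonomy opt-in per run rather than assumed.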
In a mature setup, the orchestrator is not a convenience layer. It is the control surface. It owns retries, approvals, backoff, conflict detection, and restart recovery. Humans still matter, but they review the work at the right points instead of babysitting every intermediate step.
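The retry-and-escalate loop is the simplest piece of that control surface. A sketch under the assumption that the orchestrator distinguishes transient failures and escalates rather than retrying forever (the names are hypothetical):

```python
import time

class TransientError(Exception):
    """A failure the orchestrator may retry (e.g. a network flake)."""

def run_with_backoff(step, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry a step with exponential backoff; on exhaustion, route the
    run to a needs-decision hold instead of looping indefinitely."""
    for attempt in range(1, max_attempts + 1):
        try:
            return "done", step()
        except TransientError:
            if attempt == max_attempts:
                return "needs_decision", None  # escalate to a human
            sleep(base_delay * 2 ** (attempt - 1))
```

The point of returning an explicit state rather than raising is that every outcome, including exhaustion, lands somewhere visible in the work system.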
The key tradeoff is explicit. You invest more in run-state design upfront so you spend less on coordination, exception churn, and manual triage later.
The dark factory does not remove humans from software delivery. It changes the shape of their work. Reviewers stop acting as traffic coordinators and start acting as decision makers at the right control points.
Instead of supervising every command, they evaluate a coherent review packet: what the run tried to do, what the policy layer allowed, what validations passed, and where residual risk remains.
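A review packet can be a plain record with one gating rule. A minimal sketch, assuming the four elements above are captured per run (field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class ReviewPacket:
    intent: str          # what the run tried to do
    policy_log: list     # what the policy layer allowed or denied
    validations: dict    # check name -> passed?
    residual_risk: str   # what still needs human judgment

    def ready_for_decision(self):
        # Reviewable only if at least one validation ran and all passed.
        return bool(self.validations) and all(self.validations.values())
```

The gate keeps half-validated runs out of reviewers' queues, which is most of what separates a decision point from a babysitting session.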
That matters because it preserves engineering judgment where it is most valuable. Humans are still best at exception handling, tradeoffs, ambiguous product decisions, and final accountability. The factory removes the low-value mechanical supervision that burns time and hides the real decision moments.
It also forces better exception design. A serious dark factory needs a visible needs-decision state, a narrow break-glass path, and a clear owner for blocked or stalled runs. Without those states, teams recreate the old supervision model in side channels and call it autonomy anyway.
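Making those exception states first-class is a matter of naming them in the run-state model. A sketch of one possible transition table (the state names follow this post's vocabulary; the specific set is an assumption, not a standard):

```python
# Legal run-state transitions, with the exception states made explicit.
TRANSITIONS = {
    "queued":         {"running", "cancelled"},
    "running":        {"validated", "failed", "needs_decision"},
    "failed":         {"queued", "needs_decision"},   # retry or escalate
    "needs_decision": {"queued", "cancelled", "break_glass"},
    "break_glass":    {"validated"},                  # narrow, audited override
    "validated":      {"pr_opened"},
}

def advance(state, target):
    """Refuse any transition the model does not name, so side-channel
    state changes fail loudly instead of silently."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target
```

Break-glass here is deliberately a one-way, narrow edge out of needs-decision, not a shortcut reachable from anywhere.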
This is where CISO and VP Platform interests align. Both need fewer ambiguous handoffs. Both need decision points that are explicit, time bounded, and attributable.
The dark factory creates a governable approval surface. Security teams do not need to supervise every keystroke if the system can prove what entered the queue, which policies applied, what the workspace could access, what validations ran, and what state the run reached before a PR opened. That is a much stronger posture than ad hoc agent sessions distributed across laptops and shared branches.
It also improves rollback and dispute handling. If a run has a stable manifest, workspace, and artifact packet, the organization can stop arguing about what "probably" happened. It can inspect the run state.
Platform leaders care because this is how one team scales output without linearly scaling human supervision. The orchestrator absorbs coordination work that would otherwise live in engineers' heads. It knows how to resume a task after a transient failure, block unsafe concurrency, or reuse a warm workspace on retry.
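Blocking unsafe concurrency, for instance, falls out of tracking path claims centrally. A minimal registry sketch (hypothetical names; a real orchestrator would persist this and handle crashes):

```python
class ClaimRegistry:
    """Tracks which paths active runs have claimed, so the orchestrator
    can refuse unsafe concurrency up front rather than discovering
    merge conflicts after two runs have both done the work."""

    def __init__(self):
        self._claims = {}  # path -> run_id

    def try_claim(self, run_id, paths):
        """Claim all paths atomically, or none of them."""
        conflicts = {p for p in paths
                     if self._claims.get(p) not in (None, run_id)}
        if conflicts:
            return False
        for p in paths:
            self._claims[p] = run_id
        return True

    def release(self, run_id):
        """Free every path held by a finished or cancelled run."""
        self._claims = {p: r for p, r in self._claims.items() if r != run_id}
```

A second run touching the same files queues behind the claim instead of racing it, which is exactly the coordination work that would otherwise live in an engineer's head.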
This is also where Wrkr can be a useful implementation example. The value is not "an agent that codes." The value is a system that can turn work into managed runs, isolate execution, and hand humans a reviewable result instead of a mystery patch.
Capacity planning also improves. Once work states are explicit, engineering leaders can forecast review load, blocked runs, and retry pressure instead of managing by intuition.
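Once states are explicit, that forecast is an aggregation, not a survey. A sketch assuming the state names used in this post:

```python
from collections import Counter

def queue_pressure(runs):
    """Roll run states up into the numbers a capacity review needs:
    pending human review, blocked work, and retry churn."""
    states = Counter(state for _, state in runs)
    return {
        "review_load": states["needs_decision"] + states["validated"],
        "blocked": states["blocked"],
        "retry_pressure": states["failed"],
    }
```

The same rollup, trended week over week, is what replaces managing by intuition.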
The dark factory is easier to understand when the work item becomes the unit of motion, following the flow from agent-ready issue to review handoff.
1. An issue marked agent-ready becomes a run manifest with scope, path claims, blueprint, and approval posture.
2. The orchestrator allocates a workspace, runs the change, records validation output, and manages retries or human holds.
3. A PR opens with a patch summary, evidence packet, residual risk, and reviewer guidance instead of a vague "please inspect."
Look at your current issue-to-PR flow and ask whether the system or the humans own the work states.
The goal is not "fully autonomous everything." The goal is a managed, observable work system that can absorb more autonomy without becoming opaque.
The next post looks at the infrastructure this factory needs under the hood: isolated, warm sandboxes. Without them, orchestration turns into contention and cleanup pain.