Safe fan-out
Independent research and operating notes on AI agent governance.
AI Engineering Operating Notes / Post 7 of 10
Queue volume rises, more agent workers start, and everyone assumes throughput should climb with them. Then the collisions begin: overlapping edits, stale assumptions, retries against invalid state, and review handoffs nobody can merge cleanly. Parallelism is where agent programs either become a platform capability or a coordination crisis.
Most repositories contain work that could run in parallel if the system actually knew what was safe. A docs change and a detector change may not conflict. A frontend refactor and a billing schema migration probably should not run at the same time without coordination. Humans have a rough intuition for these boundaries. Agents need the orchestrator to encode them.
Without that encoding, concurrency produces a familiar mess: duplicate edits, branch conflicts, stale assumptions, wasted retries, and review handoffs nobody can merge cleanly. Teams then conclude that parallel agents are inherently chaotic when the real problem is the missing control layer above them.
The practical signal is simple: queue volume rises, but completed and mergeable work does not rise with it.
The anti-pattern is unmanaged fan-out. An issue queue is full, the tooling can start multiple runs, and teams assume the system should do so aggressively. That treats concurrency as a scheduling question when it is really a claims question: who owns which paths, which tasks depend on others, and what should happen when issue state changes underneath an active run?
If the orchestrator cannot answer those questions, more agents simply means more hidden contention. The queue appears busy while actual delivery quality gets worse.
The sharpest failure mode is stale parallelism: a run keeps going even after upstream issue state has changed and invalidated its assumptions.
Safe concurrency starts with explicit claims. Each run should declare what paths it intends to modify, which upstream work it depends on, and which branch or artifact state it assumes. The orchestrator then uses those claims to decide what can run in parallel, what must wait, and what should be cancelled or retried when conditions change.
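A claim can be as small as a record each run submits before starting. Here is a minimal sketch in Python; the field names (`paths`, `depends_on`, `base_ref`) and example run IDs are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    """What a run declares before it starts. Field names are illustrative."""
    run_id: str
    paths: frozenset                      # path prefixes the run intends to modify
    depends_on: frozenset = frozenset()   # work items that must finish first
    base_ref: str = "main"                # branch or artifact state the run assumes

# Hypothetical docs-only and detector-only runs:
docs = Claim("run-docs", frozenset({"docs/"}))
detector = Claim("run-detector", frozenset({"detectors/"}))
print(docs.paths.isdisjoint(detector.paths))  # True: no shared path prefixes
```

Because the claim is data, not convention, the orchestrator can compare claims mechanically instead of trusting each run's judgment.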
A dependency DAG is not overkill here.
It is the minimum structure required to let autonomous work scale safely. Once the graph exists, concurrency becomes a policy decision instead of a guess. You can block overlapping claims, reuse workspaces across retries, and re-evaluate the run when a parent issue or dependency changes state.
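That policy layer can be sketched in a few lines, assuming claims are sets of path prefixes. The prefix-containment rule and the reason strings here are assumptions for illustration, not a standard:

```python
def paths_overlap(a: set, b: set) -> bool:
    """Two claims conflict if any path prefix in one contains one in the other."""
    return any(p.startswith(q) or q.startswith(p) for p in a for q in b)

def admit(paths: set, deps: set, active: list, completed: set) -> str:
    """Decide whether a run may start against currently active claims."""
    if any(paths_overlap(paths, other) for other in active):
        return "hold:path-overlap"
    if not deps <= completed:            # all dependencies must be completed
        return "hold:unmet-dependency"
    return "start"

active = [{"billing/schema/"}]
print(admit({"docs/"}, set(), active, set()))                   # start
print(admit({"billing/schema/001.sql"}, set(), active, set()))  # hold:path-overlap
```

The decision returns a reason, not just a boolean, so the hold can be explained later in review or post-incident analysis.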
The rule is straightforward: concurrency should be earned by explicit non-overlap, not assumed by default.
From a security perspective, unmanaged concurrency creates invisible risk. Two changes may each look safe in isolation and become unsafe together. One run may invalidate assumptions another run depends on. If no system owns those dependencies, accountability gets diluted the moment multiple autonomous changes are in flight.
Safe parallelism is not about slowing things down. It is about keeping ownership explicit. A change packet should tell a reviewer not only what the run did, but what else it was allowed to overlap with and why that overlap was considered safe.
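One hedged sketch of what that disclosure might look like inside a change packet; every field name and value here is hypothetical:

```python
import json

# Hypothetical change-packet fields recording overlap disclosure for a reviewer.
change_packet = {
    "run_id": "run-docs",
    "modified_paths": ["docs/"],
    "allowed_overlap_with": ["run-detector"],
    "overlap_rationale": "disjoint path claims, no shared dependencies",
}
print(json.dumps(change_packet, indent=2))
```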
This is the real throughput multiplier for engineering. One isolated agent run can be useful. A controlled system that can run independent work items at the same time is where output actually bends upward. That only happens if cancellation, retry backoff, restart recovery, and workspace reuse are designed into the orchestrator instead of improvised by humans.
Platform leaders should think of this as traffic control, not raw compute scaling. The value is not "more runs." The value is more safe work completed per review cycle.
This also improves accountability. When claims and dependencies are explicit, post-incident reviews can explain why overlapping work was allowed, blocked, retried, or cancelled.
Concurrency should be a claim-aware decision, not an optimistic default.
A docs-only task and a detector-only task claim different paths and share no dependencies, so they can run at the same time.
Two tasks claim the same service directory, or one task depends on schema changes from another. In either case the orchestrator holds one run.
If the blocking run fails or the issue changes state, the held run is retried or cancelled with a clear reason code.
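The hold-then-resolve step above can be sketched as a single decision function; the reason codes are illustrative, not a fixed vocabulary:

```python
def resolve_held_run(blocker_outcome: str, issue_still_valid: bool) -> str:
    """Decide what happens to a held run once its blocker finishes."""
    if not issue_still_valid:
        return "cancel:issue-state-changed"  # upstream issue invalidated the run
    if blocker_outcome == "failed":
        return "retry:blocker-failed"        # re-queue, typically with backoff
    return "start:blocker-completed"         # claims no longer conflict

print(resolve_held_run("failed", issue_still_valid=True))   # retry:blocker-failed
print(resolve_held_run("merged", issue_still_valid=False))  # cancel:issue-state-changed
```

Emitting a reason code at every transition is what makes the audit trail in the next paragraph possible.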
Before adding more parallel workers, add a concurrency model.
Concurrency is a force multiplier only when the orchestrator owns the claims and the conflict rules.
Otherwise it is a faster path to branch-level confusion.
The next post shifts from scaling work to trusting outcomes. Visible tests matter, but they are not enough when agents can optimize against the exact checks they can see.