Independent research and operating notes on AI agent governance.
AI Engineering Operating Notes / Post 3 of 10
Large instruction files often pass for maturity because they look complete. In review meetings, everyone feels safer when the rulebook is thick. At runtime, that same file usually makes control worse. The agent receives too much irrelevant context, reviewers cannot tell which rules were active, and teams confuse documentation mass with enforceability.
Most teams begin with one giant file because they want consistency. They put coding rules, testing rules, product rules, release procedures, domain glossaries, security notes, architectural history, and tool references in a single place. Then every task drags that material into context whether it is useful or not.
The result is predictable. The agent spends effort sorting signal from noise. Maintainers stop trusting that any specific rule will actually be followed. Updates become risky because one edit affects every workflow. Eventually the file stops being a contract and becomes an archive of good intentions.
The operational cost is not abstract. Run times increase, behavior grows less consistent, and ownership blurs because no team can safely edit a giant global file without worrying about unrelated breakage.
The anti-pattern is instruction centralization without scoping. Teams assume that if every rule is available everywhere, the system will be safer. In reality, oversized context increases inconsistency. The model has to choose which parts to honor, and reviewers lose the ability to reason about what information should have been active for a specific task.
This also weakens enforceability. A rule buried in a giant file is not the same thing as a rule attached to the part of the repo where it matters. A detector task should not need deploy guidance. A docs task should not inherit operational runbooks. Least-privilege context is as important as least-privilege access.
A common objection is "one file keeps policy consistent." More often, one file keeps policy ambiguous. Consistency comes from clear scope and deterministic enforcement, not from putting every sentence in one place.
The better pattern is layered, conditional context. Keep a small set of global workflow rules at the top. Then attach local guidance to the directories, services, or workflows that actually need it. Let the task determine what loads. That design gives you sharper context, cheaper runs, and clearer ownership of each rule set.
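One way to realize this pattern is a small loader that maps repository paths to the rule files that govern them, so the paths a task touches determine which instructions enter context. This is a minimal sketch under stated assumptions: the path globs and rule file names (`detectors/RULES.md`, `deploy/RUNBOOK.md`, and so on) are hypothetical placeholders, not any real tool's convention.

```python
from fnmatch import fnmatch

# Global rules always load; everything else is scoped to a path glob.
# All file names below are illustrative placeholders.
GLOBAL_RULES = ["RULES.md"]
SCOPED_RULES = {
    "detectors/**": ["detectors/RULES.md", "detectors/TESTING.md"],
    "docs/**": ["docs/STYLE.md"],
    "deploy/**": ["deploy/RUNBOOK.md"],
}

def active_rules(touched_paths):
    """Return the instruction files a task should load,
    given the repository paths it touches."""
    rules = list(GLOBAL_RULES)
    for glob, files in SCOPED_RULES.items():
        if any(fnmatch(path, glob) for path in touched_paths):
            rules.extend(files)
    return rules

# A detector task loads detector guidance; deploy runbooks stay out.
print(active_rules(["detectors/latency.py"]))
```

The design choice worth noting is that scoping lives in data, not prose: adding a new subsystem means adding one mapping entry, and no other workflow's context changes.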
Think about it the way platform teams already think about config and policy. We do not put every environment variable for every service in one file and call that architecture. We scope config to the place where it applies. Agent instructions should follow the same design principle.
The rule to remember is simple: global rules should be rare and durable. Local rules should be frequent and close to the code they govern.
From a security perspective, giant instruction files blur boundaries. They make it harder to tell which rules were relevant to a specific action and whether a task saw guidance it should never have needed. That matters when you are trying to prove how a risky change was constrained.
Path-scoped rules create a cleaner control story. You can show that a detector task loaded detector docs, detector tests, and detector policies. You can show that it did not load unrelated release instructions or production runbooks. That is a more defensible model than "the whole playbook was somewhere in context."
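That control story can be made concrete by recording, per task, which instruction files were active and which were deliberately kept out. A sketch of such a manifest, with a hypothetical task id and file names chosen only for illustration:

```python
import json

def context_manifest(task_id, touched_paths, loaded, excluded):
    """Record which instruction files a task loaded and which were
    deliberately kept out of context, so a reviewer can audit the
    active rules after the fact."""
    return json.dumps({
        "task": task_id,
        "touched_paths": touched_paths,
        "loaded_instructions": sorted(loaded),
        "excluded_instructions": sorted(excluded),
    }, indent=2)

manifest = context_manifest(
    "detector-change-412",  # hypothetical task id
    ["detectors/latency.py"],
    loaded=["detectors/RULES.md", "detectors/TESTING.md"],
    excluded=["deploy/RUNBOOK.md", "release/CHECKLIST.md"],
)
print(manifest)
```

Persisted alongside the change, a record like this turns "the playbook was somewhere in context" into an inspectable claim about exactly which rules were in scope.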
Engineering leaders care because scoped context reduces maintenance cost. Smaller instruction sets are easier to update, easier to reason about, and less likely to create accidental regressions in unrelated workflows. It also improves runtime efficiency. The agent spends less time parsing generic prose and more time operating on task-relevant material.
This is where context engineering becomes an architecture problem, not a prompt-writing hobby. The win is not clever wording. The win is shaping the repository and workflow so the right context loads automatically.
It also changes planning quality. Once context is scoped, teams can version and review instruction surfaces like any other interface, instead of treating prompt content as an ungoverned prose artifact.
A detector change should load only the materials required to make a safe detector change. Anything else is a distraction or a risk.
Detector conventions, path ownership, detector tests, and detector docs load because the task touches that subsystem.
Deploy rules, unrelated service docs, incident runbooks, and release mechanics stay out of context because they are not part of the immediate job.
The system sees a narrower problem, a cleaner boundary, and a more auditable set of active instructions.
Review your current instruction surface and split it into three buckets.
If a rule can be enforced by code, move it into code. If a rule only matters for one subsystem, scope it to that subsystem. If a document is useful only during release or incident handling, do not make every feature task carry it around.
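That triage can be written down as data, which forces every rule into exactly one bucket. A sketch, with made-up example rules standing in for a real instruction surface:

```python
from enum import Enum

class Bucket(Enum):
    ENFORCE_IN_CODE = "enforce via lint or CI"      # rule becomes a check
    SCOPE_TO_SUBSYSTEM = "local rules file"         # rule moves next to the code
    LOAD_ON_DEMAND = "release/incident workflows"   # rule loads only for that job

# Illustrative triage of rules pulled from a formerly global file.
triage = {
    "line length stays under 100 characters": Bucket.ENFORCE_IN_CODE,
    "detector names use snake_case": Bucket.SCOPE_TO_SUBSYSTEM,
    "page the on-call before any rollback": Bucket.LOAD_ON_DEMAND,
}

for rule, bucket in triage.items():
    print(f"{bucket.value}: {rule}")
```

The exercise is less about the code than the constraint it encodes: a rule that cannot be assigned a single bucket is probably two rules, or not a rule at all.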
The next post moves from context architecture to workflow design: where reasoning should stop and deterministic code should take over. That is the handoff that makes agent systems reusable instead of artisanal.