AI Engineering Operating Notes / Post 2 of 10

Your Repository Is the Runtime Contract for Agents

We hear the same sentence in postmortems: "the agent got lost in this repo." In most cases the model did not get lost. The contract was missing. One repository had stable entrypoints, clear boundaries, and truthful docs. Another depended on tribal memory, side-channel rules, and scripts with undocumented side effects. Agents expose that gap faster than humans do.

Where the pressure shows up

Teams often say an agent "does not understand the repo" as if that were a model limitation. More often, it is a contract limitation. The repo assumes undocumented tribal knowledge, validation paths are inconsistent, scripts have side effects nobody wrote down, and the difference between a safe change and a dangerous one lives in human memory.

This matters because the repository is where autonomous software work becomes concrete. It is where the system decides what files exist, what commands are canonical, what tests should run, where constraints live, and what counts as "done." If those answers are fuzzy, you do not have a stable interface. You have a probabilistic scavenger hunt.

Humans can survive that mess by asking in Slack and learning through repetition. Autonomous workflows cannot. They either guess or stall. Both outcomes look like poor model quality when the root cause is repo design debt.

The failure mode

The anti-pattern is to treat repo readiness as a documentation clean-up task instead of an execution contract. Teams add a giant instruction file, maybe a long README, and hope the agent will infer the rest. That still leaves the real operational questions unanswered: Which command is canonical? Which directories are in scope? What fixtures represent important scenarios? Which validations are mandatory? Where should the agent stop and ask for review?

If the answers are not explicit, the agent improvises. Improvisation feels fine when the cost of a mistake is low. It feels much worse when a model invents a build path, edits the wrong subsystem, or chooses a stale test harness because nothing in the repo declared the contract clearly.

The better pattern

Treat the repository like an API for autonomous work. That means creating a small number of reliable surfaces the system can depend on: task guidance in AGENTS.md, product-specific reference docs, deterministic command entrypoints, scenario fixtures, and explicit validation scripts. If the repository makes the right path obvious, the agent stops wasting tokens reconstructing the project from first principles.

The goal is not to explain every corner of the codebase. The goal is to define enough stable interfaces that an agent can operate safely without pretending to be a senior maintainer on day one. In a healthy repo, the best path is the easiest path to discover.
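To make "deterministic command entrypoints" concrete, here is a minimal sketch of a single canonical validation script. The script name (scripts/validate.py) and the command list are assumptions for illustration, not a prescribed layout:

```python
"""Hypothetical scripts/validate.py: the one canonical validation entrypoint.

An agent (or a new human contributor) runs this script instead of guessing
which of several similar-looking commands is the real one."""
import subprocess

# Ordered and explicit. The commands themselves are illustrative.
CANONICAL_CHECKS = [
    ["python", "-m", "pytest", "tests/", "-q"],  # unit tests
    ["python", "-m", "mypy", "src/"],            # type checks
]

def run_checks(checks: list[list[str]]) -> int:
    """Run each check in order; stop at the first failure and
    return its exit code, or 0 if everything passed."""
    for cmd in checks:
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0

# In CI and locally, the entrypoint is always the same call:
#   run_checks(CANONICAL_CHECKS)
```

The point is not the specific checks; it is that there is exactly one discoverable answer to "how do I validate a change here."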

There is a tradeoff here worth naming. A strong contract can feel slower at first because maintainers must write down command paths and domain boundaries that were previously implicit. That work pays back quickly once multiple teams, repositories, and automation paths share the same patterns.

The absence of that contract is usually obvious once you look for it. Setup instructions depend on oral tradition. Risky scripts sit beside safe ones with similar names. Product invariants live in chats instead of docs. Tests only pass when a maintainer remembers hidden sequencing. Humans can work around that mess with judgment. Agents amplify it.

Why security cares

Repo structure is part of the security control surface because it shapes what the system can discover, how it reaches side-effecting commands, and whether the difference between read paths and write paths is visible. If a repository buries sensitive scripts beside routine tooling, or leaves operational boundaries implicit, the agent inherits those ambiguities.

Security teams should read a repo contract the same way they read an API boundary. Where are the privileged actions? Which commands have external effects? Which tests are safe in normal development, and which should only run in isolated environments? A clean repo contract reduces accidental overreach before policy-as-code ever fires.
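One way to make that read/write boundary machine-readable is to tag each command with whether it has external effects and gate the write-capable ones. A minimal sketch, where the registry, command names, and approval flag are all illustrative assumptions:

```python
"""Sketch: make read paths vs. write paths explicit instead of implicit.

The registry and names below are assumptions for illustration,
not any real tool's API."""
from dataclasses import dataclass

@dataclass(frozen=True)
class Command:
    name: str
    side_effecting: bool  # does this command write, deploy, or publish?

REGISTRY = {
    "test": Command("test", side_effecting=False),
    "lint": Command("lint", side_effecting=False),
    "publish": Command("publish", side_effecting=True),  # external effects
}

def allowed(name: str, approved_for_writes: bool) -> bool:
    """Read-only commands are always allowed; side-effecting ones require
    explicit approval, so overreach is a decision rather than an accident."""
    cmd = REGISTRY[name]
    return (not cmd.side_effecting) or approved_for_writes
```

With a registry like this, the question "which commands are privileged?" has a single answer a security reviewer can audit, instead of being spread across script names and tribal memory.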

Why platform and engineering care

Platform leaders care because repo legibility is multiplicative. It reduces human onboarding time and agent onboarding time at the same moment. The same deterministic scripts that make local work predictable also make automated work reusable across teams. The same scenario fixtures that help humans understand edge cases also give agents a concrete test surface.

This is where a tool like Wrkr can make sense as an implementation example, not because the brand matters, but because it assumes a repo with explicit entrypoints, discoverable docs, and stable validation paths. The value does not come from the tool name. It comes from the contract the repo exposes to the tool.

Ownership should be explicit too. If nobody owns repo contract quality, drift is guaranteed. In mature teams this usually sits with platform or code stewardship groups, with AppSec reviewing the boundaries that affect write-capable paths.

Concrete example: a repo contract that agents can actually use

A "Wrkr-style" repository works because each artifact owns a specific part of the interface instead of collapsing everything into one instruction dump.

AGENTS.md

Defines workflow rules, path boundaries, and what an agent should do before changing code or running risky commands.

Product docs

Explain domain behavior, invariants, and scenarios the repo expects contributors to preserve.

Scripts and CI

Provide deterministic entrypoints for build, test, validation, and publish so the "right way" is not ambiguous.
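Scenario fixtures can become a deterministic test surface with very little machinery. A hedged sketch, where the fixture layout (tests/fixtures/*.json) and the example product rule are assumptions chosen for illustration:

```python
"""Sketch of fixture-driven validation: each JSON file under tests/fixtures/
names a scenario the repo expects contributors (and agents) to preserve.

The paths and the example product rule are illustrative assumptions."""
import json
from pathlib import Path

def apply_discount(order: dict) -> float:
    """Illustrative product rule: 10% off orders of 100 or more."""
    rate = 0.10 if order["total"] >= 100 else 0.0
    return round(order["total"] * (1 - rate), 2)

def run_scenarios(fixture_dir: Path) -> list[str]:
    """Return the names of failing scenarios; an empty list means all pass."""
    failures = []
    for path in sorted(fixture_dir.glob("*.json")):
        case = json.loads(path.read_text())
        if apply_discount(case["order"]) != case["expected"]:
            failures.append(path.stem)
    return failures
```

Each fixture file doubles as documentation of an important scenario and as a concrete check an agent can run before claiming a change is done.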

What to do next

Pick one repository you want agents to operate in safely, then review it as if you were designing an interface for a very fast, very literal new hire.
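That review can start as a simple audit: do the contract surfaces exist at all? A minimal sketch; the expected paths are assumptions to adapt per repository:

```python
"""Sketch of a repo-contract audit: check that the surfaces an agent
depends on actually exist. Expected paths are illustrative assumptions."""
from pathlib import Path

# Surfaces an agent should be able to rely on without tribal knowledge.
EXPECTED_SURFACES = [
    "AGENTS.md",            # workflow rules and path boundaries
    "docs",                 # product reference docs
    "scripts/validate.py",  # canonical validation entrypoint (hypothetical name)
]

def audit(repo_root: Path) -> list[str]:
    """Return the list of missing contract surfaces."""
    return [s for s in EXPECTED_SURFACES if not (repo_root / s).exists()]
```

A nonempty result is not a verdict on code quality; it is a list of places where an agent currently has to guess.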

The point is not elegance for its own sake. It is operational leverage. A repo that behaves like an interface can support repeated autonomous work. A repo that behaves like oral tradition cannot.

The next post stays on the same thread and sharpens a common failure: giant instruction files. The issue is not that they are too long. The issue is that they are the wrong abstraction for enforceable, context-efficient work.