Sprawl Series / Post 1 of 4 / AppSec

For AppSec, the First AI Agent Problem Is Evidence Before Exploitation

The quarter-end security review usually starts the same way: "do we have an AI incident problem yet?" In most organizations, the more immediate problem is earlier in the chain. Teams cannot cleanly prove what exists, what is approved, and what evidence will survive an audit or incident memo.

That is why the first AppSec lesson in this report is evidence posture, not exploit theater.

Grounding

Run ID: sprawl-v2-full-20260312b
Subset: 890/1000 completed targets
Core metric: 47.08% without verifiable evidence
Supporting metric: 100% of detected agents missing at least one binding
Core artifact: runs/tool-sprawl/sprawl-v2-full-20260312b/agg/campaign-summary-v2.json

AppSec should separate three questions

AppSec teams usually get pulled into AI governance after a sharper fear shows up: data leakage, code exfiltration, prompt injection, unreviewed write paths, or autonomous abuse. Those are real concerns. But in public-repo posture work, the first measurable control failure often appears earlier in the chain. It is the inability to prove what is present, how it is approved, and whether there is enough evidence to reconstruct intent and boundaries later.

That is what the sprawl report surfaces. Nearly half of the completed target set landed below the `verifiable` evidence threshold. That does not prove active exploitation. It proves a weaker but more common problem: if security leadership asks for a clean answer about AI and agent use, many repositories do not expose enough durable evidence to give one.
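The headline percentage can be recomputed from the campaign summary artifact. This is a minimal sketch only: the field names (`targets`, `status`, `evidence_level`) and the `verifiable` label as a per-target field are assumptions about the JSON schema, not something the report confirms.

```python
import json

def pct_without_verifiable(summary_path: str) -> float:
    """Share of completed targets whose evidence falls below `verifiable`.

    Assumes a hypothetical schema: a top-level "targets" list of records,
    each with a "status" and an "evidence_level" label.
    """
    with open(summary_path) as f:
        summary = json.load(f)
    completed = [t for t in summary["targets"] if t.get("status") == "completed"]
    below = [t for t in completed if t.get("evidence_level") != "verifiable"]
    return 100.0 * len(below) / len(completed)
```

Run against the real `campaign-summary-v2.json`, a check like this should reproduce the `47.08%` figure over the 890 completed targets, assuming the schema matches.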

A useful AppSec split is: what is present, what is reachable, and what is exercised. Public-repo scanning is strongest on the first question, partial on the second, and weakest on the third. Evidence quality is what makes that gap governable instead of rhetorical.
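That split can be made operational as a per-finding label. A sketch follows; the three tier names come from the text above, but the evidence flags (`run_evidence`, `invocation_path`) are illustrative field names, not part of the report.

```python
from enum import Enum

class Exposure(Enum):
    PRESENT = "present"      # a declaration exists in the repo
    REACHABLE = "reachable"  # a plausible invocation path is visible
    EXERCISED = "exercised"  # durable evidence the path actually ran

def classify(finding: dict) -> Exposure:
    # Hypothetical evidence flags; public-repo scanning rarely proves
    # "exercised", which is exactly the gap the report describes.
    if finding.get("run_evidence"):
        return Exposure.EXERCISED
    if finding.get("invocation_path"):
        return Exposure.REACHABLE
    return Exposure.PRESENT
```

The value of the label is honesty about ceiling: most public findings will never rise above `REACHABLE`, and a governance process should say so explicitly rather than imply `EXERCISED`.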

That should also change where leaders spend first. If the conversation starts only with high-drama exploit scenarios, teams overinvest in edge-case prevention and underinvest in approval records, binding normalization, and evidence continuity. The report supports the opposite priority order.

Why the missing-binding result matters

The subset found `7,984` declared agents and zero binding-complete ones. That should not be read as a curiosity about metadata hygiene. It is a governance signal. If repositories declare agents without exposing tool, data, or auth bindings clearly enough to reconstruct operational boundaries, then every later control conversation gets harder.
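The "missing at least one binding" metric implies a simple per-agent check. A minimal sketch, assuming each declared agent record carries an optional `bindings` map with `tool`, `data`, and `auth` entries; the field names are illustrative, not taken from the report.

```python
REQUIRED_BINDINGS = ("tool", "data", "auth")

def missing_bindings(agent: dict) -> list[str]:
    """Return which required bindings an agent declaration fails to expose."""
    bindings = agent.get("bindings", {})
    return [b for b in REQUIRED_BINDINGS if not bindings.get(b)]

def is_binding_complete(agent: dict) -> bool:
    return not missing_bindings(agent)
```

If the subset result holds under a check like this, every one of the `7,984` declared agents would return a non-empty `missing_bindings` list, and `is_binding_complete` would be true for none of them.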

AppSec cannot review what it cannot normalize. Threat modeling is weaker. Approval conversations become rhetorical instead of artifact-backed. Incident response starts from ambiguity instead of from durable evidence. That is why the evidence result comes before the exploit question in a public dataset like this one.

Missing bindings should not be treated as a soft documentation issue. They are a review-surface issue. If a repo cannot show which tools, identities, and data contracts make an agent meaningful, security cannot tell whether it is seeing a toy example, a partial workflow, or a real path whose risky details are simply off-screen.

Why public zeroes should not reassure you

The report did not surface write-capable or exec-capable agents, but that does not mean public repositories are safe. Public repositories systematically underexpose those paths. The report is not telling AppSec to relax; it is telling AppSec where the public evidence floor actually is.

Absence of visible runtime privilege is not evidence of low internal runtime risk. The report is a visibility and governance-readiness study, not a full runtime exploit census.

That distinction matters because many teams overreact to the wrong number. If a public scan shows zero obvious privileged paths, leadership may infer that the security problem is probably small. In reality the cleaner inference is narrower: the public artifacts do not expose enough runtime detail to prove those paths one way or the other.

The practical AppSec response

The first move is not to chase the scariest-looking agent declaration. The first move is to force normalization: inventory the declared agents, normalize their tool, data, and auth bindings into one reviewable shape, and then attach approval records and evidence continuity to that inventory.

That sequence is less dramatic than exploit headlines, but it is the part organizations can operationalize immediately.

A good internal test is simple: if the security team had to write the first paragraph of an incident memo using only repo artifacts, how much could it say without guessing? That is a better measure of current readiness than arguing first about worst-case exploitation.
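That test can even be scripted as a coarse readiness score: how many of the memo's opening facts can be filled in from repo artifacts alone. A sketch under stated assumptions; the fact list and artifact keys below are illustrative choices, not a standard.

```python
# Hypothetical first-paragraph facts an incident memo would need to state.
MEMO_FACTS = ("agent_name", "owner", "approval_record", "tool_bindings", "data_scope")

def memo_readiness(artifacts: dict) -> float:
    """Fraction of opening-paragraph facts answerable without guessing."""
    answered = [f for f in MEMO_FACTS if artifacts.get(f)]
    return len(answered) / len(MEMO_FACTS)
```

A score tracked per repository over time is the repeatable KPI the next paragraph argues for: it moves, it is cheap to recompute, and it does not depend on a one-time scare number.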

That test is also low-cost and repeatable, which makes it a better governance KPI than one-time AI scare metrics.