Sprawl Series / Post 1 of 4 / AppSec

For AppSec, the First AI Agent Problem Is Evidence Before Exploitation

The quarter-end security review usually starts the same way: "do we have an AI incident problem yet?" In most organizations, the more immediate problem is earlier in the chain. Teams cannot cleanly prove what exists, what is approved, and what evidence will survive an audit or incident memo.

That is why the first AppSec lesson in this report is evidence posture, not exploit theater.

Grounding

Run ID: sprawl-v2-full-20260312b
Subset: 890/1000 completed targets
Core metric: 47.08% without verifiable evidence
Supporting metric: 100% of detected agents missing at least one binding
Core artifact: runs/tool-sprawl/sprawl-v2-full-20260312b/agg/campaign-summary-v2.json

AppSec should separate three questions

AppSec teams usually get pulled into AI governance after a sharper fear shows up: data leakage, code exfiltration, prompt injection, unreviewed write paths, or autonomous abuse. Those are real concerns. But in public-repo posture work, the first measurable control failure often appears earlier in the chain. It is the inability to prove what is present, how it is approved, and whether there is enough evidence to reconstruct intent and boundaries later.

That is what the sprawl report surfaces. Nearly half of the completed target set landed below the `verifiable` evidence threshold. That does not prove active exploitation. It proves a weaker but more common problem: if security leadership asks for a clean answer about AI and agent use, many repositories do not expose enough durable evidence to give one.
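The headline percentage can be recomputed from the campaign summary artifact. This is a minimal sketch only: the field names (`targets`, `status`, `evidence_level`) and the `verifiable` label as a per-target field are assumptions about the JSON schema, not something the report confirms.

```python
import json

def pct_without_verifiable(summary_path: str) -> float:
    """Share of completed targets whose evidence falls below `verifiable`.

    Assumes a hypothetical schema: a top-level "targets" list of records,
    each with a "status" and an "evidence_level" label.
    """
    with open(summary_path) as f:
        summary = json.load(f)
    completed = [t for t in summary["targets"] if t.get("status") == "completed"]
    below = [t for t in completed if t.get("evidence_level") != "verifiable"]
    return 100.0 * len(below) / len(completed)
```

Run against the real `campaign-summary-v2.json`, a check like this should reproduce the `47.08%` figure over the 890 completed targets, assuming the schema matches.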

A useful AppSec split is: what is present, what is reachable, and what is exercised. Public-repo scanning is strongest on the first question, partial on the second, and weakest on the third. Evidence quality is what makes that gap governable instead of rhetorical.
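That split can be made operational as a per-finding label. A sketch follows; the three tier names come from the text above, but the evidence flags (`run_evidence`, `invocation_path`) are illustrative field names, not part of the report.

```python
from enum import Enum

class Exposure(Enum):
    PRESENT = "present"      # a declaration exists in the repo
    REACHABLE = "reachable"  # a plausible invocation path is visible
    EXERCISED = "exercised"  # durable evidence the path actually ran

def classify(finding: dict) -> Exposure:
    # Hypothetical evidence flags; public-repo scanning rarely proves
    # "exercised", which is exactly the gap the report describes.
    if finding.get("run_evidence"):
        return Exposure.EXERCISED
    if finding.get("invocation_path"):
        return Exposure.REACHABLE
    return Exposure.PRESENT
```

The value of the label is honesty about ceiling: most public findings will never rise above `REACHABLE`, and a governance process should say so explicitly rather than imply `EXERCISED`.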

That should also change where leaders spend first. If the conversation starts only with high-drama exploit scenarios, teams overinvest in edge-case prevention and underinvest in approval records, binding normalization, and evidence continuity. The report supports the opposite priority order.

Why the missing-binding result matters

The subset found `7,984` declared agents and zero binding-complete ones. That should not be read as a curiosity about metadata hygiene. It is a governance signal. If repositories declare agents without exposing tool, data, or auth bindings clearly enough to reconstruct operational boundaries, then every later control conversation gets harder.
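The "missing at least one binding" metric implies a simple per-agent check. A minimal sketch, assuming each declared agent record carries an optional `bindings` map with `tool`, `data`, and `auth` entries; the field names are illustrative, not taken from the report.

```python
REQUIRED_BINDINGS = ("tool", "data", "auth")

def missing_bindings(agent: dict) -> list[str]:
    """Return which required bindings an agent declaration fails to expose."""
    bindings = agent.get("bindings", {})
    return [b for b in REQUIRED_BINDINGS if not bindings.get(b)]

def is_binding_complete(agent: dict) -> bool:
    return not missing_bindings(agent)
```

If the subset result holds under a check like this, every one of the `7,984` declared agents would return a non-empty `missing_bindings` list, and `is_binding_complete` would be true for none of them.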

AppSec cannot review what it cannot normalize. Threat modeling is weaker. Approval conversations become rhetorical instead of artifact-backed. Incident response starts from ambiguity instead of from durable evidence. That is why the evidence result comes before the exploit question in a public dataset like this one.

Missing bindings should not be treated as a soft documentation issue. They are a review-surface issue. If a repo cannot show which tools, identities, and data contracts make an agent meaningful, security cannot tell whether it is seeing a toy example, a partial workflow, or a real path whose risky details are simply off-screen.

Why public zeroes should not reassure you

The report did not surface write-capable or exec-capable agents, but that does not mean public repositories are safe. Public repositories systematically underexpose those paths. The report is not telling AppSec to relax; it is telling AppSec where the public evidence floor actually is.

Absence of visible runtime privilege is not evidence of low internal runtime risk. The report is a visibility and governance-readiness study, not a full runtime exploit census.

That distinction matters because many teams overreact to the wrong number. If a public scan shows zero obvious privileged paths, leadership may infer that the security problem is probably small. In reality the cleaner inference is narrower: the public artifacts do not expose enough runtime detail to prove those paths one way or the other.

The practical AppSec response

The first move is not to chase the scariest-looking agent declaration. The first move is to force normalization: inventory the declared agents, normalize their tool, data, and auth bindings into one reviewable shape, and then attach approval records and evidence continuity to that inventory.

That sequence is less dramatic than exploit headlines, but it is the part organizations can operationalize immediately.

A good internal test is simple: if the security team had to write the first paragraph of an incident memo using only repo artifacts, how much could it say without guessing? That is a better measure of current readiness than arguing first about worst-case exploitation.
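That test can even be scripted as a coarse readiness score: how many of the memo's opening facts can be filled in from repo artifacts alone. A sketch under stated assumptions; the fact list and artifact keys below are illustrative choices, not a standard.

```python
# Hypothetical first-paragraph facts an incident memo would need to state.
MEMO_FACTS = ("agent_name", "owner", "approval_record", "tool_bindings", "data_scope")

def memo_readiness(artifacts: dict) -> float:
    """Fraction of opening-paragraph facts answerable without guessing."""
    answered = [f for f in MEMO_FACTS if artifacts.get(f)]
    return len(answered) / len(MEMO_FACTS)
```

A score tracked per repository over time is the repeatable KPI the next paragraph argues for: it moves, it is cheap to recompute, and it does not depend on a one-time scare number.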

That test is also low-cost and repeatable, which makes it a better governance KPI than one-time AI scare metrics.