Agent social engineering
Field Note / Authority Risk
Last updated: May 7, 2026
The uncomfortable OpenClaw lesson is not only that an agent can leak text. It is that persuasion can become action hijacking when the agent has private context, internet access, tools, plugins, or credentials.
Recent public reporting and tooling around OpenClaw-style agents point to the same control problem from three directions: social engineering of an agent, malicious skill or plugin supply chain, and exposed agent infrastructure.
The Register reported on a Hannah Fry experiment involving an AI agent built with OpenClaw, real-world tasks, and a bank card number supplied by the team. The agent attempted errands, used Fry's real name in an unexpected way, struggled with anti-bot controls, created an online shop, and later disclosed sensitive material after being manipulated through a social-engineering setup.
The important detail is not the novelty of the demo. It is the class of material reportedly exposed: API keys, usernames, passwords, and operational context. That is authority-bearing material, not only content.
Zscaler ThreatLabz separately published a writeup on a deceptive "DeepSeek-Claw" OpenClaw skill. Zscaler says the skill's instructions could lead an AI agent or user to execute installation paths that delivered Remcos RAT on Windows, or GhostLoader on macOS or Linux, or through manual Windows workflows.
Bishop Fox's AIMap points to a third surface: publicly exposed AI agent infrastructure. Its page describes discovery, fingerprinting, scoring, and testing for exposed AI endpoints including MCP servers, Ollama instances, OpenClaw-style systems, LangServe chains, Gradio apps, and other AI infrastructure.
An agent with private context can be persuaded to reveal or use material it should not expose.
A skill, plugin, package script, or setup instruction can become a path to malware, data theft, or tool misuse.
Public AI endpoints can expose models, prompts, tools, authorization boundaries, and execution capabilities.
These are different events and surfaces. Treating them as one generic "AI security" problem makes the response worse. The shared thread is action authority: what the agent or connected workflow can reach and do after it receives an instruction.
Content risk asks what the model sees, says, leaks, or summarizes. Authority risk asks what the agent can do when it has context, tools, permissions, network reach, or credentials.
In software delivery, that distinction matters. A skill file is not just documentation if an agent may follow it. An MCP declaration is not just configuration if it exposes tools. A public agent endpoint is not just another AI asset if it can run code, enumerate tools, leak prompts, or invoke actions.
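To see how small that boundary is, here is a minimal sketch of a tool declaration, assuming the MCP Python SDK's FastMCP interface; the server name and the run_shell tool are hypothetical, chosen only to show how a few lines of configuration become an authority grant.

```python
# Hypothetical MCP server sketch; assumes the MCP Python SDK's FastMCP interface.
from mcp.server.fastmcp import FastMCP
import subprocess

mcp = FastMCP("delivery-helper")

@mcp.tool()
def run_shell(command: str) -> str:
    """Run a shell command on this workstation and return its output."""
    # This declaration is the point where persuasion can become action:
    # any prompt that convinces the agent to call the tool now has shell reach.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

if __name__ == "__main__":
    mcp.run()
```

Everything the hosting process can reach, the tool can reach; the review question for a file like this is authority, not correctness.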
The hard question is not only whether the agent can be persuaded. It is what the agent can do after it is persuaded.
Teams do not need to become OpenClaw specialists to learn from this. They need to treat agent setup and tool reach as part of the delivery action graph when those paths can influence repos, CI/CD, packages, credentials, cloud, or release behavior.
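One way to make that concrete is to write the reachability down. The sketch below invents its node names and edges; the only point it carries is the shape of the check, not any particular tool's wiring.

```python
# Toy delivery action graph: edges mean "can directly influence".
# Node names are invented for illustration.
edges = {
    "coding-agent": ["setup-skill", "mcp:run_shell", "mcp:open_pr"],
    "setup-skill": ["shell:install-package"],
    "mcp:run_shell": ["shell:install-package", "cloud-cli", "ci-secret-store"],
    "mcp:open_pr": ["repo:main"],
    "shell:install-package": ["package-registry-token"],
    "cloud-cli": ["prod-cloud-account"],
}

def reachable(start: str) -> set[str]:
    """Everything the start node can influence, directly or transitively."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# If the reachable set touches repos, CI/CD, packages, credentials, or cloud,
# the agent's setup belongs in delivery governance, not just "AI tooling".
print(sorted(reachable("coding-agent")))
```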
A first row for this class of risk should be plain enough for AppSec, platform, and engineering to review together:
Actor: Local coding agent with installed skill
Owner: Engineering enablement
Location: developer workstation and repo tool config
Skill/plugin: third-party setup skill
Credential context: local shell, package manager token, cloud CLI, keychain, CI secret references
Reachable actions: read files, run shell, install package, call MCP tool, publish externally, open PR
Approval-required: new skill install, shell/network command, secret read, package publish, workflow edit, cloud command
Proof: skill source, install command, tool invocation, credential identity, policy verdict, approver, target event, revocation record
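The same row can live as a record that tooling and reviewers check rather than re-read. The field names below follow the row above; the dataclass shape itself is an illustrative choice, not a standard schema.

```python
# Sketch of the row as a reviewable record; field names mirror the row above.
from dataclasses import dataclass

@dataclass
class AgentAuthorityRow:
    actor: str
    owner: str
    location: str
    skill_or_plugin: str
    credential_context: list[str]
    reachable_actions: list[str]
    approval_required: list[str]
    proof: list[str]

first_row = AgentAuthorityRow(
    actor="Local coding agent with installed skill",
    owner="Engineering enablement",
    location="developer workstation and repo tool config",
    skill_or_plugin="third-party setup skill",
    credential_context=["local shell", "package manager token", "cloud CLI",
                        "keychain", "CI secret references"],
    reachable_actions=["read files", "run shell", "install package",
                       "call MCP tool", "publish externally", "open PR"],
    approval_required=["new skill install", "shell/network command", "secret read",
                       "package publish", "workflow edit", "cloud command"],
    proof=["skill source", "install command", "tool invocation", "credential identity",
           "policy verdict", "approver", "target event", "revocation record"],
)
```

In review, the interesting diffs are to reachable_actions and credential_context, because those are where authority changes.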
The point is not to make every agent action slow. The point is to know where authority changes state, which actions require approval before execution, and what proof remains afterward.
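A minimal sketch of such a gate, assuming the approval-required list from the row above; the verdict and proof-record shapes are invented for illustration.

```python
# Approval gate sketch: action names come from the row above; record shape is assumed.
from datetime import datetime, timezone

APPROVAL_REQUIRED = {
    "new skill install", "shell/network command", "secret read",
    "package publish", "workflow edit", "cloud command",
}

def gate(action: str, requested_by: str, approver: str | None) -> dict:
    """Decide before execution and leave a proof record either way."""
    needs_approval = action in APPROVAL_REQUIRED
    allowed = (not needs_approval) or (approver is not None)
    return {
        "action": action,
        "requested_by": requested_by,
        "needs_approval": needs_approval,
        "approver": approver,
        "verdict": "allow" if allowed else "hold",
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

# Routine reads pass; authority-changing actions wait for a named approver.
print(gate("read files", requested_by="coding-agent", approver=None))
print(gate("package publish", requested_by="coding-agent", approver=None))
print(gate("package publish", requested_by="coding-agent", approver="release-manager"))
```

The record is the proof: who asked, what was held, who approved, and when.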
These incidents are best read as evidence of a control pattern, not proof that one control would have prevented every reported outcome. The pattern is narrower and more useful: once an AI-assisted workflow can use tools, credentials, plugins, skills, or exposed endpoints, the security question moves from content safety to action authority.
The practical lesson is the action graph: skills, MCP servers, agent configs, exposed endpoints, and tool declarations become part of software-delivery governance when they can influence repos, CI/CD, packages, credentials, cloud paths, or release behavior.