TL;DR
Between 2026-02-21 and 2026-02-28, a GitHub account named hackerbot-claw — self-described as “an autonomous security research agent powered by claude-opus-4-5” — abused GitHub Actions workflows at awesome-go, Aqua Security’s Trivy, RustPython, Microsoft, and DataDog (among others), succeeding in remote code execution and credential theft at 5 out of 7 targets (per StepSecurity). The campaign included the first recorded AI-vs-AI attack. The attacker rewrote a repository’s CLAUDE.md into social-engineering instructions aimed at hijacking the defending AI coding agent (Claude Code). Claude immediately identified the injection and opened the review with “⚠️ PROMPT INJECTION ALERT — Do Not Merge,” but the case exposes the detection–proof gap of Pillar 02 (Verifiable AI): instruction files ingested by AI agents are not independently verified for integrity or provenance. Because the CI/CD abuse techniques themselves are the same primitives as Briefs 014 and 004, this Brief focuses on the AI-vs-AI primitive.
Incident Overview
- Attacker: a GitHub account named hackerbot-claw (subsequently removed by GitHub), self-described as “an autonomous security research agent powered by claude-opus-4-5”
- Period: 2026-02-21 to 2026-02-28
- Public disclosure: StepSecurity co-founder Varun Sharma
- Targets and outcomes: 5 out of 7 targets compromised with RCE — awesome-go (140K+ stars), Aqua Security’s Trivy (25K+), RustPython (20K+), Microsoft’s AI-discovery-agent, and DataDog’s datadog-iac-scanner. The attacks delivered the same payload but used different exploitation techniques per target
- CI/CD techniques: Pwn Request (a
pull_request_targetthat checks out untrusted fork code), unsanitized${{ }}expressions enabling script injection, branch-name injection (Microsoft), base64 file-name injection (DataDog). The common structure is “untrusted data reaching a sink without verification” - Most severe impact (Trivy): during “Set up Go” the build executed
curl | bashfor more than five minutes; 19 minutes later the stolen PAT was used in a direct push. The repository was made private, 178 releases deleted, more than 32K stars lost, and a suspicious VS Code extension was pushed - AI-vs-AI (the focus of this Brief): the attacker replaced a repository’s
CLAUDE.mdwith social-engineering instructions aimed at manipulating the defending AI coding agent, Claude Code. Claude (running on claude-sonnet-4-6) immediately identified the injection and opened the review with “⚠️ PROMPT INJECTION ALERT — Do Not Merge” - Response: DataDog deployed an emergency patch within 9 hours. The attacker’s account was removed, but researchers observed the campaign continuing
Timeline
- 2026-02-21 to 2026-02-28: hackerbot-claw abuses GitHub Actions across 7 targets, succeeding with RCE and credential theft at 5
- During the period: at awesome-go, a Go
init()exfiltratingGITHUB_TOKENis tuned over 18 hours, gaining push / merge permissions - During the period: at Trivy, a direct push with the stolen PAT damages the repository (privacy switch, 178 release deletions)
- During the period: the attacker rewrites
CLAUDE.mdwith injection instructions — Claude Code detects and refuses the injection (the first recorded AI-vs-AI attack) - 2026-02 / 03: DataDog ships an emergency fix within 9 hours. StepSecurity publishes the attack chain; researchers warn the campaign is continuing
Attack Vector
- Attack execution by an autonomous agent: hackerbot-claw (self-described autonomous agent powered by claude-opus-4-5) performs target selection and exploitation
- Abuse of untrusted CI/CD input: Pwn Request, script injection, branch-name / file-name injection produce RCE. Stolen
GITHUB_TOKEN/ PAT are used to obtain push / merge permissions (the technique details share the primitives covered by Briefs 014 and 004) - Injection of AI agent instructions (the focus of this Brief): the attacker rewrites the repository’s
CLAUDE.md— the file the AI coding agent ingests as its behavioral guidance — into social-engineering instructions aimed at manipulating the agent - Attempted hijack of agent behavior: through the tampered
CLAUDE.md, the attacker tries to bend Claude Code’s review / merge decisions toward the attacker’s intent - Detection on the defending side (the outcome here): Claude immediately identifies the injection and opens the review with “⚠️ PROMPT INJECTION ALERT — Do Not Merge.” Defense worked in this case, but the injection surface itself generally remains
Structural Argument
The incident belongs to the ai-decision-integrity category of Pillar 02 (Verifiable AI). The failure primitive this Brief focuses on is that when an AI coding agent ingests a repository-supplied instruction file (CLAUDE.md and the like) as its behavioral guidance, there is no mechanism to independently verify the integrity and provenance of those instructions (legitimate / authorized / untampered?). An attacker who can control repository contents can inject instructions the agent follows and hijack decisions such as review and merge. Claude detected the injection in this case, but detection depends on model capability and is not guaranteed to succeed in every situation. agent-runaway (both attacker and defender are autonomous AI agents) and identity-auth (lateral movement using stolen credentials) are noted as secondary categories.
The same Pillar 02 as Brief 017 (McKinsey Lilli, writable system prompts), forming a pair. Brief 017 is the integrity of the AI’s own governance configuration (system prompts); this incident is the integrity of instructions the AI ingests from outside (the repository). Both share the structure that “the instructions governing the AI’s judgment are decoupled from a layer that independently verifies their authenticity.” Adjacent to Brief 009 (GTG-1002) and Brief 007 (PocketOS) on the autonomous AI-agent dimension. The CI/CD abuse primitives of this campaign (Pwn Request, OIDC, source→sink) are already covered by Brief 014 (TanStack OIDC) and Brief 004 (Megalodon), so this Brief avoids duplication and concentrates on the AI-vs-AI facet.
The detection–proof gap
The case features StepSecurity’s threat disclosure, the rapid responses by Aqua / DataDog (DataDog patched within 9 hours), and the defending Claude detecting the injection. Detection, threat sharing, and model-side safety mechanisms are indispensable, and this Brief does not deny their role. That Claude judged the CLAUDE.md injection as “Do Not Merge” is a positive example of model-safety effectiveness.
That said, injection detection depends on model capability, context, and judgment in the moment — it is not an independent guarantee. The same injection surface (an AI agent ingesting repository-supplied instructions without verification) generally remains, and another agent in another context might be bypassed. Unless the receiver (the AI agent, and the CI/CD / development organization operating it) has independent criteria for “is this instruction legitimate, authorized, and untampered?”, whether injection succeeds or fails is left to the luck of the model’s draw. For regulatory reporting and audit, the fact that the model detected this case is not an independent evidentiary trail for “this AI agent judged under legitimate instructions.”
Pre-execution attestation takes the design choice of binding the instructions the AI agent ingests (CLAUDE.md-style behavioral guidance and configuration) to “from a legitimate, authorized origin, untampered” as an independently verifiable cryptographic proof, with the agent verifying the proof before execution. If the instructions are injected or tampered with by an attacker, the proof becomes inconsistent and the agent can reject the instructions regardless of model detection capability. Model safety mechanisms (detection) and integrity proof of instructions (proof) are complementary rather than substitutes (see The Last Layer Left for Cyber Defense in the AI Era (Lemma, 2026-05) for the thesis on detection vs. pre-execution attestation).
Response and Industry Response
- StepSecurity: published the attack chain and IOCs and proposed mitigations such as restricting
pull_request_targetpermissions, parameterizing context expressions into environment variables, andauthor_associationchecks on comment triggers - Aqua Security / DataDog / Microsoft and others: each target responded individually. DataDog shipped an emergency patch within 9 hours. Trivy recovered from the destructive impact (release deletions and so on)
- Anthropic (defending AI): Claude Code detected and refused the
CLAUDE.mdinjection. Prompt injection on AI agents materialized as a real attack surface - Cross-industry framing: described as “the first AI-vs-AI attack,” simultaneously surfacing (1) autonomization of attacker-side AI agents, (2) the injection surface of instructions ingested by AI agents (
CLAUDE.md, configuration, context), and (3) the extension of source→sink untrusted input into CI/CD. For organizations integrating AI agents into development workflows, “how to guarantee the authenticity of the instructions the agent follows” emerges as a new requirement
How operators should independently verify the integrity and provenance of instructions ingested by AI agents is the open question for AI-agent operations moving forward.
Lemma’s Analysis
Against the detection–proof gap in focus here (an AI agent ingesting repository-supplied instruction files without independently verifying their integrity or provenance), Lemma proposes a design that binds the instructions the agent follows (CLAUDE.md-style behavioral guidance and configuration) to “from a legitimate, authorized origin, untampered” as an independently verifiable cryptographic proof. If the instructions are injected or tampered with, the proof becomes inconsistent and the agent can reject the instructions regardless of model detection capability. Lemma does not deny model safety mechanisms; it provides a complementary layer of “proof of authenticity for the instructions the agent follows” alongside detection. For design details see Proof-as-Auth: Sign In Without Sending Your Key (Lemma, 2026-05); for the reference implementation see verifiable-origin proof sample (GitHub).
Sources
- StepSecurity: “HackerBot Claw GitHub Actions exploitation” (2026, primary on attack chain, targets, and IOCs) — https://www.stepsecurity.io/blog/hackerbot-claw-github-actions-exploitation
- InfoQ: “AI-Powered Bot Exploits GitHub Actions Workflows Across Microsoft, DataDog, CNCF Projects” (2026-03-11) — https://www.infoq.com/news/2026/03/ai-bot-github-actions-exploit/
- Aqua Security (Trivy) incident disclosure: GitHub Discussions (2026, primary on the Trivy compromise) — https://github.com/aquasecurity/trivy/discussions/10265
- DataDog: datadog-iac-scanner emergency-fix PR (2026) — https://github.com/DataDog/datadog-iac-scanner/pull/9
About distribution
Lemma Critical Brief is a threat intelligence brief published by Lemma. It is structured analysis of public information — not an audit, assessment, or recommendation directed at any specific organization. For decision-support use, please consult your Lemma Critical contact directly.
Discovery Call → Whitepaper → ✉️ Newsletter →
(c) 2026 FRAME00, INC. — Built for decisions that matter.