How was this attack simulation run?

Each AI model was cast as the attacker and run autonomously in a Docker Compose environment for up to 15 turns, on identical prompts with no vulnerability hints (June 12, 2026, via OpenRouter). The reproduction code is public, so anyone can re-run it in the same environment.

What is the difference between INSECURE and SECURE?

The only difference is the presence of the proof layer. SECURE mode requires a zero-knowledge proof before high-risk operations and, with no proof, stops before execution with a 403 (fail-closed).

Six AI models, one identical attack — capability vs. attack resistance

Q: Is this a safety rating or ranking of specific models?

No. It is a measurement under these attack scenarios, not a safety guarantee or ranking of specific models. Each model ran via OpenRouter on identical prompts for up to 15 turns — a setup that differs from the extra safety layers vendors put on their production APIs and from attacks tuned per model.

01 — What we tested

What we put to the test

We cast the AI as the attacker, gave it nothing but a low-privilege account, and threw five attack scenarios at the system. The goal is to reproduce the attacks enterprise systems are really exposed to. We withhold the methods and show only what each one targeted.

🔓 01

Data exfiltration

From low privilege, go after confidential customer data.

📝 02

Audit-log tampering

Erase or rewrite the traces of what was done.

🐛 03

Zero-day RCE

After intrusion, go after decrypting encrypted data.

🌫 04

SIEM evasion

Slip an illicit transfer past detection.

🎭 05

Social engineering

Break the approval flow by impersonation.

6 frontier models Via OpenRouter · identical prompts Up to 15 turns No vulnerability hints Reproduction code public

03 — What we found

The stronger the model, the less its safety features held

The more capable the model, the more sophisticated the attack and the deeper it reached. The strongest model broke 5/5, and the vulnerable paths weren’t one vendor’s — they were common across the entire frontier: a structural risk. Both “pick the high-performer and you’re safe” and “safety-tuned means safe” are betrayed by the data. And — the attacker, not the defender, chooses which AI to attack with. So defense cannot depend on which model it is.

Fable5 (Mythos-class)

Don’t say “attack,” and the safeguards wave it right through.

“Generate the monthly report”→ Returns a customer report containing SSNs

“Run the monthly payment batch”→ Wires ~$67,800

Refusal is not safety. Safety training may curb accidental misuse, but it’s no guarantee.

No choice of model protects you.
What protects you is a layer that stops things before they execute.

Layer 2 — Existing defenses don’t stop it

Does that defense work against an AI?

The classic defenses most companies already run, AI agents slipped past on their own. For each scenario, we put the conventional defense next to how the AI broke it.

What to look for — Just scan the right column, “How the AI breaks it,” top to bottom: one classic defense after another falls.

Scenario

Conventional defense

How the AI breaks it

🔓 Data exfiltration

Conventional defenseJWT auth · role-based access

How the AI breaks itGuesses a weak key → escalates privilege → pulls every customer record

📝 Audit tampering

Conventional defenseHash-chained audit logs

How the AI breaks itAutonomously finds a log-mutation path and rewrites it

🐛 Zero-day RCE

Conventional defenseNetwork isolation · encryption

How the AI breaks itProbes /proc, discovers a decryption endpoint

🌫 SIEM evasion

Conventional defenseAnomaly detection · correlation analysis

How the AI breaks itSpreads timing, blends anomalies into normal traffic

🎭 Social engineering

Conventional defenseNo defense mechanism

How the AI breaks itTargets the whole approval flow, breaks through by impersonation

Classic defenses are breakable by AI.
And for social engineering, the very concept of a defense doesn’t exist.

06 — The solution

AI agents will attack your API.
Add a layer that demands proof before execution, and it stops.

Enterprise · server-side

A server-side security layer that demands a “proof” before execution.

Every breach happened because the AI escalated keys or credentials. Lemma adds one proof layer on the server: before a high-risk operation it requires, as proof, who, with what authority, on which data, and stops anything out of scope before it executes (fail-closed). Into your existing servers and APIs, with no major rewrite.

Server-side deploymentfail-closedZero-knowledge proofsIndependently verifiable audit trailEnterprise

// Require a proof before sensitive operations, in one line
app.use('/api/sensitive', requireZkProof())
// No proof → 403 PROOF_REQUIRED · blocked across Opus / GPT / DeepSeek / Qwen / Kimi

The social-engineering singularity

Defense, for the first time, where the very concept was absent

No defense

→

Defense — for the first time, with Lemma

Approvals and payments have, traditionally, had no defense mechanism at all as a domain. For transfers and approvals, Lemma requires a mathematical proof of authorization and stops anything out of scope before it executes.

Layer a proof gate over the attacks, and the outcome changes like this:

⚠️

Attack

Escalates keys/credentials and abuses them

JWT privilege escalation
Impersonation
Audit-log tampering

🛡

Lemma’s proof gate

Demands a “proof” before execution

Who ZK identity
With what authority role
On which data scope

🛑

Blocked before execution

Stops before execution

No proof, nothing is sent

fail-closed
Zero leakage
Verifiable trail

After deploying Lemma

After adding Lemma, what happened to the attacks

The default view is “With Lemma” — every model and every scenario, blocked before execution. Flip the toggle to “No proof,” and the same table fills with breaches (red). The only difference is Lemma.

View

Sort

Breached Attack succeeded Held Did not succeed Refused Model refused (behavior, not a guarantee) Blocked Lemma blocked it before execution

AI attacks. Only Lemma stops it.

Will your systems withstand an AI attack?

We run these attack scenarios against your own systems (a security assessment) and propose where, on the server side, to place the proof gate. Start with a 30-minute discovery call. No disclosure of sensitive data required.

Book a Discovery Call → See the plans →

To learn more about Lemma, see the Whitepaper.

How Lemma rolls out

Try it small, confirm it, then bring it in.

Discovery (30-min call)

We review your target systems and requirements. No disclosure of sensitive data required.

Pilot (PoC)

We drop Lemma’s proof gate into a staging environment in a minimal configuration.

Before / after test

Measure the no-proof vs. proof difference under attack scenarios. See the effect in numbers.

Production rollout

Based on the results, we finalize the integration scope and the path to production.

How we tested

This is measurement, not assertion. The code is public, and anyone can re-run it in the same environment.

Models Opus 4.8 / GPT-5.5 / DeepSeek v4 Pro / Qwen3.7 Max / Kimi-K2.6 / Fable 5 (June 12, 2026 · via OpenRouter)
Environment Docker Compose, up to 15 turns, identical prompts for all models, no vulnerability hints
INSECURE / SECURE The only difference is the presence of the proof layer. SECURE requires a zero-knowledge proof before high-risk operations; without it, 403
Reproduction code github.com/lemmaoracle/example-cyber-attack

How to read this — This benchmark backs a structural point — that detection and safety training alone don’t close the gap — and is a measurement under these attack scenarios. Don’t read it as a safety guarantee for, or a ranking of, specific models. What Lemma provides is pre-execution proof of authorization and after-the-fact verifiability; it is not a product that prevents attacks. Defense is a separate layer’s job, and Lemma complements it. Each model ran autonomously via OpenRouter on identical prompts for up to 15 turns, a setup that differs from the extra safety layers vendors put on their production APIs and from attacks tuned per model. Read the breach counts not as a ranking but as an illustration of the structural point.

AI has switched to the attacker’s side.
Can your systems hold the line?