The first head-to-head benchmark of AI agent security providers โ open, reproducible, and fair
537 test cases across 8 categories. Over-refusal penalty: (FPR^1.3) ร 40 โ security that breaks usability isn't security.
| # | Provider | Score | Penalty | PI | Jailbreak | Data Exfil | Tool Abuse | Over-Refusal | Multi-Agent | Provenance | P50 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AgentGuardTrustless ProtocolProvenance-based (proprietary) | 98.4 | 0.00 | 98.5% | 97.8% | 100.0% | 100.0% | 100.0% | 100.0% | 85.0% | 1ms |
| 2 | Deepset DeBERTaML model (local) | 87.6 | โ10.95 | 99.5% | 97.8% | 95.4% | 98.8% | 63.1% | 100.0% | 100.0% | 19ms |
| 3 | Lakera GuardML + rules (SaaS) | 79.4 | โ12.77 | 97.6% | 95.6% | 96.6% | 86.3% | 58.5% | 94.3% | 95.0% | 133ms |
| 4 | ProtectAI DeBERTa v2ML model (local) | 51.4 | โ0.73 | 77.1% | 86.7% | 43.7% | 12.5% | 95.4% | 74.3% | 65.0% | 19ms |
| 5 | ClawGuardPattern-based (local) | 38.9 | 0.00 | 62.9% | 22.2% | 40.2% | 17.5% | 100.0% | 40.0% | 25.0% | 0ms |
| 6 | LLM GuardML model (Docker) | 38.7 | โ | 77.1% | โ | 30.8% | 8.9% | โ | โ | โ | 111ms |
Transparent, reproducible, and designed to reward balanced security.
Overall score is a weighted geometric mean across categories. This rewards balanced performance โ scoring 90 everywhere beats 100 on some and 50 on others.
Blocking legitimate requests is penalized: (FPR^1.3) ร 40. A provider blocking 50% of legit requests loses ~16 points. Security that breaks usability isn't security.
Sub-50ms p95 scores 100. Over 1 second scores 5. Speed matters in production agentic systems where tool-calling timeouts are real.
Corpus hashed per run. All results include environment, config, and raw per-test-case outcomes. Anyone can verify independently.
537 test cases across attack detection, false positive control, performance, and provenance.
Direct, indirect, and context-manipulation attacks. Includes delimiter escaping, multi-turn escalation, MCP hijacking, unicode steganography, and encoded payloads.
DAN variants, roleplay exploits, authority impersonation, token smuggling, crescendo attacks, and multi-language bypass attempts.
Leaking data via tool calls, markdown images, error messages, steganographic encoding, and side channels. Tests both direct extraction and covert exfiltration.
Unauthorized tool calls, privilege escalation, parameter injection, scope expansion, recursive loops, and resource exhaustion attacks.
Legitimate requests that should NOT be blocked: cybersecurity education, medical/legal topics, creative writing, historical events, and multi-language benign inputs.
Cross-agent injection propagation, delegation abuse, trust boundary violations, context poisoning, and orchestrator impersonation.
Added latency measured across every test case. P50, P95, and P99 percentiles. Scored inversely โ faster is better.
Detecting fake authorization claims, spoofed A2A handoffs, fabricated HMAC/JWT tokens, and unverifiable approval chains. Can the provider tell real authority from claimed authority?
How proprietary solutions participate without revealing their implementation.
Our Trustless Protocol lets vendors benchmark locally while cryptographically proving results are legitimate. No model weights revealed, no API access needed โ just math.
Prevents cherry-picking (random subset), result tampering (hash chain), model swapping (commitment), and forgery (Ed25519). Full protocol documentation โ
Building an AI agent security tool? We'll benchmark it โ independently, transparently, against the full 537-case corpus.