Chaos Monkey for AI Agents
WatchLLM

Run the red team before production runs you.

Wire one decorator into your agent entrypoint, then let WatchLLM fire prompt injection, goal hijacking, memory poisoning, tool abuse, boundary testing, and jailbreak variants against the real execution path. You ship with failure reports, not crossed fingers.

@chaos(key="sk_proj_xxx")Real tool calls. Real traces. Real failure modes.
Installation CostOne decorator
Attack Surface6 locked failure classes
Exit ConditionAutopsy report before deploy
attack session / support_agent_v2target fingerprint 41d2:af90 - live decorator capture
Trace ActiveArmed
Prompt InjectionGoal HijackingMemory PoisoningTool AbuseBoundary TestingJailbreak VariantsPrompt InjectionGoal HijackingMemory PoisoningTool AbuseBoundary TestingJailbreak Variants
Attack Surface

Targeted failures, not random prompt theater.

Random prompts create noise. WatchLLM organizes chaos around the exact breakage modes that take real agents down in production: objective drift, poisoned state, unsafe tool calls, and policy collapse under pressure.

01 / Injection

Prompt Injection

Override attempts that try to rewrite system intent, leak hidden context, or smuggle new operating rules into the conversation.

Instruction Override
02 / Steering

Goal Hijacking

Multi-turn adversarial steering that slowly drags the agent away from the declared task and toward a hostile objective.

Multi-Turn Drift
03 / State

Memory Poisoning

False facts, poisoned summaries, and corrupted recall paths that turn yesterday's bad context into tomorrow's confident mistake.

Persistent Corruption
04 / Tools

Tool Abuse

Dangerous tool invocations, destructive parameters, and function-call chains that look valid until they touch money, data, or production systems.

Unsafe Execution
05 / Scope

Boundary Testing

Edge-case pressure against the agent's stated remit, where vague ownership and overloaded policies usually begin to fracture.

Edge Conditions
06 / Escape

Jailbreak Variants

Roleplay, encoding tricks, hypothetical framing, and other evasive tactics built to punch through brittle refusal patterns.

Policy Erosion
How It Works

Three moves between laptop confidence and production confidence.

The flow is intentionally short. Capture the real agent path, select the classes of failure that matter, then read the autopsy before the first customer ever stumbles into it.

01 / Wire Decorator

Attach To The Real Entry Point

Drop the decorator on the agent you are already shipping. WatchLLM intercepts the actual model path, tool registry, and system behavior without forcing an SDK migration.

@chaos(key="sk_proj_xxx") def support_agent(input: str) -> str:
02 / Define Scenarios

Select The Failure Classes Worth Hurting For

Load the attack library that matches your risk surface. Prompt injection and jailbreaks are table stakes; tool abuse and memory poisoning are where production agents get expensive.

attacks = [ "prompt_injection", "tool_abuse", "memory_poisoning" ]
03 / Get Failure Reports

Read The Autopsy, Gate The Release

Every compromised run returns the trace, the failed category, and the reason it broke. Set a severity threshold and make unsafe agents fail the build before users ever touch them.

watchllm test my_agent.py --fail-on "severity>=4"

Built for engineers shipping agents with tools, memory, and real blast radius. No fake demos. No eval theater. Just adversarial pressure against the path that would actually run in production.