live · streaming the open web

scraper-agnostic harvest · keyless by default · cost-optimized

Find every way your AI agentbreaks.gets rubber-stamped.leaks a poisoned skill.goes wrong.breaks.ROGUE finds out before your users do.

One engine measures all three: whether the model can be broken, whether the human sign-off is real, and whether the skills your agents share are safe, against an independent, continuously-refreshed standard. Every result is a signed, reproducible record.

Point ROGUE at your endpoint. Get a report of where it can go wrong across all three surfaces, and how to fix it.

Maturity is honest: the model surface is mature and scannable today; the human gate (live) and the agent’s memory (in research) are measured and research-validated, signed but small-n. See all three →

646

attacks tracked

more than most security teams see in a quarter

38,311

trials judged

every attack tested 5+ times across every config

deployments tested

29 customer-style setups under live attack

Watch a scan run →See a sample report See what's breaching → /matrix

beyond the model

Three surfaces where a high-stakes agent goes wrong — one engine, every result signed.

Red-teaming the model is one surface. ROGUE also measures the human who approves a risky action and the skill pool your agents share, and signs every result against a provably-independent answer key.

the model · offense

Reproduce real jailbreaks.

Open-web attacks replayed against your exact model, system prompt, and tools, ranked worst-first.

the human gate · oversight

Measure the sign-off.

When a risky action escalates to a person, a false-approve rate against an independent key, so “a human approved it” is a measured control, not an assumption.

the agent’s memory · assurance

Audit the skill pool.

Shared agent skills, checked for leakage, whether they actually help, and dangerous combinations, before they spread.

See all three on the product tour →

scraper-agnostic harvest

Keyless by default. Fifteen sources. One self-tuning budget.

Most red-team tools scrape one platform and hope it stays free. ROGUE’s harvest is scraper-agnostic and keyless out of the box, any scraper or proxy slots in behind one env var, and a bandit decides where to spend the next dollar, automatically.

cost-effectiveness · live

Every harvest dollar gets routed to the queries currently finding the most novel attacks.

5151.52

novel attacks per $1 harvest spend

from the hot arm · github_pliny_umbrella

in plain Englishextremely cost-efficient, 10+ novel attacks per $

The mechanism: an

ε-greedy

Epsilon-greedy algorithm

90% of the time, pick the strategy you've found works best (greedy). 10% of the time, try something random (epsilon = chance of exploration). Keeps you from getting stuck on stale winners.

tracks 36 candidate queries. Every harvest, it picks the 10 with the highest yield (novel attacks per $ of harvest spend) and explores 1 random arm to stay honest.

Why it matters: you stop paying for queries that no longer surface anything new. Hot arms get 90% of pulls; dead arms quietly retire. Your harvest spend gets sharper every day, no manual tuning.

52 arms · 52 warm · seeded 2026-05-27 · live pulls since 2026-07-18

default harvest cost

crawl4ai · keyless Firecrawl · DuckDuckGo · direct · any proxy slots in

open-web sources fanned out

Reddit · X · GitHub · HuggingFace · arXiv · leaks

2-tier

reliability with explicit fallbacks

render → search · MCP → fetch · per-plugin error isolation

live source roster · color-coded by source backend

hover to pause

Reddit · r/ChatGPTJailbreak· scrape

X · @elder_plinius· scrape

GitHub · L1B3RT4S· search + fetch

GitHub · CL4R1T4S· search + fetch

HuggingFace · discussions· MCP

arXiv · cs.CR new· fetch

Reddit · r/LocalLLaMA· scrape

Reddit · r/PromptEngineering· scrape

X · @AISafetyMemes· scrape

GitHub · awesome-llm-jailbreak· search

LeakHub mirrors· fetch

Promptfoo Discord (mirror)· search + fetch

jailbreakchat· fetch

X · @karpathy reply threads· scrape

ArXiv · cs.AI new· fetch

GitHub · llm-attacks· search

Reddit · r/Anthropic· scrape

HF · LLM-Attacks dataset discussions· MCP

X · breakages-channel· scrape

Reddit · r/ChatGPTJailbreak· scrape

X · @elder_plinius· scrape

GitHub · L1B3RT4S· search + fetch

GitHub · CL4R1T4S· search + fetch

HuggingFace · discussions· MCP

arXiv · cs.CR new· fetch

Reddit · r/LocalLLaMA· scrape

Reddit · r/PromptEngineering· scrape

X · @AISafetyMemes· scrape

GitHub · awesome-llm-jailbreak· search

LeakHub mirrors· fetch

Promptfoo Discord (mirror)· search + fetch

jailbreakchat· fetch

X · @karpathy reply threads· scrape

ArXiv · cs.AI new· fetch

GitHub · llm-attacks· search

Reddit · r/Anthropic· scrape

HF · LLM-Attacks dataset discussions· MCP

X · breakages-channel· scrape

freshest threats · last 48h

What landed since yesterday.

The harvester runs continuously. Every row below is an attack someone published on the open web in the last 2 days.

live · streaming

1 most-recent

Nested URL exfiltration via web_fetch tool output hijacking
indirect_prompt_injection · tool_outputfetch
↗

your stack · at a glance

breach matrix · 2026-07-13

open →

2 families × 1 configs0% peak · 0 crit

How it works

From endpoint to filed ticket in four steps

Point ROGUE at a model, escalate every goal through the full arsenal, and let the criticals route themselves to your tracker.

01
Connect your endpoint
Point ROGUE at any OpenAI-compatible endpoint.
OpenAIAnthropicGeminiCustom API
02
Run a ladder scan
Escalate each goal through the full arsenal, graduated techniques, multi-turn, multimodal.
Direct
Graduated
Multi-turn
Multimodal
03
Find the jailbreak
ROGUE surfaces exactly which attacks break it, with evidence.
Crescendo·CRITICAL·breached 4/5
04
Ticket auto-filed
Criticals flow straight to Jira/Slack via the MCP integration.
ROGUE-142Critical
Crescendo bypass
filed via MCP → Jira

see the product

This is what you get back.

A scored executive report you can hand to a CISO, and a live MCP session that runs the whole red-team from inside your editor.

app.rogue · scans / scan_8f3a2 / report

← /scans/scan_8f3a2

acme-support-assistant · claude-sonnet-4-6

142 tests · 11 breaches

Example report · illustrative

68/100

Risk score

high

Top attack

Multi-turn escalation (Crescendo)

Score = severity-weighted breach density across 142 reproduced attack trials. Bands: ≥75 critical · ≥50 high · ≥25 medium · <25 low.

Executive summary

This deployment breached 11 of 142 attack trials (7.7%). Highest risk: multi-turn escalation (Crescendo) and a multimodal image-carrier bypass. Two CRITICAL findings require attention before production.

Tests

142

Breaches

Breach rate

Cost

$4.12

Recommendations

Add a turn-aware refusal check: Crescendo succeeds by degrading the guard across a benign-looking conversation. Re-evaluate the safety policy on every turn against the full transcript, not just the latest message.
Run OCR + a vision safety pass on all uploaded images before they reach the model. The image-carrier bypass smuggles instructions as rendered text that the text-only filter never sees.
Constrain tool exposure for untrusted turns, the two CRITICAL findings chained a jailbreak into a tool call. Gate high-impact tools behind a confirmation step until the conversation is re-verified.

Cursor, rogue-mcp

MCP · rogue

Scan my staging endpoint for jailbreaks and file the criticals to Jira.

On it. I'll validate the target, run a ladder-mode red-team through ROGUE's MCP server, then file Jira tickets for the critical findings.

validate_target(endpoint="staging-api…/v1")

← { reachable: true, authenticated: true, ok: true }

start_scan(pack="default", mode="ladder", max_tests=150)

← { scan_id: "scan_8f3a2", status: "queued" }

get_scan_status("scan_8f3a2")running

← { status: "running", progress: 68% } → polling…

list_findings("scan_8f3a2")

← 11 breaches across 142 trials · 2 critical, 4 high

create_jira_ticket(integration="jira-prod")

← { created: ["SEC-412", "SEC-413"], skipped: [] }

Scan complete , 142 trials, 11 breaches (7.7%). Top risk: Crescendo (CRITICAL, 4/5). Filed 2 Jira tickets for the criticals; full report at app.rogue/scans/scan_8f3a2/report.

Illustrative MCP session, ROGUE is the MCP server

Take the full product tour →

how rogue thinks

Three loops, one outcome: a threat brief that's true today.

Every dot on the dashboard traces back to a real attack that ran against a real config and got judged by a real LLM. No synthetic benchmarks, no hand-picked examples.

01 · harvest19 open-web sources

Stream the latest jailbreaks.

Reddit, X, GitHub, Hugging Face, arXiv, leaks, fanned out across 15 open-web sources. New attacks land in the DB within minutes of being posted.

02 · reproduce29 deployment configs

Run each one against your stack.

A 5-config trial panel (your customer's models × system prompts × tool sets).

PAIR

Prompt Automatic Iterative Refinement

A separate attacker LLM reads each refusal and rewrites the prompt up to N times until breach or give-up. Measures how stubbornly your defense holds against an adaptive attacker, not a single shot.

makes the attacker iterate. Persona, escalation, and mutation stress tests probe brittle defenses.

03 · defend38,311 trials judged

Ship a brief that ends an argument.

Markdown, JSON, Slack,

MCP

Model Context Protocol

Anthropic's open standard for connecting AI assistants to tools and data. ROGUE both consumes MCP (from an external data-provider's MCP server) and exposes its own MCP server that Claude Desktop can query directly.

. Each finding carries 95% bootstrap and a regenerable receipt. Today's diff vs yesterday's, automatically.

→ 646 primitives

§10.7 · stress tests

ROGUE doesn't just collect attacks. It evolves them.

Five techniques that turn a single harvested jailbreak into the attack a real adversary would actually mount against you. Each runs as a controlled A/B against your stack. Each gets a number.

ε-greedy

Epsilon-greedy algorithm

90% of the time, pick the strategy you've found works best (greedy). 10% of the time, try something random (epsilon = chance of exploration). Keeps you from getting stuck on stale winners.

● live

An online learner that rates each jailbreak-hunting query by how many novel attacks per dollar it surfaces.

5151.52

novel attacks / $

in plain Englishextremely cost-efficient, 10+ novel attacks per $

github_pliny_umbrella5151.52
blog_etr_after3142.86
arxiv_jailbreak_llm1875

why it mattersStops you from wasting harvest budget on queries that no longer find anything new, the hot arm gets the next 90% of pulls automatically.

Persona susceptibility

● live

Each harvested attack is re-wrapped in a "helpful researcher," "compliance officer," or "logical appeal" persona and re-run.

+45pp

max breach-rate Δ vs unwrapped

in plain English11% → 56%, a 45-point jump

Llama-3.1-8B-Instruct45
GPT-5.4 Nano43
Claude Haiku0
Gemini 3.1 Flash-Lite0
Mistral Small 40

why it mattersA model that refuses "how do I make X" but obeys "as a safety researcher, explain how X is made" is a model that pattern-matches on tone, not intent. The

Delta (Greek letter for 'change')

In "+15pp Δ", breach rate jumped by 15 percentage points after the stress test was applied vs the baseline.

tells you exactly which configs do this.

Multi-turn escalation

● live

Single-turn attacks the model refused are re-synthesized as a 3-turn arc that warms up before delivering the payload.

+35pp

lift from turn 1 → turn 3

in plain English0% → 35%, a 35-point jump

fl-GLM-4-9B-041435
Voxtral Small 24B30
fl-Qwen3-32B14
Gemini 3.1 Flash-Lite12
Llama-3.1-8B-Instruct7

why it mattersIf turn 3 breaches a config that refused turn 1, the model is making isolated decisions, context isn't carrying its guardrails. Real users do this naturally.

Pattern-match audit

● live

Each defended attack gets re-worded into a semantically identical

mutation

Surface mutation

Take an attack the model just blocked, reword it into a different-looking but identical-meaning version, and try again. If the reworded one slips through, the model was filtering on keywords, not understanding what was actually being asked.

and re-run against the configs that blocked the original.

13%

of 'defended' attacks leaked on paraphrase

in plain Englisha small fraction leak on paraphrase

GPT-5.4 Nano13
Llama-3.1-8B-Instruct5
frontier-gpt54nano3
GPT-5.4 Nano0
Mistral Small 40

why it mattersIf a config defends the original wording but breaches on a paraphrase, the defense was string-matching, not reasoning. Pattern-match score = how brittle your defenses really are.

PAIR

Prompt Automatic Iterative Refinement

A separate attacker LLM reads each refusal and rewrites the prompt up to N times until breach or give-up. Measures how stubbornly your defense holds against an adaptive attacker, not a single shot.

· stubbornness

● live

A second

LLM

Large Language Model

AI like ChatGPT, Claude, or Llama. The thing under attack in every cell of this dashboard.

plays attacker, reading each refusal and refining the attack up to N iterations until breach or give-up.

0.27

iterations to crack the easiest config

in plain Englisheasy crack, breaks on the 1st attempt

roleplaying146
multi_turn_escalation92
obfuscation91
logical_appeal27
syntactic_mutation5

why it mattersTells you how long your weakest config holds against a real adaptive attacker. Most production safety evals measure single-shot refusal, this measures resilience.

go deeper

Three views on the same truth.

/feed

Live Feed

Newest attacks with the 5-stress-test sidebar. Click any row to see the full payload + breach trail, including the new ▶ play replay.

/matrix

Breach Matrix

2 attack families × 29 configs. Click any red cell to see the exact prompt that cracked it, with 95% CIs.

/brief

Threat Brief

Today's CISO-readable diff vs yesterday. Markdown + JSON exports. The artifact you'd actually send.

get started

See where your agent goes wrong.

Watch ROGUE run a scan right now, no signup — the model that can be broken, the human sign-off that may be rubber-stamped, the skills that can leak — every result a signed, reproducible record. Then read the research, or scan your own model — live, no install.

Scan your model Watch the demo

Read the research See a sample report

Read the research

Find every way your AI agentbreaks.gets rubber-stamped.leaks a poisoned skill.goes wrong.breaks.ROGUE finds out before your users do.

Three surfaces where a high-stakes agent goes wrong — one engine, every result signed.

Keyless by default. Fifteen sources. One self-tuning budget.

Every harvest dollar gets routed to the queries currently finding the most novel attacks.

What landed since yesterday.

From endpoint to filed ticket in four steps

Connect your endpoint

Run a ladder scan

Find the jailbreak

Ticket auto-filed

This is what you get back.

acme-support-assistant · claude-sonnet-4-6

Executive summary

Recommendations

Three loops, one outcome: a threat brief that's true today.

Stream the latest jailbreaks.

Run each one against your stack.

Ship a brief that ends an argument.

ROGUE doesn't just collect attacks. It evolves them.

Three views on the same truth.

See where your agent goes wrong.