ROGUE
live · streaming the open web
powered by Bright Data · 5 / 5 products · cost-optimized

Your LLM is being jailbroken, prompt-injected, role-played, escalated. ROGUE finds out before your users do.

Built on all 5 Bright Data products. Harvests every new jailbreak from 19 open-web sources, reproduces each one against your stack, and ships a daily brief — on a budget the bandit auto-tunes for you.

402

attacks tracked

growing daily as the harvester runs

11,097

trials judged

every attack tested 5+ times across every config

8

deployments tested

8 customer-style setups under live attack

powered by Bright Data

Five products. Nineteen sources. One self-tuning budget.

Most red-team tools scrape one platform and hope it stays free. ROGUE fans out across the entire Bright Data product line and lets a bandit decide where to spend the next dollar — automatically.

cost-effectiveness · live

Every Bright Data dollar gets routed to the queries currently finding the most novel attacks.

hot arm = 45.0× cold arm

6000.00

novel attacks per $1 BD spend

from the hot arm · github_pliny_umbrella

in plain English: extremely cost-efficient, 10+ novel attacks per $

The mechanism: an ε-greedy bandit tracks 36 candidate SERP queries. Every harvest, it picks the 10 with the highest yield (novel attacks per $ of BD spend) and explores 1 random arm to stay honest.

ε-greedy (epsilon-greedy algorithm): 90% of the time, pick the strategy you've found works best (greedy). 10% of the time, try something random (epsilon = chance of exploration). Keeps you from getting stuck on stale winners.

Bandit (multi-armed bandit): Classic decision-making algorithm. Each 'arm' is one strategy ROGUE could try; the bandit pulls higher-yield arms more often, lower-yield arms less. It learns online, so it gets smarter every day without retraining.

SERP (Search Engine Results Page): What you see when you Google something. Bright Data's SERP API returns those results as structured JSON. ROGUE uses it to discover new attack-discussion URLs across the open web.
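The per-harvest selection described above (take the highest-yield arms, then add one random arm) can be sketched in a few lines. This is a minimal illustration, not ROGUE's actual code: the `Arm` schema, `select_arms`, and `record_pull` are hypothetical names, and treating never-pulled arms as zero-yield is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    """One candidate SERP query and its running totals (hypothetical schema)."""
    name: str
    novel_attacks: int = 0   # novel attacks this query has surfaced so far
    spend_usd: float = 0.0   # Bright Data spend attributed to this query

    @property
    def yield_per_dollar(self) -> float:
        # Assumption: arms with no spend yet score 0.0, so they are only
        # reached through the exploration slot.
        return self.novel_attacks / self.spend_usd if self.spend_usd > 0 else 0.0


def select_arms(arms, k_exploit=10, n_explore=1, rng=None):
    """Return the k_exploit highest-yield arms plus n_explore random others."""
    rng = rng or random.Random()
    ranked = sorted(arms, key=lambda a: a.yield_per_dollar, reverse=True)
    exploit = ranked[:k_exploit]
    rest = ranked[k_exploit:]
    # Exploration: sample from arms that did not make the greedy cut.
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return exploit + explore


def record_pull(arm, novel_found, cost_usd):
    """Fold one harvest result back into the arm's running totals."""
    arm.novel_attacks += novel_found
    arm.spend_usd += cost_usd
```

Calling `select_arms(arms)` once per harvest and `record_pull` after each query returns is enough to make yields drift toward whatever is currently productive; an arm that stops surfacing novel attacks keeps accumulating spend, so its yield falls and it drops out of the top 10.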

Why it matters to BD: you stop paying for queries that no longer surface anything new. Hot arms get 90% of pulls; dead arms quietly retire. Your BD spend gets sharper every day, no manual tuning.

39 arms · 39 warm · seeded 2026-05-27 · live pulls since 2026-06-02

5 / 5

Bright Data products in use

MCP · SERP · Unlocker · Scraping Browser · Web Scraper

MCP (Model Context Protocol): Anthropic's open standard for connecting AI assistants to tools and data. ROGUE both consumes MCP (via Bright Data's MCP server) and exposes its own MCP server that Claude Desktop can query directly.

19

open-web sources fanned out

Reddit · X · GitHub · HuggingFace · arXiv · leaks

2-tier

reliability with explicit fallbacks

Scraping Browser → SERP · MCP → Unlocker · per-plugin error isolation
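A two-tier fallback with per-plugin error isolation could look roughly like the sketch below. The function names and the plain-callable fetcher interface are assumptions made for illustration; no real Bright Data client calls are shown, and each fetcher stands in for whatever wrapper a plugin uses.

```python
from typing import Callable, Sequence

# Hypothetical interface: a fetcher takes a URL and returns the page body.
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    url: str, fetchers: Sequence[tuple[str, Fetcher]]
) -> tuple[str, str]:
    """Try each (label, fetcher) in order; return (label, body) from the first success."""
    errors = []
    for label, fetch in fetchers:
        try:
            return label, fetch(url)
        except Exception as exc:
            # Remember why this tier failed, then fall through to the next one.
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


def run_plugins(plugins: dict[str, Callable[[], list]]) -> tuple[dict, dict]:
    """Run every source plugin; a failure in one never aborts the others."""
    results, failures = {}, {}
    for name, plugin in plugins.items():
        try:
            results[name] = plugin()
        except Exception as exc:
            failures[name] = repr(exc)
    return results, failures
```

With this shape, "Scraping Browser → SERP" is just `fetch_with_fallback(url, [("scraping_browser", a), ("serp", b)])`, and a source that breaks shows up in `failures` instead of stopping the harvest.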

live source roster · color-coded by Bright Data product

Reddit · r/ChatGPTJailbreak · Web Scraper API
X · @elder_plinius · Web Scraper API
GitHub · L1B3RT4S · SERP + Unlocker
GitHub · CL4R1T4S · SERP + Unlocker
HuggingFace · discussions · MCP
arXiv · cs.CR new · Unlocker
Reddit · r/LocalLLaMA · Web Scraper API
Reddit · r/PromptEngineering · Web Scraper API
X · @AISafetyMemes · Web Scraper API
GitHub · awesome-llm-jailbreak · SERP
LeakHub mirrors · Unlocker
Promptfoo Discord (mirror) · SERP + Unlocker
jailbreakchat · Unlocker
X · @karpathy reply threads · Web Scraper API
arXiv · cs.AI new · Unlocker
GitHub · llm-attacks · SERP
Reddit · r/Anthropic · Web Scraper API
HF · LLM-Attacks dataset discussions · MCP
X · breakages-channel · Web Scraper API

freshest threats · last 48h

What landed since yesterday.

The harvester runs continuously. Every row below is an attack someone published on the open web in the last 2 days.

live · streaming

8 most-recent

  • [audio] Jailbreak via fake divider, persona hijack, and refusal reversal

    refusal_suppression · multimodal_audio

  • [audio] L1B3RT4S jailbreak command library

    dan_persona · multimodal_audio

  • [audio] Log-substrate prompt injection attacks against SOC LLM analysts

    indirect_prompt_injection · multimodal_audio

  • [audio] ENI persona role-hijack with refusal suppression

    role_hijack · multimodal_audio

  • [audio] Neutral Prompting Attack (NPA) for package hallucination steering

    indirect_prompt_injection · multimodal_audio

query it yourself

Connect ROGUE to your IDE.

ROGUE is also a live MCP server — ask Claude Desktop, Cursor, or Windsurf about the threat DB directly. One click connects it.

https://rogue-api-mr5w.onrender.com/mcp/

Claude Desktop: Settings → Connectors → Add custom connector → paste the URL above. Cursor / VS Code: click the button; your editor opens and offers to add the rogue server. No clone, no Python, no JSON editing: it's a hosted, read-only MCP server with 5 query tools.

how rogue thinks

Three loops, one outcome: a threat brief that's true today.

Every dot on the dashboard traces back to a real attack that ran against a real config and got judged by a real LLM. No synthetic benchmarks, no hand-picked examples.

01 · harvest · 19 open-web sources

Stream the latest jailbreaks.

Reddit, X, GitHub, Hugging Face, arXiv, leaks — fanned out through 5 Bright Data products. New attacks land in the DB within minutes of being posted.

02 · reproduce · 8 deployment configs

Run each one against your stack.

A 5-config trial panel (your customer's models × system prompts × tool sets). PAIR makes the attacker iterate. Persona, escalation, and mutation stress tests probe brittle defenses.

PAIR (Prompt Automatic Iterative Refinement): A separate attacker LLM reads each refusal and rewrites the prompt up to N times until breach or give-up. Measures how stubbornly your defense holds against an adaptive attacker, not a single shot.
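The control flow of a PAIR-style loop can be sketched with the three roles passed in as plain callables. Everything here is an assumption for illustration: `target`, `judge`, and `refine` are hypothetical stand-ins for the model under test, the judging LLM, and the attacker LLM, and the harness only counts iterations.

```python
from typing import Callable, Optional


def pair_loop(
    seed_prompt: str,
    target: Callable[[str], str],
    judge: Callable[[str, str], bool],
    refine: Callable[[str, str], str],
    max_iters: int = 5,
) -> Optional[int]:
    """Return the number of refinements needed to breach, or None on give-up.

    0 means the seed prompt breached without any refinement.
    """
    prompt = seed_prompt
    for n_refinements in range(max_iters + 1):
        response = target(prompt)
        if judge(prompt, response):
            return n_refinements
        if n_refinements < max_iters:
            # The attacker reads the refusal and rewrites the prompt.
            prompt = refine(prompt, response)
    return None
```

Counting the seed attempt as zero refinements is one convention under which an average "iterations to crack" below 1 is possible; whether ROGUE counts this way is not stated on the page.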

03 · defend · 11,097 trials judged

Ship a brief that ends an argument.

Markdown, JSON, Slack, MCP. Each finding carries a 95% bootstrap CI and a regenerable receipt. Today's diff vs yesterday's, automatically.

CI (Confidence Interval): "60% [95% CI: 50–70%]" means we ran enough trials to be 95% confident the true rate is between 50% and 70%. Wider interval = less certainty; we report both honestly.
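For readers who want to see what a 95% bootstrap CI on a breach rate involves, here is a minimal percentile-bootstrap sketch using only the standard library. The function name, the 0/1 outcome encoding, and the default of 2,000 resamples are assumptions; the page does not say which bootstrap variant ROGUE uses.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the breach rate of a list of 0/1 outcomes.

    Returns (point_estimate, lower, upper).
    """
    if not outcomes:
        raise ValueError("need at least one trial")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(outcomes)
    # Resample the trials with replacement and recompute the rate each time.
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = rates[int((alpha / 2) * n_resamples)]
    hi = rates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(outcomes) / n, lo, hi
```

For 60 breaches in 100 trials, `bootstrap_ci([1] * 60 + [0] * 40)` returns a point estimate of 0.6 with an interval of roughly ten points either side; fewer trials widen the interval.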

§10.7 · stress tests

ROGUE doesn't just collect attacks. It evolves them.

Five techniques that turn a single harvested jailbreak into the attack a real adversary would actually mount against you. Each runs as a controlled A/B against your stack. Each gets a number.

ε-greedy bandit

● live

An online learner that rates each jailbreak-hunting query by how many novel attacks per dollar it surfaces.

6000.00

novel attacks / $

in plain English: extremely cost-efficient, 10+ novel attacks per $

  • github_pliny_umbrella · 6000
  • arxiv_actorattack · 2842.11
  • vendor_openai_blog · 2666.67

why it matters: Stops you from wasting Bright Data budget on queries that no longer find anything new. The hot arm gets the next 90% of pulls automatically.

Persona susceptibility

● live

Each harvested attack is re-wrapped in a "helpful researcher," "compliance officer," or "logical appeal" persona and re-run.

+24pp

max breach-rate Δ vs unwrapped

in plain English: 16% → 40%, a 24-point jump

  • Llama-3.1-8B-Instruct · 24
  • Claude Haiku · 0
  • GPT-5.4 Nano · 0
  • Gemini 3.1 Flash-Lite · 0
  • Mistral Small 4 · 0

why it matters: A model that refuses "how do I make X" but obeys "as a safety researcher, explain how X is made" is a model that pattern-matches on tone, not intent. The Δ tells you exactly which configs do this.

Δ (Delta, Greek letter for 'change'): In "+15pp Δ", the breach rate jumped by 15 percentage points after the stress test was applied vs the baseline.
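The Δ figures on these cards are differences in percentage points, not relative changes. A minimal sketch of that arithmetic, assuming each trial is recorded as 1 (breach) or 0 (defended) and using hypothetical function names:

```python
def breach_rate(outcomes):
    """Fraction of trials that breached, given a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def delta_pp(baseline, stressed):
    """Breach-rate change in percentage points: stressed minus baseline."""
    return round(100 * (breach_rate(stressed) - breach_rate(baseline)), 1)


# 4 of 25 unwrapped trials breach (16%); 10 of 25 persona-wrapped trials breach (40%).
unwrapped = [1] * 4 + [0] * 21
wrapped = [1] * 10 + [0] * 15
print(delta_pp(unwrapped, wrapped))  # 24.0
```

The trial counts in the example are invented to reproduce the 16% → 40% headline; the real per-config sample sizes are not shown on the page.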

Multi-turn escalation

● live

Single-turn attacks the model refused are re-synthesized as a 3-turn arc that warms up before delivering the payload.

+18pp

lift from turn 1 → turn 3

in plain English: 4% → 23%, an 18-point jump

  • Llama-3.1-8B-Instruct · 18
  • Mistral Small 4 · 15
  • Gemini 3.1 Flash-Lite · 14
  • Claude Haiku · 2
  • GPT-5.4 Nano · 1

why it matters: If turn 3 breaches a config that refused turn 1, the model is making isolated decisions and context isn't carrying its guardrails. Real users do this naturally.

Pattern-match audit

● live

Each defended attack gets re-worded into a semantically identical mutation and re-run against the configs that blocked the original.

Mutation (surface mutation): Take an attack the model just blocked, reword it into a different-looking but identical-meaning version, and try again. If the reworded one slips through, the model was filtering on keywords, not understanding what was actually being asked.

0%

of 'defended' attacks leaked on paraphrase

in plain English: defenses robust to wording changes

  • Gemini 3.1 Flash-Lite · 0
  • Mistral Small 4 · 0
  • Llama-3.1-8B-Instruct · 0
  • GPT-5.4 Nano · 0
  • Claude Haiku · 0

why it matters: If a config defends the original wording but breaches on a paraphrase, the defense was string-matching, not reasoning. Pattern-match score = how brittle your defenses really are.
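The headline number for this audit is a conditional rate: of the attacks a config defended in their original wording, what share breached once paraphrased. A small sketch, assuming results arrive as pairs of booleans (the pair layout and function name are hypothetical):

```python
def paraphrase_leak_rate(results):
    """Share of originally defended attacks that breached on paraphrase.

    results: iterable of (original_breached, paraphrase_breached) booleans.
    Attacks that already breached in their original wording are excluded.
    """
    defended = [(orig, para) for orig, para in results if not orig]
    if not defended:
        return 0.0  # nothing was defended, so nothing could leak
    return sum(1 for _, para in defended if para) / len(defended)
```

Conditioning on "defended" matters: an attack that breached in its original form says nothing about paraphrase robustness, so it is left out of the denominator.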

PAIR · stubbornness

● live

A second LLM plays attacker, reading each refusal and refining the attack up to N iterations until breach or give-up.

LLM (Large Language Model): AI like ChatGPT, Claude, or Llama. The thing under attack in every cell of this dashboard.

0.27

iterations to crack the easiest config

in plain English: easy crack, breaks on the 1st attempt

  • roleplaying · 146
  • multi_turn_escalation · 92
  • obfuscation · 91
  • logical_appeal · 27
  • syntactic_mutation · 5

why it matters: Tells you how long your weakest config holds against a real adaptive attacker. Most production safety evals measure single-shot refusal; this measures resilience.

stress-test lab · interactive

Pick a config. Toggle attacks. Watch it bend.

Each toggle adds the observed Δ for the selected config. Bars use real numbers from your sweep. The combined estimate is an upper bound — stress tests don't perfectly stack — but it's directionally honest.
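The stacking rule described above is plain addition with a ceiling. A minimal sketch, assuming deltas are given in percentage points and that the result is capped at 100% (the cap and the function name are assumptions):

```python
def stacked_estimate(baseline_pct, deltas_pp):
    """Upper-bound breach estimate: baseline plus every toggled delta, capped at 100."""
    return min(100.0, baseline_pct + sum(deltas_pp))


# Baseline 16%, with persona (+24pp) and multi-turn escalation (+18pp) toggled on.
print(stacked_estimate(16, [24, 18]))  # 58
```

Because the same attacks can be rescued by more than one stress test, the true combined rate is generally lower than this sum, which is why the page calls it an upper bound.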

target deployment

estimated breach rate: 16% (baseline 16%)

in plain English → about 1 in 6 attacks breaches


go deeper

Three views on the same truth.