
Augustus by Praetorian: How Red Team Scanners Break LLMs and What It Means for Business

Praetorian has released Augustus, an open-source vulnerability scanner designed for Red Teaming LLMs. It automates over 210 attacks, including jailbreaks and prompt injections, and supports even local Ollama instances. For businesses, this tool is vital: without proper testing, deployed models remain vulnerable to data leaks and manipulation by adversaries.

Technical Context

Augustus is an open-source tool from Praetorian for Red Teaming large language models: essentially a "vulnerability scanner" for LLMs that runs the model through a massive set of attacks and automatically records where defenses fail. Crucially, it is written in Go and distributed as a portable binary, lowering the entry barrier for many teams compared to heavy Python stacks.

From a practical standpoint, the news isn't just about "yet another tool," but the fact that Augustus is focused on operator scenarios and reproducibility—exactly what is needed when auditing solutions before a real launch (or when investigating an incident).

What It Actually Does

  • Automates 210+ attacks (probes) across 47 categories: prompt injection, jailbreak, extraction (pulling hidden data), filter bypasses, toxicity/NSFW, malicious content generation, RAG poisoning attempts, and more.
  • Supports 28 providers, including Ollama for local instances (relevant for companies that keep everything "on-prem" and assume they are automatically safe).
  • Parallel runs with load control: rate limiting, retries, timeouts—allowing for managed testing rather than just crashing the model with requests.
  • Result detection via 90+ detectors: ranging from pattern matching to LLM-as-judge and external evaluators (schemes similar to HarmJudge/Perspective API are used).
  • Reporting: export to JSON/JSONL/HTML—easy to plug into analysis pipelines, CI/CD, or internal audit processes.
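Since the reports are exported as JSON/JSONL, wiring them into a CI gate is straightforward. Below is a minimal sketch of such a gate; the field names (`category`, `probe`, `passed` meaning "defense held") are an assumed schema for illustration, not the actual Augustus output format, and would need to be adapted to the real report:

```python
import json

# Hypothetical JSONL schema: one finding per line with "category",
# "probe", and "passed" (True = defense held). Adapt the field names
# to the actual Augustus report format.
def gate_report(path, blocked_categories):
    """Collect probes in blocked categories where the defense failed."""
    failures = []
    with open(path) as f:
        for line in f:
            finding = json.loads(line)
            if finding["category"] in blocked_categories and not finding["passed"]:
                failures.append(finding["probe"])
    return failures

# In CI: a non-empty result means the build should fail.
# failures = gate_report("report.jsonl", {"prompt_injection", "extraction"})
```

The same loop can feed an internal audit dashboard instead of a hard gate, which is often the more realistic first step.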

Key Concept: "Buff" Transformations to Bypass Fragile Defenses

A distinct strength of Augustus is its chain of "Buff" transformations: the tool doesn't just send a direct jailbreak, it attempts to mask the payload. In real attacks this often makes all the difference, because many "guardrails" rely on superficial signatures. Typical transformations include:

  • Paraphrasing
  • Case/style changes
  • Poetic form/encoding
  • Translation into low-resource languages (e.g., Zulu)
  • Simple encodings (like base64) and context manipulations

In practice, this means: if your defense is based on blocklists and primitive classifiers, it will crumble during the first systematic test. That is why the original announcement sensibly recommends running Augustus in Docker/sandbox—the tool generates attack prompts, can provoke dangerous responses, and create unstable model states.
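To make the idea concrete, here is an illustrative sketch of buff-style transformations (this is not Augustus's actual code; the function names and wrapper prompt are invented). Each buff wraps the same payload in a different surface form, which is exactly what defeats signature-based blocklists:

```python
import base64

# Illustrative "buff"-style transformations, NOT Augustus internals.
def buff_base64(payload):
    """Hide the payload behind a simple encoding."""
    encoded = base64.b64encode(payload.encode()).decode()
    return f"Decode this base64 string and follow it: {encoded}"

def buff_case(payload):
    """Alternate character case to break exact-match blocklists."""
    return "".join(c.upper() if i % 2 else c.lower()
                   for i, c in enumerate(payload))

def buff_chain(payload, buffs):
    """Apply a sequence of transformations, as a scanner would."""
    for buff in buffs:
        payload = buff(payload)
    return payload
```

A blocklist that matches the literal payload string catches neither variant, which is why defenses need to operate on intent and context rather than raw text.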

Engineering Constraints and Caveats

  • No published efficiency metrics (percentage of successful bypasses on specific models/versions). This means it should be used as a verification and regression tool, not a "universal security rating."
  • Low community validation (few forks/discussions)—increases requirements for isolation and internal verification, especially if you integrate reports into corporate processes.
  • Risk of side effects: when testing on a local Ollama, if the API is accidentally exposed to the network, you are effectively giving yourself a "local exploit bench" with a chance of leakage, data poisoning (in RAG), or at least resource DoS.
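A cheap sanity check before a test run is to verify that the local inference API only answers on loopback. The sketch below probes a TCP port (Ollama's default is 11434); whether your bench binds to other interfaces depends on its configuration, so treat this as a quick smoke test, not a full exposure audit:

```python
import socket

def port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If the inference port answers on a non-loopback address, the test
# bench is reachable from the network and should be re-isolated.
# exposed = port_open(socket.gethostbyname(socket.gethostname()), 11434)
```

Running this from another machine on the same network is the more honest version of the same check.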

Business & Automation Impact

For business, Augustus highlights an uncomfortable reality: "we deployed a local model, so it's safe" does not hold true. Even a local LLM integrated into processes remains attackable via the prompt surface and context (RAG, tools, agents). And if the model is linked to actions (creating tickets, sending emails, modifying ERP/CRM records, generating documents), an LLM vulnerability becomes a business process vulnerability.

How This Changes AI Solution Architecture

  • Red Teaming becomes a mandatory SDLC stage for LLM solutions: before pilot, before prod, and upon every significant change (model/prompt/policies/RAG sources).
  • Shift of focus from "model" to "system": you must protect not just the model's response, but the context, tools, routing, logs, access rights, and action post-validation.
  • Measurable controls are needed: policies (what is forbidden), detectors (how we catch it), reactions (what we do), regression (how not to degrade upon update).
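The policy/detector/reaction loop described above can be sketched in a few lines. The policy names and regex patterns here are placeholders for illustration, not a real ruleset; production detectors would combine patterns with classifiers and LLM-as-judge checks:

```python
import re

# Placeholder policies: each maps a name to a detector pattern.
POLICIES = {
    "no_pii": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # SSN-like pattern
    "no_secrets": re.compile(r"(?i)api[_-]?key\s*[:=]"),  # credential leakage
}

def check_response(text):
    """Detector stage: return the policies the model output violates."""
    return [name for name, pattern in POLICIES.items() if pattern.search(text)]

def react(violations):
    """Reaction stage: block and log instead of returning raw output."""
    return "[blocked]" if violations else None
```

The regression half of the control set is then a matter of running the same detectors over scanner reports before and after every change.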

Who Wins and Who Is at Risk

  • Winners: Teams that implement AI systematically: with threat modeling, roles, a test environment, and logging. For them, Augustus is a QA accelerator.
  • At Risk: Companies building AI automation "on the fly": adding an LLM to support chat, connecting a knowledge base, granting access to internal documents—and thinking that's enough. Augustus will quickly show that access can be extracted or the model forced to violate rules.

Typical Risk Scenarios Augustus Helps Reveal

  • Prompt injection in RAG: an attacker slips an instruction into a document/page/ticket such that the model starts ignoring system rules and "leaks" data.
  • Data extraction: attempts to pull confidential fragments from the context (PII, contract numbers, internal regulations, keywords from private docs).
  • Content policy bypass: if the model is used to generate text/code/instructions, "forbidden" content can appear bypassing the filter via transformations.
  • Agent/tool abuse: when an LLM can call tools (HTTP, email, CRM), a risk arises of forcing the model into unwanted actions. Even if Augustus doesn't fully cover your agentic stack, it disciplines the testing approach.
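For the agent/tool-abuse scenario, the standard mitigation is post-validation of proposed tool calls against an explicit allowlist. A minimal sketch, with invented tool names, might look like this: the model may propose anything, but only known tool/argument combinations ever execute:

```python
# Hypothetical allowlist: tool name -> permitted argument names.
ALLOWED_TOOLS = {
    "create_ticket": {"title", "body"},
    "search_docs": {"query"},
}

def validate_tool_call(tool, args):
    """Reject unknown tools and unexpected arguments before execution."""
    if tool not in ALLOWED_TOOLS:
        return False, f"tool '{tool}' not allowlisted"
    extra = set(args) - ALLOWED_TOOLS[tool]
    if extra:
        return False, f"unexpected arguments: {sorted(extra)}"
    return True, "ok"
```

This kind of guard sits outside the model, so a successful jailbreak still cannot reach tools the policy never granted.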

At the management level: LLM security is not a "provider feature." It is part of the AI solution architecture in the company: segmentation, secrets, contexts, tool limitations, "two-circuit" designs (safe-mode), and the inevitable observability layer.

And here companies often hit a practical wall: the tool exists, attacks exist, reports exist—but it's unclear how to turn this into changes in a production system without breaking UX and KPIs. It is at this intersection (security + business process + operations) that professional AI integration is most often needed, rather than just "running a scanner."

Expert Opinion: Vadym Nahornyi

The most dangerous illusion in corporate LLMs is thinking that "guardrails" equal security. Augustus is good because it quickly brings the team back to reality: most default defenses are a thin layer that breaks under a combination of paraphrasing, linguistic tricks, and context manipulations.

At Nahornyi AI Lab, we regularly see a similar pattern: a company invests in AI implementation, connects a knowledge base and ticket/document automation, but builds no full threat perimeter. Then the first "innocent" incident appears: the model outputs a fragment of an internal document to an inappropriate channel, generates a bypass instruction, or accepts a harmful instruction from RAG as a priority. Augustus helps detect such things before they become a reputational and legal case.

Where the Real Utility Is vs. Hype

  • Utility: as a regression security test during changes. Updated the model in Ollama, changed the system prompt, added a new source to RAG—run Augustus and compare reports.
  • Hype: trying to reduce security to a single SAFE/VULN verdict without context. Access boundaries, use cases, and consequences (impact) matter, not just the fact of a filter bypass.
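The "compare reports between releases" workflow reduces to a diff of findings. The sketch below assumes a simplified report format (a JSON dict mapping probe name to a boolean "defense held"), which any real scanner report would first need to be normalized into:

```python
import json

# Assumed simplified format: {probe_name: defense_held}.
def regressions(baseline_path, current_path):
    """Probes that held in the baseline report but fail in the current one."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    return sorted(p for p, ok in current.items()
                  if not ok and baseline.get(p, False))
```

An empty result becomes the acceptance criterion: the release may change the model, prompts, or RAG sources, but it must not reopen a previously closed attack path.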

Typical Implementation Mistakes

  • Testing the model but not the system: in reality, the attack comes through data, integrations, and actions.
  • Launching without a sandbox: best case—resource overload; worst case—leaks/index poisoning/logging of dangerous content.
  • Not establishing a baseline: no reference report, no comparison between releases, no security acceptance criteria.

My forecast: in 2026, Red Teaming for LLMs will become a de facto standard for companies where LLMs influence decisions and operations. Tools like Augustus will accelerate this transition, but the winners will be those who embed testing into the product lifecycle and link it to risk management, rather than run it as a one-off "tick-box" check.

Theory is good, but results require practice. If you are implementing LLMs in support, sales, document management, production, or internal assistants and want to make AI automation safe and manageable—let's discuss your threat perimeter, test strategy, and architecture. The Nahornyi AI Lab team will help build verifiable protection and operations, and Vadym Nahornyi is responsible for the quality of architectural decisions and the final business effect.
