LLM Red-Team Reference

LLM jailbreaks and prompt-injection attacks (OWASP LLM Top 10) sit at the top of the AI-security risk register: simple to execute, hard to defend, and amplified when the model has tools, document access, or agentic privileges. This catalogue covers 12 documented technique families — from direct prompt injection and adversarial-suffix attacks to many-shot, crescendo, and skeleton-key — each with an illustrative pattern, layered defences, and academic references.

Examples are templates drawn from public research and vendor reports, not weaponised payloads. Use this on systems you own or have written authorisation to test — internal AI products, your own RAG application, or a CTF.

Defender-oriented reference. This catalogue documents publicly-known attack patterns to help you red-team and defend the LLMs you build, deploy, or audit. Examples are illustrative templates from peer-reviewed research and vendor reports — not fully-weaponised payloads. Use only on systems you own or have explicit written permission to test.

12 techniques

How to use this

Pre-launch red-team

Run each technique against your model with realistic system prompts and tools before launch. Track which categories your model is robust against vs. which still need mitigation.

Defence checklist

Each entry lists layered defences (input filtering, prompt hardening, output classification, tool-use mediation). Stack the relevant ones — no single layer catches every variant.

Pentest deliverables

Map findings to category names + references when reporting to clients. Cite the academic source so the defender team has a starting point for remediation research.