SECCourse

AI security & prompt-injection defense

Lessons10modules
Total106mfull study
Quick7mtrailer
Projects8docker labs
CHEATSHEET · 01AI security · master cheatsheet
OWASP LLM Top 10 (2025)
  • ·LLM01 Prompt Injection — direct + indirect (via retrieved docs / pages / OCR / emails / tool outputs).
  • ·LLM02 Sensitive Information Disclosure — PII / secrets in training, RAG, prompts, or responses.
  • ·LLM03 Supply Chain — model artefacts, datasets, plugins, MCP tools.
  • ·LLM04 Data and Model Poisoning — corpus tampering, backdoor triggers, sleeper agents.
  • ·LLM05 Improper Output Handling — XSS via LLM, SQLi via LLM, command injection from output.
  • ·LLM06 Excessive Agency — destructive tools, broad permissions, no human approval.
  • ·LLM07 System Prompt Leakage — leaking secrets / business logic embedded in the system prompt (Policy Puppetry, HiddenLayer Apr 2025).
  • ·LLM08 Vector & Embedding Weaknesses — RAG poisoning, cross-tenant retrieval, embedding inversion.
  • ·LLM09 Misinformation — hallucinations + automation bias.
  • ·LLM10 Unbounded Consumption — token bombs, model extraction, denial-of-wallet.
Defence-in-depth layers (use ALL)
  • ·Rate-limit + abuse-pattern detection at the edge.
  • ·Input classifier (Prompt Guard 2 / Llama Guard 4 / Lakera Guard).
  • ·Spotlighting (Microsoft 2024 paper) — wrap untrusted content in delimiters + explicit 'never follow' rules.
  • ·System-prompt hardening — small, immutable, no secrets.
  • ·Tool allow-list per agent + scoped tokens.
  • ·Output classifier (Llama Guard 4 / Bedrock Guardrails / Azure Content Safety).
  • ·PII redaction in + out (Microsoft Presidio / AWS Comprehend / Azure Cognitive).
  • ·Sandbox tool execution (Daytona ~27-90ms cold start / E2B Firecracker microVMs).
  • ·Audit log (prompt, response, classifier verdicts, tool calls).
  • ·Token + latency budgets (cap unbounded consumption; defeat token-bombs and many-shot).
Guardrails frameworks · 2026
  • ·Llama Guard 4 (Apr 2025, LlamaCon) — open weights, MULTIMODAL (text+image), MLCommons hazards taxonomy.
  • ·Prompt Guard 2 (Meta, Apr 2025) — 86M / 22M; ~30ms latency; injection + jailbreak detection.
  • ·Llama Firewall (Apr 2025) — open-source orchestrator: PromptGuard2 + AlignmentCheck + CodeShield. >90% on AgentDojo.
  • ·NeMo Guardrails 0.20+ (NVIDIA-NeMo/Guardrails) — Colang DSL; jailbreak / topical / RAG / fact-check rails.
  • ·Guardrails AI — RAIL spec, 50+ Pydantic-style validators, on-rails-failure actions.
  • ·Lakera Guard (acquired by Check Point Sep 2025) — managed; injection / PII / toxic / data-leak in one API.
  • ·AWS Bedrock Guardrails — managed; word-list + topic + sensitive-info; Automated Reasoning checks (99% claimed).
  • ·Azure Content Safety + Prompt Shield — managed; PII / injection / jailbreak; Spotlighting deployed in M365 Copilot.
  • ·Google Cloud Model Armor + DLP — managed.
  • ·Protect AI Layer / Guardian (Palo Alto, Jul 2025) — model-supply-chain + runtime; 35+ format scanner.
Red-team toolkit · 2026
  • ·PyRIT (microsoft/PyRIT) — Microsoft AI Red Team's tool since 2022; Foundry integration.
  • ·Garak (NVIDIA) — vuln-scanner with 100+ probes; CI-friendly; Q2 2025 calibration update.
  • ·Promptfoo red-team mode — 50+ vuln types; declarative; OpenAI/Anthropic users per repo.
  • ·Giskard — pytest-style scans for LLM apps.
  • ·DeepTeam (YC W25) — 80+ vuln types; 10+ attack methods; ships 7 production guards.
  • ·Lakera Red — commercial automated attack sims (Check Point, Sep 2025).
Compliance (April 2026)
  • ·NIST AI RMF (Govern · Map · Measure · Manage) + Generative AI Profile (NIST AI 600-1, Jul 2024) — 200+ actions across 12 risk areas.
  • ·NIST AI 100-2 E2025 (Mar 2025 final) — adversarial taxonomy (evasion, poisoning, privacy, abuse, GenAI-specific).
  • ·EU AI Act — prohibited-AI bans live since Feb 2025; GPAI obligations Aug 2025; high-risk Aug 2026; embedded-products Aug 2027.
  • ·ISO/IEC 42001:2023 — first certifiable AI Management System Standard; harmonised pathway to EU AI Act compliance.
  • ·MITRE ATLAS — adversarial-threat matrix for ML (TTP catalogue).
CHEATSHEET · 02Incident response · LLM-app playbook
Detection signals
  • ·Repeated input-classifier blocks from a single user / IP / tenant.
  • ·Output-classifier spikes (sudden surge of 'unsafe' verdicts).
  • ·Token-bomb patterns (single query > 10× normal cost; recursive tool loops).
  • ·Tool-call anomalies (new tool combinations, off-allow-list attempts).
  • ·Audit-log gaps (a request without a response — dropped on the safety layer).
  • ·High-entropy prompts (compressed gibberish often = adversarial suffix).
Triage (first 30 minutes)
  • ·Pull the trace_id from the alert; pull all sibling traces from the same user / IP / tenant.
  • ·Reproduce the prompt against your STAGING gateway — confirm it's exploitable.
  • ·Identify the layer that should have caught it (input classifier? spotlight? tool allow-list? output guard?).
  • ·Decide containment: per-tenant block, per-prompt-shape block, full-feature kill switch.
  • ·Document expected blast radius: what data could the agent have exfiltrated / what tool used.
Containment
  • ·Flip the kill switch (env var DISABLE_LLM=1 or feature flag) — never hot-patch a live model integration.
  • ·Rotate any exposed secrets (system-prompt-leaked API keys, DB creds).
  • ·If model-extraction suspected: rotate API key, audit egress, contact vendor.
  • ·If poisoning suspected: snapshot the corpus + index; freeze ingest until validation rerun.
Eradication & recovery
  • ·Land the failing red-team probe as a permanent test in the CI suite (Garak/PyRIT golden).
  • ·Patch the layer (or add a layer) that should have caught it.
  • ·Re-run all OWASP-LLM-Top-10 tests + the new probe.
  • ·Promote behind canary; watch dashboards for ≥ 24h before going to 100%.
  • ·Post-mortem within 5 business days; include MITRE ATLAS technique IDs.
Compliance touchpoints
  • ·EU AI Act Art. 73: serious-incident reporting for high-risk systems within 15 days (immediate widespread infringement: 2 days).
  • ·EU AI Act Art. 12: high-risk AI systems must keep logs enabling post-hoc monitoring — observability is now law.
  • ·GDPR Art. 33: 72h breach notice when personal data is involved.
  • ·NIST AI RMF Manage 4: maintain incident records + lessons learned.
  • ·ISO/IEC 42001 Clause 10.1: nonconformity + corrective action records.
  • ·Sigstore / SLSA: re-attest model artefacts after any incident touching the model supply chain.