SECCourse

AI security & prompt-injection defense

Lessons10modules

Total106mfull study

Quick7mtrailer

Projects8docker labs

CHEATSHEET · 01AI security · master cheatsheet

OWASP LLM Top 10 (2025)

·LLM01 Prompt Injection — direct + indirect (via retrieved docs / pages / OCR / emails / tool outputs).
·LLM02 Sensitive Information Disclosure — PII / secrets in training, RAG, prompts, or responses.
·LLM03 Supply Chain — model artefacts, datasets, plugins, MCP tools.
·LLM04 Data and Model Poisoning — corpus tampering, backdoor triggers, sleeper agents.
·LLM05 Improper Output Handling — XSS via LLM, SQLi via LLM, command injection from output.
·LLM06 Excessive Agency — destructive tools, broad permissions, no human approval.
·LLM07 System Prompt Leakage — leaking secrets / business logic embedded in the system prompt (Policy Puppetry, HiddenLayer Apr 2025).
·LLM08 Vector & Embedding Weaknesses — RAG poisoning, cross-tenant retrieval, embedding inversion.
·LLM09 Misinformation — hallucinations + automation bias.
·LLM10 Unbounded Consumption — token bombs, model extraction, denial-of-wallet.

Defence-in-depth layers (use ALL)

·Rate-limit + abuse-pattern detection at the edge.
·Input classifier (Prompt Guard 2 / Llama Guard 4 / Lakera Guard).
·Spotlighting (Microsoft 2024 paper) — wrap untrusted content in delimiters + explicit 'never follow' rules.
·System-prompt hardening — small, immutable, no secrets.
·Tool allow-list per agent + scoped tokens.
·Output classifier (Llama Guard 4 / Bedrock Guardrails / Azure Content Safety).
·PII redaction in + out (Microsoft Presidio / AWS Comprehend / Azure Cognitive).
·Sandbox tool execution (Daytona ~27-90ms cold start / E2B Firecracker microVMs).
·Audit log (prompt, response, classifier verdicts, tool calls).
·Token + latency budgets (cap unbounded consumption; defeat token-bombs and many-shot).

Guardrails frameworks · 2026

·Llama Guard 4 (Apr 2025, LlamaCon) — open weights, MULTIMODAL (text+image), MLCommons hazards taxonomy.
·Prompt Guard 2 (Meta, Apr 2025) — 86M / 22M; ~30ms latency; injection + jailbreak detection.
·Llama Firewall (Apr 2025) — open-source orchestrator: PromptGuard2 + AlignmentCheck + CodeShield. >90% on AgentDojo.
·NeMo Guardrails 0.20+ (NVIDIA-NeMo/Guardrails) — Colang DSL; jailbreak / topical / RAG / fact-check rails.
·Guardrails AI — RAIL spec, 50+ Pydantic-style validators, on-rails-failure actions.
·Lakera Guard (acquired by Check Point Sep 2025) — managed; injection / PII / toxic / data-leak in one API.
·AWS Bedrock Guardrails — managed; word-list + topic + sensitive-info; Automated Reasoning checks (99% claimed).
·Azure Content Safety + Prompt Shield — managed; PII / injection / jailbreak; Spotlighting deployed in M365 Copilot.
·Google Cloud Model Armor + DLP — managed.
·Protect AI Layer / Guardian (Palo Alto, Jul 2025) — model-supply-chain + runtime; 35+ format scanner.

Red-team toolkit · 2026

·PyRIT (microsoft/PyRIT) — Microsoft AI Red Team's tool since 2022; Foundry integration.
·Garak (NVIDIA) — vuln-scanner with 100+ probes; CI-friendly; Q2 2025 calibration update.
·Promptfoo red-team mode — 50+ vuln types; declarative; OpenAI/Anthropic users per repo.
·Giskard — pytest-style scans for LLM apps.
·DeepTeam (YC W25) — 80+ vuln types; 10+ attack methods; ships 7 production guards.
·Lakera Red — commercial automated attack sims (Check Point, Sep 2025).

Compliance (April 2026)

·NIST AI RMF (Govern · Map · Measure · Manage) + Generative AI Profile (NIST AI 600-1, Jul 2024) — 200+ actions across 12 risk areas.
·NIST AI 100-2 E2025 (Mar 2025 final) — adversarial taxonomy (evasion, poisoning, privacy, abuse, GenAI-specific).
·EU AI Act — prohibited-AI bans live since Feb 2025; GPAI obligations Aug 2025; high-risk Aug 2026; embedded-products Aug 2027.
·ISO/IEC 42001:2023 — first certifiable AI Management System Standard; harmonised pathway to EU AI Act compliance.
·MITRE ATLAS — adversarial-threat matrix for ML (TTP catalogue).

CHEATSHEET · 02Incident response · LLM-app playbook

Detection signals

Triage (first 30 minutes)

·Pull the trace_id from the alert; pull all sibling traces from the same user / IP / tenant.
·Reproduce the prompt against your STAGING gateway — confirm it's exploitable.
·Identify the layer that should have caught it (input classifier? spotlight? tool allow-list? output guard?).
·Decide containment: per-tenant block, per-prompt-shape block, full-feature kill switch.
·Document expected blast radius: what data could the agent have exfiltrated / what tool used.

Containment

·Flip the kill switch (env var DISABLE_LLM=1 or feature flag) — never hot-patch a live model integration.
·Rotate any exposed secrets (system-prompt-leaked API keys, DB creds).
·If model-extraction suspected: rotate API key, audit egress, contact vendor.
·If poisoning suspected: snapshot the corpus + index; freeze ingest until validation rerun.

Eradication & recovery

·Land the failing red-team probe as a permanent test in the CI suite (Garak/PyRIT golden).
·Patch the layer (or add a layer) that should have caught it.
·Re-run all OWASP-LLM-Top-10 tests + the new probe.
·Promote behind canary; watch dashboards for ≥ 24h before going to 100%.
·Post-mortem within 5 business days; include MITRE ATLAS technique IDs.

Compliance touchpoints

·EU AI Act Art. 73: serious-incident reporting for high-risk systems within 15 days (immediate widespread infringement: 2 days).
·EU AI Act Art. 12: high-risk AI systems must keep logs enabling post-hoc monitoring — observability is now law.
·GDPR Art. 33: 72h breach notice when personal data is involved.
·NIST AI RMF Manage 4: maintain incident records + lessons learned.
·ISO/IEC 42001 Clause 10.1: nonconformity + corrective action records.
·Sigstore / SLSA: re-attest model artefacts after any incident touching the model supply chain.