GOV Course

AI Governance & Evaluation

Lessons: 8 modules
Total: 88 min (full study)
Quick: 7 min (trailer)
Projects: 8 Docker labs

Skills you'll gain (10)

  • Eval Harness Operator (Production)

    Stand up Inspect AI, lm-evaluation-harness, HELM, Promptfoo and DeepEval in CI; run a 30-task suite against any chat endpoint and ship the JSON + HTML report.

  • Capability Benchmark Specialist (Working)

    Read SWE-bench Verified, GPQA-Diamond, FrontierMath, ARC-AGI-2, MMLU-Pro and Aider-polyglot leaderboards critically; pick the right benchmark for the claim you're making.

  • RAG Eval Engineer (Production)

    Build Ragas / TruLens / DeepEval pipelines that score faithfulness, answer-relevance and context-precision/recall on a versioned golden RAG dataset; gate releases on grounding regressions.

  • Red-Team Operator (Production)

    Run automated jailbreak campaigns with PyRIT and Garak; map findings to MITRE ATLAS and OWASP LLM Top 10; produce a defensible red-team report.

  • Safety Benchmark Auditor (Production)

    Run MLCommons AILuminate, HarmBench, JailbreakBench and AgentHarm; produce letter-graded safety reports defensible to procurement and frontier-launch reviewers.

  • Eval Statistician (Working)

    Apply bootstrap CIs, paired permutation tests, McNemar's test, Cohen's kappa, Krippendorff's alpha, and Bonferroni / Holm / Benjamini-Hochberg (BH) corrections; never trust a single number again.

  • Continuous Eval CI/CD Engineer (Production)

    Wire Phoenix, Langfuse, OpenLLMetry into prod LLM apps; capture spans, replay through golden datasets, alert on drift; run eval as a GitHub Action that posts a delta table to every PR.

  • Frontier Safety Evaluator (Advanced)

    Implement tier gating in the style of the Responsible Scaling Policy (RSP), Preparedness Framework and Frontier Safety Framework (FSF): METR autonomy time-horizon, a cyber CTF uplift proxy, and AI R&D uplift; produce a traffic-light gate document.

  • Human Eval Lead (Working)

    Stand up Argilla / Label Studio for SME annotation; design rubrics; compute inter-rater reliability (IRR); integrate human-graded results into the same dashboard as automatic metrics.

  • Eval Report Author (Production)

    Author model cards, system cards, transparency notes and Annex IV technical-documentation packs; map results to NIST AI 600-1 sub-controls and EU AI Act Art. 15 declarations.
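
To give a flavor of the Eval Statistician skill listed above, here is a minimal sketch of two of the named techniques: a percentile bootstrap CI on a paired accuracy difference, and an exact two-sided McNemar test. It assumes each model's results are a list of 0/1 correctness flags over the same items; the function names `paired_bootstrap_ci` and `mcnemar_exact` are hypothetical and not taken from the course materials or any specific library.

```python
import math
import random


def paired_bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a) on paired items.

    a, b: per-item scores for two models on the SAME items (assumed 0/1).
    Returns (point_estimate, ci_low, ci_high).
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(a)
    # Resample item-level differences so the pairing is preserved.
    diffs = [y - x for x, y in zip(a, b)]
    stats = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, lo, hi


def mcnemar_exact(a, b):
    """Two-sided exact McNemar p-value, using only discordant pairs."""
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only b correct
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only a correct
    n = n01 + n10
    if n == 0:
        return 1.0  # the models never disagree
    k = min(n01, n10)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Reporting the interval and the p-value alongside the raw accuracy delta is one way to avoid reading too much into a single number; on a small suite the interval is often wide enough to include zero.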