DLMCourse

Domain LLM

Lessons8modules
Total84mfull study
Quick7mtrailer
Projects8docker labs

Skills you'll gain

10
  • Pick RAG vs SFT vs DPO vs CPT from a 4-axis matrixProduction

    Decide along knowledge-volatility x format-criticality x tone/safety x domain-vocabulary. Defend the call in an ADR with measured numbers — not vibes.

  • Curate domain SFT data with Distilabel + MagpieProduction

    Synthetic instruction generation, judge-LLM filtering, MinHash dedupe, Argilla review. 100 seed prompts → 5K production-grade SFT pairs.

  • Run QLoRA SFT on a 7-8B base via UnslothProduction

    Single-GPU, 4-bit NF4, rank 16 / alpha 32, completion-only loss, 3 epochs. Ship a 50MB adapter to HF Hub. The single highest-leverage 2026 skill.

  • Decide when continued pre-training pays backWorking

    CPT only when the domain has its own vocabulary (legal Latin, ICD-10, rare protein motifs). Quote BloombergGPT's $2.7M cautionary tale; cite the math.

  • Apply DPO / KTO / ORPO for tone & refusal alignmentProduction

    Collect chosen/rejected pairs from real user thumbs-up/down. Train DPO on top of an SFT'd base. A/B vs the SFT'd base — measure tone without losing capability.

  • Apply GRPO for verifiable-reward reasoning fine-tunesAdvanced

    DeepSeek-R1-style RLVR on tasks with executable verification (SQL, math, code). Group size 8, KL beta 0.04. The 2025-2026 frontier reasoning technique.

  • Build a domain eval harness with LLM-as-judge + Inspect AIProduction

    Custom 200-500 golden set, frontier judge model (Claude Opus 4.7 / GPT-5), Inspect AI scoring, HTML report. CI gate on -2pp regression.

  • Serve N LoRA adapters multi-tenant with vLLMProduction

    `vllm serve <base> --enable-lora --max-loras N`. Per-request adapter routing. Locust load test. The 2026 multi-tenant deployment pattern.

  • Ship an on-prem domain assistantAdvanced

    Ollama (merged model) + Qdrant RAG over your own docs + Streamlit/Next.js UI + Prometheus metrics. The deployment regulated industries actually buy.

  • Detect domain drift in productionWorking

    Eval-on-traffic: sample 1% of prod requests, score with a judge LLM, alert on weekly regression. Triggers re-curation + re-tuning loops before users notice.