DLMMOD.DLM-08 · v1.0

Tune for your domain.
Bias toward your accuracy.

8 micro-lessons · ~84 min · Real Docker images

DECK.A · OVERDUB MODE

Domain LLM

Vertical AI that gets paid. Data, SFT, DPO, evals, multi-LoRA serving, on-prem.

WHY THIS MATTERS · KLARNA AI 2024 PRESS · BLOOMBERGGPT (ARXIV 2303.17564) · HARVEY LEGAL-AI · DEEPLEARNING.AI FINETUNING TRACK · STANFORD CS336
The 2026 vertical-AI playbook is now well-defined: GPT-5 / Claude / Qwen3 + RAG + a small SFT LoRA + DPO for tone, served via vLLM `--enable-lora` on a small GPU fleet. Klarna's deployment displaced ~700 FTEs and projected $40M of profit improvement; Harvey raised at $5B. The tooling — Unsloth, TRL, Distilabel, Inspect AI, Ollama — has stabilised. This course walks through the lifecycle most engineers will execute next quarter.
WHAT YOU'LL LEARN
01 · RAG vs Fine-tuning vs Hybrid
02 · Domain data — collection, curation, synthesis
03 · LoRA / QLoRA / DoRA — the production fine-tune
04 · Continued pre-training (CPT) — when it's worth the spend
05 · Preference tuning — DPO, KTO, GRPO
06 · Domain evals — golden sets, judge, gates
07 · Serving — vLLM with LoRA hot-swap
08 · The on-prem domain assistant
YOU'LL BE ABLE TO
Pick RAG vs SFT vs DPO vs CPT from a 4-axis matrix and defend the choice in an ADR
Curate domain SFT data with Distilabel + Magpie + judge filter + MinHash dedupe
Run a production QLoRA SFT loop on a 7-8B base via Unsloth + TRL
Run DPO / GRPO preference tuning on top of an SFT'd base
Build a 4-layer scorecard harness (generic / domain / golden / hallucination) gating CI
Serve N LoRA adapters multi-tenant via vLLM `--enable-lora`
Ship a fully on-prem domain assistant with Ollama + Qdrant + drift watcher
SKILLS YOU'LL GAIN

Real skills, real career delta.

  • Pick RAG vs SFT vs DPO vs CPT from a 4-axis matrix · Production

    Decide along knowledge-volatility × format-criticality × tone/safety × domain-vocabulary. Defend the call in an ADR with measured numbers — not vibes.
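The 4-axis matrix can be sketched as a tiny scoring function. The axis scales, thresholds, and axis-to-technique mapping below are illustrative assumptions, not the course's official rubric:

```python
def recommend(knowledge_volatility, format_criticality, tone_safety, domain_vocab):
    """Each axis is scored 0 (low) to 3 (high). Returns the techniques to stack."""
    picks = []
    if knowledge_volatility >= 2:
        picks.append("RAG")        # facts change fast -> retrieve, don't bake in
    if format_criticality >= 2:
        picks.append("SFT")        # strict output formats -> supervised pairs
    if tone_safety >= 2:
        picks.append("DPO")        # tone / refusal behaviour -> preference pairs
    if domain_vocab >= 3:
        picks.append("CPT")        # truly alien vocabulary -> continued pre-training
    return picks or ["prompting"]  # nothing dominant -> prompt engineering suffices

# e.g. a contract drafter: stable law, rigid format, careful tone, normal vocab
print(recommend(0, 3, 2, 1))
```

The point of writing it down is the ADR: each `if` branch is a claim you can attach a measured number to.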

  • Curate domain SFT data with Distilabel + Magpie · Production

    Synthetic instruction generation, judge-LLM filtering, MinHash dedupe, Argilla review. 100 seed prompts → 5K production-grade SFT pairs.
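The dedupe step is the part most teams skip. A minimal stand-in for the MinHash pass (pure stdlib, not the `datasketch` library the real pipeline would use) looks like this:

```python
import hashlib

def minhash(text, num_perm=32, shingle=3):
    """Cheap MinHash signature over character shingles (illustrative only)."""
    shingles = {text[i:i + shingle] for i in range(max(1, len(text) - shingle + 1))}
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingles)
        for seed in range(num_perm)
    ]

def jaccard_est(a, b):
    # Fraction of matching signature slots estimates Jaccard similarity
    return sum(x == y for x, y in zip(a, b)) / len(a)

def dedupe(pairs, threshold=0.8):
    """Drop SFT pairs whose instruction near-duplicates one already kept."""
    kept, sigs = [], []
    for p in pairs:
        sig = minhash(p["instruction"])
        if all(jaccard_est(sig, s) < threshold for s in sigs):
            kept.append(p)
            sigs.append(sig)
    return kept
```

The real pipeline does the same thing with banded LSH so it scales past a few thousand pairs; the pairwise loop above is O(n²).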

  • Run QLoRA SFT on a 7-8B base via Unsloth · Production

    Single-GPU, 4-bit NF4, rank 16 / alpha 32, completion-only loss, 3 epochs. Ship a 50MB adapter to HF Hub. The single highest-leverage 2026 skill.
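Why is the adapter only ~50MB? Back-of-envelope arithmetic, using classic Llama-7B shapes (assumed here for illustration): each targeted weight gains two low-rank matrices, d_out×r and r×d_in.

```python
HIDDEN, LAYERS, RANK = 4096, 32, 16
# (d_in, d_out) for the four attention projections: q, k, v, o
ATTN = [(HIDDEN, HIDDEN)] * 4

def lora_params(targets, r=RANK, layers=LAYERS):
    # Two low-rank factors per target: r*(d_in + d_out) trainable params
    return layers * sum(r * (d_in + d_out) for d_in, d_out in targets)

params = lora_params(ATTN)
mb_fp16 = params * 2 / 1e6       # two bytes per fp16 weight
print(params, round(mb_fp16, 1)) # ~16.8M params, ~33.6 MB for attention-only
```

Attention-only targets land around 34MB; adding the MLP projections (the usual Unsloth default) pushes the checkpoint toward the quoted ~50MB.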

  • Decide when continued pre-training pays back · Working

    CPT only when the domain has its own vocabulary (legal Latin, ICD-10, rare protein motifs). Quote BloombergGPT's $2.7M cautionary tale; cite the math.
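The math is one multiplication. GPU count and wall-clock below are the figures reported for BloombergGPT (512 A100s, ~53 days); the $/GPU-hour rate is an assumed cloud price, so treat the output as order-of-magnitude:

```python
def cpt_cost(gpus, days, usd_per_gpu_hour):
    # Total spend = fleet size x wall-clock hours x hourly rate
    return gpus * days * 24 * usd_per_gpu_hour

print(f"${cpt_cost(512, 53, 4.0):,.0f}")   # roughly $2.6M -- the cautionary tale
```

Run the same formula with your own fleet and token budget before anyone says "let's pre-train".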

  • Apply DPO / KTO / ORPO for tone & refusal alignment · Production

    Collect chosen/rejected pairs from real user thumbs-up/down. Train DPO on top of the SFT'd base, then A/B against the SFT-only baseline — measure tone gains without losing capability.
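Turning thumbs logs into training data is mostly bookkeeping. The log schema below is hypothetical, but the output columns (`prompt`, `chosen`, `rejected`) are exactly what TRL's `DPOTrainer` expects:

```python
def build_dpo_pairs(logs):
    """Cross thumbs-up completions with thumbs-down ones per prompt."""
    by_prompt = {}
    for row in logs:
        bucket = by_prompt.setdefault(row["prompt"], {"up": [], "down": []})
        bucket["up" if row["thumb"] == 1 else "down"].append(row["completion"])
    pairs = []
    for prompt, votes in by_prompt.items():
        for chosen in votes["up"]:
            for rejected in votes["down"]:
                pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

In practice you cap pairs-per-prompt so a few popular prompts don't dominate the preference set.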

  • Apply GRPO for verifiable-reward reasoning fine-tunes · Advanced

    DeepSeek-R1-style RLVR on tasks with executable verification (SQL, math, code). Group size 8, KL beta 0.04. The 2025-2026 frontier reasoning technique.
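The heart of GRPO is group-relative reward shaping: sample a group of completions per prompt, score each with a verifiable check (does the SQL run? does the math check out?), and normalise within the group. A sketch of just the advantage computation:

```python
def group_advantages(rewards, eps=1e-6):
    """rewards: verifiable 0/1 (or scalar) scores for one group, e.g. size 8."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    # Group-normalised advantage: no learned value model needed
    return [(r - mean) / (std + eps) for r in rewards]

# A group where 2 of 8 rollouts passed the verifier:
adv = group_advantages([1, 1, 0, 0, 0, 0, 0, 0])
```

The full objective adds the clipped policy ratio and the KL penalty against the reference model (the beta 0.04 above); TRL's `GRPOTrainer` handles that part.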

  • Build a domain eval harness with LLM-as-judge + Inspect AI · Production

    Custom 200-500 golden set, frontier judge model (Claude Opus 4.7 / GPT-5), Inspect AI scoring, HTML report. CI gate on -2pp regression.
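The gate itself is deliberately boring. A sketch of the comparison step, assuming the scores arrive as percentages from the Inspect AI report (how they get there is the lesson's content):

```python
def ci_gate(baseline_pct, candidate_pct, max_regression_pp=2.0):
    """True -> candidate adapter may deploy; False -> block the pipeline."""
    regression = baseline_pct - candidate_pct
    return regression <= max_regression_pp

assert ci_gate(87.0, 86.0)        # -1pp: within budget, ship it
assert not ci_gate(87.0, 84.5)    # -2.5pp: blocked, investigate
```

Gating on a fixed golden set keeps the signal stable; gating on a judge-scored random sample would add noise to every CI run.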

  • Serve N LoRA adapters multi-tenant with vLLM · Production

    `vllm serve <base> --enable-lora --max-loras N`. Per-request adapter routing. Locust load test. The 2026 multi-tenant deployment pattern.
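Per-request routing reduces to choosing the right adapter name before the request leaves your gateway: with vLLM's OpenAI-compatible server, a registered LoRA adapter is selected via the request's `model` field. Tenant and adapter names below are hypothetical:

```python
# Hypothetical tenant -> adapter registry; adapters were registered at startup
# via vLLM's --lora-modules name=path flags.
ADAPTERS = {
    "acme-legal": "legal-lora-v3",
    "medco":      "clinical-lora-v1",
}

def route(tenant, prompt):
    """Build the request body: tenant's adapter, or the base model as fallback."""
    adapter = ADAPTERS.get(tenant)
    return {"model": adapter if adapter else "base", "prompt": prompt}
```

vLLM batches requests across adapters in the same forward pass, which is what makes one GPU fleet serve N tenants economically.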

  • Ship an on-prem domain assistant · Advanced

    Ollama (merged model) + Qdrant RAG over your own docs + Streamlit/Next.js UI + Prometheus metrics. The deployment regulated industries actually buy.
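The retrieval half of that stack, reduced to a toy: a bag-of-words "embedding" and an in-memory list stand in for the real embedding model and Qdrant, but the shape (embed, rank by cosine, stuff the top chunk into the prompt) is the same:

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy stand-in for a real embedding model
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = ["LoRA adapters are merged before Ollama export.",
        "Qdrant stores the document vectors on-prem."]
top = retrieve("where are vectors stored?", docs)
```

Swap `embed` for a local embedding model and `docs` for a Qdrant collection and the control flow is unchanged.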

  • Detect domain drift in production · Working

    Eval-on-traffic: sample 1% of prod requests, score with a judge LLM, alert on weekly regression. Triggers re-curation + re-tuning loops before users notice.
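The watcher's two decisions (sample this request? alert this week?) fit in a few lines. The judge call and the alerting sink are stand-ins; only the sampling and threshold logic is real:

```python
import random

def should_sample(rate=0.01):
    """Bernoulli sample ~1% of production requests for judge scoring."""
    return random.random() < rate

def weekly_drift(this_week_scores, last_week_scores, alert_pp=2.0):
    """True -> page the on-call and kick off re-curation + re-tuning."""
    cur = sum(this_week_scores) / len(this_week_scores)
    prev = sum(last_week_scores) / len(last_week_scores)
    return (prev - cur) * 100 >= alert_pp   # regression in percentage points
```

Comparing week-over-week means (rather than per-request scores) smooths out judge noise; the 2pp threshold mirrors the CI gate so online and offline alarms agree.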

RUNNABLE ON YOUR MACHINE
$ docker pull snap/domain-llm:rag-vs-ft
$ docker run --rm -it snap/domain-llm:rag-vs-ft
QUICK PREVIEW · 7 MIN
VERIFIED ENGINEER REVIEWS
The decision tree (tune/RAG/prompt) saved a quarter.
@fintune_finn · VERIFY ON GITHUB
Drift detection lesson is gold for regulated stacks.
@sre_maya · VERIFY ON GITHUB
LESSONS · 8
HOURS · ~1.4
LEARNERS · 1,290
THIS WEEK · +27%