DLMMOD.DLM-08 · v1.0

Tune for your domain.
Bias toward your accuracy.

8 micro-lessons · ~84 min · Real Docker images

DECK.A · OVERDUB MODE

Domain LLM

Vertical AI that gets paid. Data, SFT, DPO, evals, multi-LoRA serving, on-prem.

WHY THIS MATTERS · KLARNA AI 2024 PRESS · BLOOMBERGGPT (ARXIV 2303.17564) · HARVEY LEGAL-AI · DEEPLEARNING.AI FINETUNING TRACK · STANFORD CS336
The 2026 vertical-AI playbook is now well-defined: GPT-5 / Claude / Qwen3 + RAG + a small SFT LoRA + DPO for tone, served via vLLM `--enable-lora` on a small GPU fleet. Klarna's deployment displaced ~700 FTEs and projected $40M of profit improvement; Harvey raised at $5B. The tooling — Unsloth, TRL, Distilabel, Inspect AI, Ollama — has stabilised. This course walks through the lifecycle most engineers will execute next quarter.
WHAT YOU'LL LEARN
01 · RAG vs Fine-tuning vs Hybrid
02 · Domain data — collection, curation, synthesis
03 · LoRA / QLoRA / DoRA — the production fine-tune
04 · Continued pre-training (CPT) — when it's worth the spend
05 · Preference tuning — DPO, KTO, GRPO
06 · Domain evals — golden sets, judge, gates
07 · Serving — vLLM with LoRA hot-swap
08 · The on-prem domain assistant
YOU'LL BE ABLE TO
Pick RAG vs SFT vs DPO vs CPT from a 4-axis matrix and defend the choice in an ADR
Curate domain SFT data with Distilabel + Magpie + judge filter + MinHash dedupe
Run a production QLoRA SFT loop on a 7-8B base via Unsloth + TRL
Run DPO / GRPO preference tuning on top of an SFT'd base
Build a 4-layer scorecard harness (generic / domain / golden / hallucination) gating CI
Serve N LoRA adapters multi-tenant via vLLM `--enable-lora`
Ship a fully on-prem domain assistant with Ollama + Qdrant + drift watcher
SKILLS YOU'LL GAIN

Real skills, real career delta.

  • Pick RAG vs SFT vs DPO vs CPT from a 4-axis matrix · Production

    Decide along knowledge-volatility × format-criticality × tone/safety × domain-vocabulary. Defend the call in an ADR with measured numbers — not vibes.
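The 4-axis matrix can be sketched as a tiny scoring function. The axis scales, thresholds, and axis-to-technique mapping below are illustrative assumptions, not the course's official rubric:

```python
def recommend(knowledge_volatility, format_criticality, tone_safety, domain_vocab):
    """Each axis is scored 0 (low) to 3 (high). Returns the techniques to stack."""
    picks = []
    if knowledge_volatility >= 2:
        picks.append("RAG")        # facts change fast -> retrieve, don't bake in
    if format_criticality >= 2:
        picks.append("SFT")        # strict output formats -> supervised pairs
    if tone_safety >= 2:
        picks.append("DPO")        # tone / refusal behaviour -> preference pairs
    if domain_vocab >= 3:
        picks.append("CPT")        # truly alien vocabulary -> continued pre-training
    return picks or ["prompting"]  # nothing dominant -> prompt engineering suffices

# e.g. a contract drafter: stable law, rigid format, careful tone, normal vocab
print(recommend(0, 3, 2, 1))
```

The point of writing it down is the ADR: each `if` branch is a claim you can attach a measured number to.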

  • Curate domain SFT data with Distilabel + Magpie · Production

    Synthetic instruction generation, judge-LLM filtering, MinHash dedupe, Argilla review. 100 seed prompts → 5K production-grade SFT pairs.
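The dedupe step is the part most teams skip. A minimal stand-in for the MinHash pass (pure stdlib, not the `datasketch` library the real pipeline would use) looks like this:

```python
import hashlib

def minhash(text, num_perm=32, shingle=3):
    """Cheap MinHash signature over character shingles (illustrative only)."""
    shingles = {text[i:i + shingle] for i in range(max(1, len(text) - shingle + 1))}
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingles)
        for seed in range(num_perm)
    ]

def jaccard_est(a, b):
    # Fraction of matching signature slots estimates Jaccard similarity
    return sum(x == y for x, y in zip(a, b)) / len(a)

def dedupe(pairs, threshold=0.8):
    """Drop SFT pairs whose instruction near-duplicates one already kept."""
    kept, sigs = [], []
    for p in pairs:
        sig = minhash(p["instruction"])
        if all(jaccard_est(sig, s) < threshold for s in sigs):
            kept.append(p)
            sigs.append(sig)
    return kept
```

The real pipeline does the same thing with banded LSH so it scales past a few thousand pairs; the pairwise loop above is O(n²).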

  • Run QLoRA SFT on a 7-8B base via Unsloth · Production

    Single-GPU, 4-bit NF4, rank 16 / alpha 32, completion-only loss, 3 epochs. Ship a 50MB adapter to HF Hub. The single highest-leverage 2026 skill.
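Why is the adapter only ~50MB? Back-of-envelope arithmetic, using classic Llama-7B shapes (assumed here for illustration): each targeted weight gains two low-rank matrices, d_out×r and r×d_in.

```python
HIDDEN, LAYERS, RANK = 4096, 32, 16
# (d_in, d_out) for the four attention projections: q, k, v, o
ATTN = [(HIDDEN, HIDDEN)] * 4

def lora_params(targets, r=RANK, layers=LAYERS):
    # Two low-rank factors per target: r*(d_in + d_out) trainable params
    return layers * sum(r * (d_in + d_out) for d_in, d_out in targets)

params = lora_params(ATTN)
mb_fp16 = params * 2 / 1e6       # two bytes per fp16 weight
print(params, round(mb_fp16, 1)) # ~16.8M params, ~33.6 MB for attention-only
```

Attention-only targets land around 34MB; adding the MLP projections (the usual Unsloth default) pushes the checkpoint toward the quoted ~50MB.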

  • Decide when continued pre-training pays back · Working

    CPT only when the domain has its own vocabulary (legal Latin, ICD-10, rare protein motifs). Quote BloombergGPT's $2.7M cautionary tale; cite the math.
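The math is one multiplication. GPU count and wall-clock below are the figures reported for BloombergGPT (512 A100s, ~53 days); the $/GPU-hour rate is an assumed cloud price, so treat the output as order-of-magnitude:

```python
def cpt_cost(gpus, days, usd_per_gpu_hour):
    # Total spend = fleet size x wall-clock hours x hourly rate
    return gpus * days * 24 * usd_per_gpu_hour

print(f"${cpt_cost(512, 53, 4.0):,.0f}")   # roughly $2.6M -- the cautionary tale
```

Run the same formula with your own fleet and token budget before anyone says "let's pre-train".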

  • Apply DPO / KTO / ORPO for tone & refusal alignment · Production

    Collect chosen/rejected pairs from real user thumbs-up/down. Train DPO on top of the SFT'd base, then A/B against the SFT-only baseline — measure tone gains without losing capability.
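Turning thumbs logs into training data is mostly bookkeeping. The log schema below is hypothetical, but the output columns (`prompt`, `chosen`, `rejected`) are exactly what TRL's `DPOTrainer` expects:

```python
def build_dpo_pairs(logs):
    """Cross thumbs-up completions with thumbs-down ones per prompt."""
    by_prompt = {}
    for row in logs:
        bucket = by_prompt.setdefault(row["prompt"], {"up": [], "down": []})
        bucket["up" if row["thumb"] == 1 else "down"].append(row["completion"])
    pairs = []
    for prompt, votes in by_prompt.items():
        for chosen in votes["up"]:
            for rejected in votes["down"]:
                pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

In practice you cap pairs-per-prompt so a few popular prompts don't dominate the preference set.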

  • Apply GRPO for verifiable-reward reasoning fine-tunes · Advanced

    DeepSeek-R1-style RLVR on tasks with executable verification (SQL, math, code). Group size 8, KL beta 0.04. The 2025-2026 frontier reasoning technique.
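The heart of GRPO is group-relative reward shaping: sample a group of completions per prompt, score each with a verifiable check (does the SQL run? does the math check out?), and normalise within the group. A sketch of just the advantage computation:

```python
def group_advantages(rewards, eps=1e-6):
    """rewards: verifiable 0/1 (or scalar) scores for one group, e.g. size 8."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    # Group-normalised advantage: no learned value model needed
    return [(r - mean) / (std + eps) for r in rewards]

# A group where 2 of 8 rollouts passed the verifier:
adv = group_advantages([1, 1, 0, 0, 0, 0, 0, 0])
```

The full objective adds the clipped policy ratio and the KL penalty against the reference model (the beta 0.04 above); TRL's `GRPOTrainer` handles that part.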

  • Build a domain eval harness with LLM-as-judge + Inspect AI · Production

    Custom 200-500 golden set, frontier judge model (Claude Opus 4.7 / GPT-5), Inspect AI scoring, HTML report. CI gate on -2pp regression.
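The gate itself is deliberately boring. A sketch of the comparison step, assuming the scores arrive as percentages from the Inspect AI report (how they get there is the lesson's content):

```python
def ci_gate(baseline_pct, candidate_pct, max_regression_pp=2.0):
    """True -> candidate adapter may deploy; False -> block the pipeline."""
    regression = baseline_pct - candidate_pct
    return regression <= max_regression_pp

assert ci_gate(87.0, 86.0)        # -1pp: within budget, ship it
assert not ci_gate(87.0, 84.5)    # -2.5pp: blocked, investigate
```

Gating on a fixed golden set keeps the signal stable; gating on a judge-scored random sample would add noise to every CI run.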

  • Serve N LoRA adapters multi-tenant with vLLM · Production

    `vllm serve <base> --enable-lora --max-loras N`. Per-request adapter routing. Locust load test. The 2026 multi-tenant deployment pattern.
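Per-request routing reduces to choosing the right adapter name before the request leaves your gateway: with vLLM's OpenAI-compatible server, a registered LoRA adapter is selected via the request's `model` field. Tenant and adapter names below are hypothetical:

```python
# Hypothetical tenant -> adapter registry; adapters were registered at startup
# via vLLM's --lora-modules name=path flags.
ADAPTERS = {
    "acme-legal": "legal-lora-v3",
    "medco":      "clinical-lora-v1",
}

def route(tenant, prompt):
    """Build the request body: tenant's adapter, or the base model as fallback."""
    adapter = ADAPTERS.get(tenant)
    return {"model": adapter if adapter else "base", "prompt": prompt}
```

vLLM batches requests across adapters in the same forward pass, which is what makes one GPU fleet serve N tenants economically.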

  • Ship an on-prem domain assistant · Advanced

    Ollama (merged model) + Qdrant RAG over your own docs + Streamlit/Next.js UI + Prometheus metrics. The deployment regulated industries actually buy.
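The retrieval half of that stack, reduced to a toy: a bag-of-words "embedding" and an in-memory list stand in for the real embedding model and Qdrant, but the shape (embed, rank by cosine, stuff the top chunk into the prompt) is the same:

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy stand-in for a real embedding model
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = ["LoRA adapters are merged before Ollama export.",
        "Qdrant stores the document vectors on-prem."]
top = retrieve("where are vectors stored?", docs)
```

Swap `embed` for a local embedding model and `docs` for a Qdrant collection and the control flow is unchanged.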

  • Detect domain drift in production · Working

    Eval-on-traffic: sample 1% of prod requests, score with a judge LLM, alert on weekly regression. Triggers re-curation + re-tuning loops before users notice.
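The watcher's two decisions (sample this request? alert this week?) fit in a few lines. The judge call and the alerting sink are stand-ins; only the sampling and threshold logic is real:

```python
import random

def should_sample(rate=0.01):
    """Bernoulli sample ~1% of production requests for judge scoring."""
    return random.random() < rate

def weekly_drift(this_week_scores, last_week_scores, alert_pp=2.0):
    """True -> page the on-call and kick off re-curation + re-tuning."""
    cur = sum(this_week_scores) / len(this_week_scores)
    prev = sum(last_week_scores) / len(last_week_scores)
    return (prev - cur) * 100 >= alert_pp   # regression in percentage points
```

Comparing week-over-week means (rather than per-request scores) smooths out judge noise; the 2pp threshold mirrors the CI gate so online and offline alarms agree.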

RUNNABLE ON YOUR MACHINE
$ docker pull snap/domain-llm:rag-vs-ft
$ docker run --rm -it snap/domain-llm:rag-vs-ft
QUICK PREVIEW · 7 MIN
VERIFIED ENGINEER REVIEWS
The decision tree (tune/RAG/prompt) saved a quarter.
@fintune_finn · VERIFY ON GITHUB
Drift detection lesson is gold for regulated stacks.
@sre_maya · VERIFY ON GITHUB
LESSONS · 8
HOURS · ~1.4
LEARNERS · 1,290
THIS WEEK · +27%