Skills you'll gain (10)
- Pick RAG vs SFT vs DPO vs CPT from a 4-axis matrix (Production)
Decide along four axes: knowledge volatility × format criticality × tone/safety × domain vocabulary. Defend the call in an ADR with measured numbers, not vibes.
- Curate domain SFT data with Distilabel + Magpie (Production)
Synthetic instruction generation, judge-LLM filtering, MinHash dedupe, Argilla review. 100 seed prompts → 5K production-grade SFT pairs.
- Run QLoRA SFT on a 7-8B base via Unsloth (Production)
Single-GPU, 4-bit NF4, rank 16 / alpha 32, completion-only loss, 3 epochs. Ship a 50MB adapter to HF Hub. The single highest-leverage 2026 skill.
- Decide when continued pre-training pays back (Working)
Reserve CPT for domains with their own vocabulary (legal Latin, ICD-10, rare protein motifs). Cite BloombergGPT's roughly $2.7M training bill as the cautionary tale, and show the cost math.
- Apply DPO / KTO / ORPO for tone & refusal alignment (Production)
Collect chosen/rejected pairs from real user thumbs-up/down. Train DPO on top of an SFT'd base. A/B test against the SFT'd base to confirm tone improves without losing capability.
- Apply GRPO for verifiable-reward reasoning fine-tunes (Advanced)
DeepSeek-R1-style RLVR on tasks with executable verification (SQL, math, code). Group size 8, KL beta 0.04. The 2025-2026 frontier reasoning technique.
- Build a domain eval harness with LLM-as-judge + Inspect AI (Production)
Custom golden set of 200-500 examples, frontier judge model (Claude Opus 4.7 / GPT-5), Inspect AI scoring, HTML report. CI gate on a 2 pp regression.
- Serve N LoRA adapters multi-tenant with vLLM (Production)
`vllm serve <base> --enable-lora --max-loras N`. Per-request adapter routing. Locust load test. The 2026 multi-tenant deployment pattern.
- Ship an on-prem domain assistant (Advanced)
Ollama (merged model) + Qdrant RAG over your own docs + Streamlit/Next.js UI + Prometheus metrics. The deployment regulated industries actually buy.
- Detect domain drift in production (Working)
Eval-on-traffic: sample 1% of prod requests, score with a judge LLM, alert on weekly regression. Triggers re-curation + re-tuning loops before users notice.
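
To make the MinHash dedupe step in the data-curation skill concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not the course's pipeline: the function names, the 3-word shingles, the 64 hash functions, and the 0.8 threshold are all hypothetical choices, and the pairwise comparison is quadratic, so a real 5K-pair run would more likely use an LSH index from a library such as datasketch.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items: Set[str], num_perm: int = 64) -> List[int]:
    """Keep the minimum hash value per seeded hash function."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt acts as a distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dedupe(prompts: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a prompt only if it is not a near-duplicate of one already kept."""
    kept: List[str] = []
    kept_sigs: List[List[int]] = []
    for prompt in prompts:
        sig = minhash_signature(shingles(prompt))
        if all(estimated_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(prompt)
            kept_sigs.append(sig)
    return kept
```

Calling `dedupe` on a list of generated instructions returns the first occurrence of each near-duplicate cluster in input order, which is the set you would then pass on to human review.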
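
The GRPO item mentions a group size of 8 and executable verification. The sketch below shows only the group-relative advantage step that gives the method its name: each sampled completion's reward is normalized against the other completions for the same prompt. The verifier `exact_match_reward` is a hypothetical stand-in for a real SQL, math, or code checker, population standard deviation is an assumption (implementations differ), and the KL penalty weighted by beta 0.04 belongs to the full loss and is not computed here.

```python
from statistics import mean, pstdev
from typing import List

GROUP_SIZE = 8   # completions sampled per prompt
KL_BETA = 0.04   # weight of the KL penalty in the full GRPO loss (not used below)


def exact_match_reward(completion: str, expected: str) -> float:
    """Hypothetical verifier: 1.0 if the last line equals the expected answer."""
    lines = completion.strip().splitlines()
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == expected.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With 2 correct completions out of 8, the correct ones receive a positive advantage and the rest a negative one. If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.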
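
The CI gate in the eval-harness item, and the weekly regression alert in the drift item, both reduce to comparing two pass rates in percentage points. A minimal sketch follows, assuming the judge emits one boolean verdict per golden-set example and that the gate fails only when the drop is larger than 2 pp; the function names and that boundary convention are assumptions, not something the list specifies.

```python
from typing import List, Tuple


def pass_rate(verdicts: List[bool]) -> float:
    """Fraction of examples the judge marked as passing."""
    if not verdicts:
        raise ValueError("verdicts must not be empty")
    return sum(verdicts) / len(verdicts)


def regression_gate(
    baseline: List[bool],
    candidate: List[bool],
    max_drop_pp: float = 2.0,
) -> Tuple[bool, float]:
    """Return (gate_passes, delta in percentage points) for candidate vs baseline."""
    # Rounding keeps float noise from flipping the result at the boundary.
    delta_pp = round((pass_rate(candidate) - pass_rate(baseline)) * 100, 6)
    return delta_pp >= -max_drop_pp, delta_pp
```

In CI, the script would exit non-zero when the first element of the returned tuple is `False`. For drift detection, the same comparison can run on this week's sampled traffic against last week's.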