Quick Intro · 7 MIN · DLM

Domain LLM


A scannable trailer of the 8-lesson course. Read top to bottom — no clicks needed.

INTROBLOCK · 01
DLM · 7 MIN PREVIEW

Domain LLMs are how vertical AI gets paid in 2026.

Klarna's AI handled 2.3M conversations in its first month, displacing ~700 FTEs and projecting $40M of profit improvement. Harvey raised at a $5B legal-AI valuation. Hippocratic AI shipped 10K healthcare tasks with regulator-aware safety. None of them are 'just GPT'. They're the full domain-LLM lifecycle: data curation, SFT, preference tuning, evals, serving. This trailer shows the pieces.

CONCEPTBLOCK · 02

What 'domain LLM' actually means in 2026

A **domain LLM** is a language model adapted to a vertical — legal, medical, financial, code, customer support — where 'good enough' isn't. It's not a base model with a system prompt. It's a model whose weights, prompts, retrieval, and evals have all been bent toward the domain. The production lifecycle, in order:

- **Data**. Domain corpora (case law, EHR notes, 10-Ks, code repos), instruction pairs, preference pairs. Synthetic when you can't get human data fast enough.
- **CPT** (continued pre-training). Only when the domain has its own *language* (legal Latin, ICD-10, rare protein motifs). Otherwise skip — BloombergGPT's $2.7M is the cautionary case study.
- **SFT** (supervised fine-tuning). LoRA / QLoRA on 5-10K curated pairs. Cheap, fast, ships next sprint.
- **Preference tuning**. DPO for tone, KTO for unpaired thumbs, GRPO for verifiable rewards (math/code/SQL). Closes the production feedback loop.
- **Evals**. Domain golden set + LLM-as-judge + standard benches (HealthBench, LegalBench, FinanceBench, SWE-Bench Verified). Gate on regression.
- **Serving**. vLLM with `--enable-lora` for multi-adapter, multi-tenant — N domains, one base.
- **On-prem**. Ollama + merged model + Qdrant RAG. The deployment regulated industries actually buy.
TIP: Most teams should start with prompt + RAG and only fine-tune when 50 prompt iterations can't fix the format/voice. The 2026 default for vertical AI is GPT-5 / Claude + RAG + a small LoRA on top — not a from-scratch domain model.
WATCH OUT: Fine-tuning teaches STYLE, not FACTS. New knowledge wants RAG. Style/format wants SFT. Tone/refusal wants DPO. CPT only when the domain has new vocabulary.
GOTCHA: BloombergGPT (50B params, 700B tokens, ~$2.7M compute) was published in March 2023. By 2024, industry consensus had hardened: GPT-4-class + RAG outperforms it on the same finance tasks at a fraction of the cost. Don't recreate that mistake.
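The WATCH OUT above is really a decision rule. A minimal sketch of it as code — an illustrative rule-of-thumb with coarse boolean inputs, not the full 4-axis matrix the course builds:

```python
def adaptation_stack(volatile_knowledge: bool, strict_format: bool,
                     tone_or_safety: bool, new_vocabulary: bool) -> list[str]:
    """Rule-of-thumb from the callouts above: new knowledge -> RAG,
    style/format -> SFT, tone/refusal -> DPO, new vocabulary -> CPT."""
    stack = ["prompting"]            # always start here; it's free
    if volatile_knowledge:
        stack.append("RAG")          # facts that change weekly live in retrieval
    if new_vocabulary:
        stack.append("CPT")          # only when the domain has its own language
    if strict_format:
        stack.append("SFT (LoRA)")   # teach voice/format with ~5-10K pairs
    if tone_or_safety:
        stack.append("DPO")          # preference-tune tone and refusals
    return stack

# A legal-contracts assistant: volatile law + strict clause formats.
print(adaptation_stack(True, True, False, False))
# -> ['prompting', 'RAG', 'SFT (LoRA)']
```

Note the stack is additive: production systems usually run several of these at once, which is why "RAG vs fine-tuning" is a false dichotomy.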
DIAGRAMBLOCK · 03

Domain LLM lifecycle — corpus to served adapter

DOMAIN DATA → CPT (opt) → SFT (LoRA) → DPO/GRPO → EVALS → ship → vLLM SERVE → USERS
USERS ┄ thumbs ┄→ DPO/GRPO (feedback)
RAG (always-on) ┄→ vLLM SERVE
Solid arrows are the spine; dashed are optional/feedback. RAG sits orthogonally — most production stacks have BOTH RAG and a light SFT.
CODEBLOCK · 04

18 lines: a QLoRA fine-tune you can run on one GPU

PYTHON
 1  from unsloth import FastLanguageModel
 2  from datasets import load_dataset
 3  from trl import SFTTrainer, SFTConfig
 4
 5  model, tok = FastLanguageModel.from_pretrained(
 6      "unsloth/Qwen2.5-7B-Instruct", load_in_4bit=True, max_seq_length=4096)
 7  model = FastLanguageModel.get_peft_model(
 8      model, r=16, lora_alpha=32,
 9      target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
10                      "gate_proj", "up_proj", "down_proj"])
11
12  ds = load_dataset("your-org/legal-clauses-5k", split="train")
13  trainer = SFTTrainer(model=model, tokenizer=tok, train_dataset=ds,
14      args=SFTConfig(output_dir="out", num_train_epochs=3,
15                     per_device_train_batch_size=2,
16                     learning_rate=2e-4, logging_steps=10))
17  trainer.train()
18  model.save_pretrained("out/legal-lora")
Lines 5-6: 4-bit base on a 24GB GPU. Lines 7-10: rank 16, alpha 32 — the sane SFT defaults. Line 18: ship the adapter (~50MB), not the full model. This is the ENTIRE production SFT pipeline most teams actually run.
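One step the snippet glosses over: SFTTrainer trains on a single text column (or a chat `messages` column), so the dataset has to be rendered into that shape first. A minimal mapping sketch, assuming the hypothetical `legal-clauses-5k` set has `instruction`/`output` columns and the base uses Qwen's ChatML template:

```python
def to_chatml_text(example: dict) -> dict:
    """Render one instruction/output pair into the ChatML string
    Qwen2.5-Instruct models were trained on."""
    return {"text": (
        f"<|im_start|>user\n{example['instruction']}<|im_end|>\n"
        f"<|im_start|>assistant\n{example['output']}<|im_end|>\n"
    )}

row = {"instruction": "Flag the indemnity clause.",
       "output": "Clause 7.2 is an indemnity."}
print(to_chatml_text(row)["text"])
```

In the pipeline above you would apply it with `ds = ds.map(to_chatml_text)` before handing `ds` to `SFTTrainer`; in practice prefer the tokenizer's own `apply_chat_template` so the template always matches the base model.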
CHEATSHEETBLOCK · 05

5 rules every 2026 domain-LLM shipper knows

01 · Start prompt → add RAG → SFT only when format/voice can't be prompted in 50 tries.
02 · LoRA rank 16, alpha 32, all attention + MLP target modules. That's the default — don't tune until you're profiling.
03 · Train on completion only — masking the prompt portion is the silent bug that ruins half of beginner SFT runs.
04 · Eval on a held-out golden set BEFORE shipping. 2pp regression on any metric blocks the merge.
05 · Serve N adapters on one base via vLLM `--enable-lora`. Don't run N copies of the model.
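Rule 03 is worth seeing in tokens. A minimal sketch of completion-only label masking (roughly what TRL's completion-only collator does for you), using the `-100` ignore index that cross-entropy losses skip:

```python
IGNORE_INDEX = -100  # label value PyTorch's cross-entropy ignores

def mask_prompt(prompt_ids: list[int], completion_ids: list[int]):
    """Compute loss on the completion only: prompt positions get -100,
    completion positions keep their token ids as labels."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

inp, lab = mask_prompt([101, 2, 3], [7, 8, 9])
print(lab)  # [-100, -100, -100, 7, 8, 9]
```

If you skip this and train on the full sequence, the model spends capacity memorizing your prompts: the silent bug rule 03 warns about.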
MINIGAME · RAPID-FIRE T/F · BLOCK · 06

Quick check — true or false?

Fine-tuning is the right way to teach a model new facts that change weekly.
CLAIM 1/5
CONCEPTBLOCK · 07

What you'll ship in the full study

Eight lessons. Eight Docker projects. By the end you'll have:

- A RAG-vs-FT bench harness that produces an ADR-grade decision pack on any task.
- A Distilabel + Magpie synthetic-data pipeline that turns 100 seed prompts into 5K curated SFT pairs.
- A working Unsloth + QLoRA SFT loop that ships a 50MB adapter to HF Hub.
- A small CPT smoke demo that shows you what continued pre-training feels like — without burning $2.7M.
- A DPO feedback loop wired into a Gradio thumbs-up/down UI — preference data → adapter in one repo.
- A domain eval harness with Inspect AI + LLM-as-judge + an HTML report.
- A vLLM serving setup with N LoRA adapters and per-request routing.
- A fully on-prem domain assistant: Ollama + merged model + Qdrant RAG over your own docs.

Every Docker project ships with composeYaml, expectedStdout, and a 'lift to work' note explaining how to drop it into your team's repo.
INCLUDED: All projects target a 7-8B base (Qwen2.5-7B or Llama 3.1 8B) so they actually run on a single A100 / RunPod / Modal box. No frontier-only fantasies.
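The DPO feedback loop in that list is mostly a small data-munging step: group logged responses by prompt and emit the chosen/rejected rows DPO trainers expect. A hedged sketch over an assumed log schema (`prompt`, `response`, `thumbs_up` are illustrative field names, not the course's actual schema):

```python
from collections import defaultdict

def thumbs_to_dpo_pairs(logs: list[dict]) -> list[dict]:
    """Pair every thumbs-up response with every thumbs-down response
    to the same prompt -> {'prompt', 'chosen', 'rejected'} rows."""
    by_prompt = defaultdict(lambda: {"up": [], "down": []})
    for row in logs:
        key = "up" if row["thumbs_up"] else "down"
        by_prompt[row["prompt"]][key].append(row["response"])
    pairs = []
    for prompt, buckets in by_prompt.items():
        for chosen in buckets["up"]:
            for rejected in buckets["down"]:
                pairs.append({"prompt": prompt,
                              "chosen": chosen,
                              "rejected": rejected})
    return pairs

logs = [
    {"prompt": "Summarize clause 7", "response": "Crisp summary.",   "thumbs_up": True},
    {"prompt": "Summarize clause 7", "response": "Rambling answer.", "thumbs_up": False},
]
print(len(thumbs_to_dpo_pairs(logs)))  # 1
```

The resulting list is exactly the `prompt`/`chosen`/`rejected` column layout TRL's `DPOTrainer` consumes, so the loop from thumbs click to preference adapter really is one repo.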
LESSON COMPLETEBLOCK · 08

That's the trailer.

NEXT: Lesson 1 · The RAG vs FT vs Hybrid decision
WHAT YOU'LL WALK AWAY WITH

Real skills, real career delta.

Skills you'll gain

  • Pick RAG vs SFT vs DPO vs CPT from a 4-axis matrix · Production

    Decide along knowledge-volatility x format-criticality x tone/safety x domain-vocabulary. Defend the call in an ADR with measured numbers — not vibes.

  • Curate domain SFT data with Distilabel + Magpie · Production

    Synthetic instruction generation, judge-LLM filtering, MinHash dedupe, Argilla review. 100 seed prompts → 5K production-grade SFT pairs.

  • Run QLoRA SFT on a 7-8B base via Unsloth · Production

    Single-GPU, 4-bit NF4, rank 16 / alpha 32, completion-only loss, 3 epochs. Ship a 50MB adapter to HF Hub. The single highest-leverage 2026 skill.

  • Decide when continued pre-training pays back · Working

    CPT only when the domain has its own vocabulary (legal Latin, ICD-10, rare protein motifs). Quote BloombergGPT's $2.7M cautionary tale; cite the math.

  • Apply DPO / KTO / ORPO for tone & refusal alignment · Production

    Collect chosen/rejected pairs from real user thumbs-up/down. Train DPO on top of an SFT'd base. A/B vs the SFT'd base — measure tone without losing capability.

  • Apply GRPO for verifiable-reward reasoning fine-tunes · Advanced

    DeepSeek-R1-style RLVR on tasks with executable verification (SQL, math, code). Group size 8, KL beta 0.04. The 2025-2026 frontier reasoning technique.

  • Build a domain eval harness with LLM-as-judge + Inspect AI · Production

    Custom 200-500 golden set, frontier judge model (Claude Opus 4.7 / GPT-5), Inspect AI scoring, HTML report. CI gate on -2pp regression.

  • Serve N LoRA adapters multi-tenant with vLLM · Production

    `vllm serve <base> --enable-lora --max-loras N`. Per-request adapter routing. Locust load test. The 2026 multi-tenant deployment pattern.

  • Ship an on-prem domain assistant · Advanced

    Ollama (merged model) + Qdrant RAG over your own docs + Streamlit/Next.js UI + Prometheus metrics. The deployment regulated industries actually buy.

  • Detect domain drift in production · Working

    Eval-on-traffic: sample 1% of prod requests, score with a judge LLM, alert on weekly regression. Triggers re-curation + re-tuning loops before users notice.
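The eval-on-traffic loop in that last skill is mostly plumbing. A minimal sketch with the judge call left out (in the course it's a real LLM-as-judge; the function names and the score-in-[0,1] convention here are illustrative assumptions):

```python
import random

def sample_traffic(requests: list[dict], rate: float = 0.01,
                   seed: int = 0) -> list[dict]:
    """Sample ~1% of prod requests for judge scoring."""
    rng = random.Random(seed)
    return [r for r in requests if rng.random() < rate]

def regression_alert(this_week: list[float], last_week: list[float],
                     threshold_pp: float = 2.0) -> bool:
    """Alert when the mean judge score drops more than threshold_pp
    percentage points week over week: the -2pp CI gate, applied to traffic."""
    mean_pct = lambda xs: 100.0 * sum(xs) / len(xs)  # [0,1] scores -> percent
    return mean_pct(last_week) - mean_pct(this_week) > threshold_pp

print(regression_alert([0.80, 0.78], [0.85, 0.84]))  # True: ~5.5pp drop
print(regression_alert([0.84, 0.85], [0.85, 0.84]))  # False: flat
```

The alert is what triggers the re-curation and re-tuning loop: sampled failing traffic becomes new SFT/DPO data, closing the lifecycle from the diagram above.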

Career & income delta

Career moves
  • Title yourself credibly as 'vertical AI engineer' or 'fine-tuning specialist' — one of the highest-paid 2026 IC titles in vertical SaaS.
  • Lead the AI-platform LoRA registry at your company — the platform-engineering line item nobody else is staffed for.
  • Pick up contracting work at $300-500/hr fixing teams whose 'we'll just fine-tune GPT' plan went sideways.
  • Become the 'domain LLM' SME at a vertical SaaS company — legal, medical, finance, support — where one shipped feature pays for the role.
  • Move from a generic backend role into a vertical-AI team — domain SFT + DPO experience is the differentiator.
Income impact
  • $30-70K bump for senior ICs adding 'production fine-tuning' + DPO to their resume in 2026.
  • $50-200K bump moving into a vertical-AI team at a regulated-industry SaaS (legal-tech, health-tech, fin-tech).
  • Freelance / consulting rates: $300-500/hr — 'we tried fine-tuning and it got worse' is the most common 2026 inquiry.
  • Enterprise demos / sales-engineering: closing one 7-figure deal per year often hinges on a working multi-LoRA on-prem demo.
  • Klarna's $40M-projected-saving narrative is now table-stakes for vertical-AI sales — engineers who can replicate the pattern command premiums.
Market resilience
  • Domain LLM skills survive every base-model swap — the lifecycle (data, SFT, DPO, eval, serve) is the durable craft.
  • On-prem skills (Ollama + LoRA-merged models + Qdrant) remain in demand for any regulated industry, no matter the cloud market.
  • Eval discipline (golden sets, LLM-as-judge, regression gates) is the moat most teams will struggle to build.
  • GRPO / RLVR on verifiable rewards is the technique behind 2025-2026 reasoning models — owning it pays for the next 2 years.
  • Multi-tenant LoRA serving (vLLM `--enable-lora`) becomes the platform-engineering skill SaaS companies must hire for.