GENAICourse

Generative AI & foundation models

Lessons: 11 modules
Total: 88m full study
Quick: 7m trailer
Projects: 10 docker labs
CHEATSHEET · 01 · GenAI master cheatsheet
Model selection
  • Smallest model that beats your eval bar wins
  • Frontier models for hard reasoning, mini models for routing
  • Reasoning models (o3, claude-opus-4-7-thinking) only for math/logic-heavy tasks; they cost 3-8× more
  • Latency budget often matters more than the leaderboard
Cost discipline
  • Always set max_tokens — it's the only hard cap
  • Log prompt + completion tokens on every call
  • Cache repeated prompts (system prompts, few-shot blocks)
  • Output tokens cost 2-3× input — write prompts that ask for terse answers
  • Route easy queries to mini, hard ones to frontier — 60-80% spend cut
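A minimal sketch of per-call cost logging; the per-million-token prices and the "mini"/"frontier" labels are placeholder assumptions, not real provider rates:

```python
# Placeholder prices, USD per 1M tokens: (input, output).
# Substitute your provider's current rates.
PRICES = {
    "mini": (0.15, 0.60),
    "frontier": (2.50, 10.00),
}

def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one call; note output tokens are priced several x input."""
    in_price, out_price = PRICES[model]
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000
```

With these placeholder rates, the same 1000-in / 500-out call costs roughly 17× more on the frontier tier, which is where routing easy queries to mini earns its spend cut.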
Prompt engineering
  • Role + task + format + examples + constraints — the 5-block template
  • Temperature 0 for classification / extraction / agents
  • Temperature 0.7-1.0 only for creative output
  • Few-shot (3-5 examples) usually beats zero-shot by 20-30 accuracy points
  • Order examples easy → hard; the model anchors on the last one
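The five-block template can be assembled mechanically. This sketch (function name and argument order are my own) keeps the blocks in the stated order and expects examples already sorted easy → hard:

```python
def build_prompt(role, task, fmt, examples, constraints):
    """Assemble the role + task + format + examples + constraints template.
    `examples` is a list of (input, output) pairs, ordered easy -> hard,
    since the model anchors on the last one."""
    example_block = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return (
        f"{role}\n\n"
        f"Task: {task}\n"
        f"Format: {fmt}\n\n"
        f"Examples:\n{example_block}\n\n"
        f"Constraints: {constraints}"
    )
```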
Structured output
  • Use JSON-mode or response_format=json_schema — never regex parsing
  • Pydantic + Instructor for retry-on-validation-failure
  • Schema first — write the shape before the prompt
  • On failure: log the bad output, don't silently 500
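Pydantic + Instructor automate this loop against real APIs. The sketch below shows the same retry-on-validation-failure shape with a stdlib stand-in for the schema; the field names are hypothetical:

```python
import json

REQUIRED = {"category": str, "priority": int}  # hypothetical schema

def validate(raw: str) -> dict:
    """Stand-in for a Pydantic model: parse JSON, then check field types."""
    data = json.loads(raw)
    for field, typ in REQUIRED.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return data

def parse_with_retry(generate, max_attempts=3):
    """Re-call the model on a bad output, log the failure, and raise loudly
    after the retry budget is spent instead of returning garbage."""
    last_error = None
    for attempt in range(max_attempts):
        raw = generate(attempt)
        try:
            return validate(raw)
        except ValueError as err:  # JSONDecodeError is a ValueError too
            last_error = err
            print(f"attempt {attempt}: logged bad output {raw!r}")
    raise RuntimeError("validation failed after retries") from last_error
```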
Production patterns
  • Stream for >500-token outputs
  • Retry with backoff on 429/503/529 — exponential, jittered
  • Time out at 30s — past that, something is wrong
  • Eval-gate every prompt change in CI (Promptfoo / DeepEval)
  • Trace every call — provider id, prompt hash, completion, tokens, $
  • Fall back to a smaller / local model on overload
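A sketch of the retry rule, exponential and jittered, using the 429/503/529 status set from above; function names and defaults are illustrative:

```python
import random
import time

RETRYABLE = {429, 503, 529}  # rate-limited / unavailable / overloaded

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: a uniform draw from
    [0, min(cap, base * 2**attempt))."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retry(call, max_attempts: int = 5, sleep=time.sleep):
    """Retry only on overload statuses; any other status returns immediately.
    `call` returns a (status, body) pair."""
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        sleep(backoff_delay(attempt))
    raise TimeoutError("still overloaded after retries; fall back to a smaller model")
```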
Safety & guardrails
  • Treat user input as hostile — prompt injection is real
  • Llama Guard / NeMo Guardrails on input AND output
  • Strip PII before logs
  • Never let an LLM emit raw SQL — parameterize
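Parameterization in one picture, using sqlite3; the table and column names are hypothetical. Bound parameters keep injected SQL inert data rather than executable code:

```python
import sqlite3

def lookup_user(conn: sqlite3.Connection, name_from_llm: str):
    """Never interpolate model output into the SQL string; bind it as a
    parameter so an injected payload is matched as a literal name."""
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name_from_llm,)
    ).fetchall()
```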
When NOT to generate
  • Use regex / SQL / classifier when behavior is deterministic
  • Embeddings for stable-class classification — 100× cheaper
  • Classical ML for tabular + numeric
  • Safety-critical paths shouldn't be generative
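The embeddings route can be as small as a nearest-centroid lookup over precomputed class embeddings: one cheap embedding call per query instead of a generation call. The 2-D vectors below are toy stand-ins for real embedding output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(query_embedding, class_centroids):
    """Return the label whose precomputed centroid is closest to the query."""
    return max(class_centroids,
               key=lambda label: cosine(query_embedding, class_centroids[label]))
```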
CHEATSHEET · 02 · Decision flowcharts you'll memorize
Pick a model in 4 questions
  1. Latency budget < 1s? → mini / local SLM
  2. Output > 500 tokens? → stream + frontier mini
  3. Math / logic / multi-step? → reasoning model (only if you can pay)
  4. Tabular / classification? → don't use an LLM, train a classifier
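The four questions encode directly as a first-match-wins function; the return labels are illustrative shorthand, not provider model names:

```python
def pick_model(latency_budget_s, expected_output_tokens, needs_reasoning, is_tabular):
    """Ask the four questions in order; the first one that matches decides."""
    if latency_budget_s < 1.0:
        return "mini / local SLM"
    if expected_output_tokens > 500:
        return "frontier mini, streamed"
    if needs_reasoning:
        return "reasoning model"
    if is_tabular:
        return "classical classifier, not an LLM"
    return "mini"
```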
Pick a pattern in 3 questions
  1. Static knowledge? → fine-tune (rare; reach for it last)
  2. Fresh / private docs? → RAG
  3. Same task, varying inputs? → prompt + few-shot
Diagnose a misbehaving LLM feature
  • Eval scores drifting? → prompt regressed; bisect the recent commits
  • Rising p99? → prompt grew; trim the system prompt or cache the prefix
  • Cost spike? → check completion_tokens; set max_tokens
  • Hallucinated facts? → no retrieval; add RAG or a guardrail