CHEATSHEET · 01 · GenAI master cheatsheet
Model selection
- Smallest model that beats your eval bar wins
- Frontier models for hard reasoning, mini models for routing
- Reasoning models (o3, claude-opus-4-7-thinking) only for math/logic; they cost 3-8× more
- Latency budget often matters more than the leaderboard
Cost discipline
- Always set max_tokens — it's the only hard cap
- Log prompt + completion tokens on every call
- Cache repeated prompt prefixes (system prompts, few-shot blocks)
- Output tokens cost 2-3× input — write prompts that ask for terse answers
- Route easy queries to mini, hard ones to frontier — a 60-80% spend cut
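The routing bullet above can be sketched as a function. A minimal sketch: the model ids and the length/keyword difficulty heuristic are illustrative assumptions, not a recommendation — real routers usually use a cheap classifier or a mini model as the judge.

```python
MINI = "mini-model"          # placeholder id for a cheap model
FRONTIER = "frontier-model"  # placeholder id for a frontier model

def route(query: str) -> str:
    """Crude difficulty heuristic: long queries or reasoning keywords
    go to the frontier model; everything else goes to mini."""
    hard_markers = ("prove", "derive", "step by step", "why")
    if len(query.split()) > 60 or any(m in query.lower() for m in hard_markers):
        return FRONTIER
    return MINI
```

Even a heuristic this crude captures the shape of the 60-80% savings: most traffic is easy and never touches the expensive model.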
Prompt engineering
- Role + task + format + examples + constraints — the 5-block template
- Temperature 0 for classification / extraction / agents
- Temperature 0.7-1.0 only for creative output
- Few-shot (3-5 examples) usually beats zero-shot by 20-30 accuracy points
- Order examples easy → hard; the model anchors on the last one
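The 5-block template assembles mechanically. A sketch of one way to lay it out — the block order and labels are assumptions, not a canonical format:

```python
def build_prompt(role, task, fmt, examples, constraints):
    """Assemble the 5-block prompt: role, task, format, examples, constraints.
    `examples` is a list of (input, output) pairs, ordered easy -> hard."""
    example_lines = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return (
        f"{role}\n\n"
        f"Task: {task}\n"
        f"Format: {fmt}\n\n"
        f"Examples:\n{example_lines}\n\n"
        f"Constraints: {constraints}"
    )
```

Keeping the template in code (rather than hand-edited strings) makes prompt changes diffable, which the eval-gating bullet below depends on.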
Structured output
- Use JSON-mode or response_format=json_schema — never regex parsing
- Pydantic + Instructor for retry-on-validation-failure
- Schema first — write the shape before the prompt
- On failure: log the bad output, don't silently 500
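The retry-on-validation-failure pattern that Instructor wraps boils down to a short loop. A stdlib-only sketch (real code would use a Pydantic model instead of the hand-rolled `validate`; `generate` stands in for any LLM call that accepts error feedback):

```python
import json

REQUIRED = {"title": str, "priority": int}  # toy schema for illustration

def validate(obj):
    """Minimal stand-in for a Pydantic model: check keys and types."""
    for key, typ in REQUIRED.items():
        if not isinstance(obj.get(key), typ):
            raise ValueError(f"field {key!r} must be {typ.__name__}")
    return obj

def parse_with_retry(generate, max_attempts=3):
    """Call generate(feedback), validate its JSON; on failure, feed the
    error message back into the next attempt instead of silently failing."""
    feedback = None
    for _ in range(max_attempts):
        raw = generate(feedback)
        try:
            return validate(json.loads(raw))
        except (ValueError, json.JSONDecodeError) as err:
            feedback = str(err)  # log this too — never swallow bad output
    raise RuntimeError("schema validation failed after retries")
```

The key design point: the validation error goes back into the prompt, so the model gets a concrete reason its last answer was rejected.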
Production patterns
- Stream for >500-token outputs
- Retry with backoff on 429/503/529 — exponential, jittered
- Time-out at 30s — past that, something is wrong
- Eval-gate every prompt change in CI (Promptfoo / DeepEval)
- Trace every call — provider id, prompt hash, completion, tokens, $
- Fallback to a smaller / local model on overload
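The jittered exponential backoff bullet, as a loop. A sketch under stated assumptions: `call` is any zero-arg wrapper returning `(status, body)` — a real SDK client would raise typed exceptions instead, and you'd catch those rather than inspect status codes.

```python
import random
import time

RETRYABLE = {429, 503, 529}

def call_with_backoff(call, max_retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Retry on retryable status codes with full-jitter exponential backoff."""
    for attempt in range(max_retries):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        # full jitter: sleep a random fraction of the capped exponential delay
        sleep(min(cap, base * 2 ** attempt) * random.random())
    raise RuntimeError("exhausted retries")
```

Full jitter (random fraction of the delay, not delay plus a small wobble) is what actually de-synchronizes a thundering herd of retrying clients.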
Safety & guardrails
- Treat user input as hostile — prompt injection is real
- Llama Guard / NeMo Guardrails on input AND output
- Strip PII before logs
- Never let an LLM emit raw SQL — parameterize
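What "parameterize" means concretely: any value that passed through a model (or a user) is bound as a parameter, never interpolated into the SQL string. A minimal sqlite3 sketch (table and column names are illustrative):

```python
import sqlite3

def find_user(conn, raw_name):
    """Bind untrusted (e.g. LLM-produced) values as parameters."""
    # Injectable anti-pattern: f"SELECT ... WHERE name = '{raw_name}'"
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (raw_name,))
    return cur.fetchall()
```

With the placeholder, a classic injection payload is just a weird name that matches nothing, instead of a clause that matches everything.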
When NOT to generate
- Use regex / SQL / classifier when behavior is deterministic
- Embeddings for stable-class classification — 100× cheaper
- Classical ML for tabular + numeric
- Safety-critical paths shouldn't be generative
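The embeddings-instead-of-generation bullet usually means nearest-centroid classification: embed each class's examples once, average them, then classify new inputs by cosine similarity — one cheap embedding call per query instead of a generation. A stdlib sketch with toy 2-d vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(vec, class_centroids):
    """Return the class whose centroid is most similar to `vec`."""
    return max(class_centroids, key=lambda c: cosine(vec, class_centroids[c]))
```

Because the class set is stable, the centroids are computed once offline; the per-query cost is one embedding plus a handful of dot products.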
CHEATSHEET · 02 · Decision flowcharts you'll memorize
Pick a model in 4 questions
- 1. Latency budget < 1s? → mini / local SLM
- 2. Output > 500 tokens? → stream (frontier or mini)
- 3. Math / logic / multi-step? → reasoning model (only if you can pay)
- 4. Tabular / classification? → don't use an LLM, train a classifier
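The 4-question flowchart, transcribed directly as code so it can live next to the router. The return labels are placeholders, not model ids:

```python
def pick_model(latency_s, out_tokens, needs_reasoning, is_tabular):
    """The 4-question model-picking flowchart, in question order."""
    if latency_s < 1:
        return "mini / local SLM"
    if out_tokens > 500:
        return "frontier or mini, streamed"
    if needs_reasoning:
        return "reasoning model"
    if is_tabular:
        return "classifier, not an LLM"
    return "default mini"
```

Encoding the flowchart as a function also makes the decision testable: each branch gets an assertion instead of living in someone's head.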
Pick a pattern in 3 questions
- 1. Static knowledge? → fine-tune (rare; reach for it last)
- 2. Fresh / private docs? → RAG
- 3. Same task, varying inputs? → prompt + few-shot
Diagnose a misbehaving LLM feature
- Eval scores drifting? → prompt regressed; bisect recent commits
- Rising p99? → bigger prompt; trim the system block or cache the prefix
- Cost spike? → check completion_tokens; set max_tokens
- Hallucinated facts? → no retrieval; add RAG or a guardrail