GenAI Course

Generative AI & foundation models

Lessons: 11 modules
Total: 88m full study
Quick: 7m trailer
Projects: 10 Docker labs

Skills you'll gain

  • Model selection & routing (Production)

    Pick frontier vs mini vs reasoning vs local models against latency, cost, and quality budgets — and route requests between them in one service.

  • Cost-bounded LLM features (Production)

    Ship features with hard token budgets, max_tokens caps, and per-request $ tracking, defensible in a finance review.

  • Prompt engineering (Working)

    Author and maintain prompts that survive 3+ revisions: zero-shot, few-shot, CoT, structured-output, role design, anti-drift patterns.

  • Structured output (Production)

    Build JSON-mode + Pydantic + Instructor services that validate on every turn and retry on schema failure.

  • Function calling & tool use (Working)

    Wire single and parallel tool-use, design idempotent tool contracts, and decide when an agent is the wrong answer.

  • Eval-driven LLM development (Production)

    Write Promptfoo / DeepEval suites and gate releases on regression — turn prompts into testable, versioned artifacts.

  • Streaming & latency engineering (Working)

    Implement chunked SSE, partial-JSON streaming, and cut perceived chat latency from 3s+ to under 500ms.

  • Caching (prompt + semantic) (Production)

    Design cacheable prefixes, set up prompt caching and Redis-backed semantic cache — verified 40-70% spend reduction.

  • LLM observability (Production)

    Stand up LiteLLM + Prometheus + Grafana + Loki to trace every call with prompt hash, tokens, cost, and provider id.

  • Safety & prompt-injection defence (Working)

    Apply NeMo Guardrails / Llama Guard, run a red-team drill on your own service, and ship a hardened input/output filter chain.

  • Local-first deployment (Working)

    Run Ollama / vLLM with small models (Phi-4, Llama-3.2) and route to hosted APIs only on overflow — works offline and passes compliance reviews.

  • When NOT to generate (Production)

    Replace LLM calls with regex / SQL / classifiers / embeddings wherever deterministic logic suffices — shown to cut spend 30%+ on real audits.
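The model-routing skill above boils down to picking the cheapest model that fits every budget. A minimal sketch, with a hypothetical model catalog whose cost, latency, and quality numbers are purely illustrative:

```python
# Hypothetical model catalog; the numbers are illustrative, not real benchmarks.
MODELS = {
    "local":    {"cost_per_1k": 0.0,    "p95_ms": 900,  "quality": 0.60},
    "mini":     {"cost_per_1k": 0.0004, "p95_ms": 600,  "quality": 0.75},
    "frontier": {"cost_per_1k": 0.0100, "p95_ms": 2500, "quality": 0.95},
}

def route(max_cost_per_1k, max_latency_ms, min_quality):
    """Return the cheapest model satisfying all three budgets, or None."""
    candidates = [
        (name, spec) for name, spec in MODELS.items()
        if spec["cost_per_1k"] <= max_cost_per_1k
        and spec["p95_ms"] <= max_latency_ms
        and spec["quality"] >= min_quality
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda ns: ns[1]["cost_per_1k"])[0]

# Quality rules out "local", latency and cost rule out "frontier".
choice = route(max_cost_per_1k=0.001, max_latency_ms=1000, min_quality=0.7)
```

In a real service the same predicate would sit in front of a multi-provider client, re-evaluated per request.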
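The cost-bounded-features skill can be sketched as a client that enforces a max_tokens cap and refuses any call that would push cumulative spend past a hard budget. The per-1K-token prices below are hypothetical placeholders, not real provider pricing:

```python
# Hypothetical (input, output) prices per 1K tokens; real prices vary by provider.
PRICES = {"mini": (0.00015, 0.0006), "frontier": (0.0025, 0.0100)}

def request_cost(model, prompt_tokens, completion_tokens):
    """Dollar cost of one request at the (made-up) per-1K-token prices."""
    p_in, p_out = PRICES[model]
    return (prompt_tokens * p_in + completion_tokens * p_out) / 1000

class BudgetedClient:
    """Track per-request cost and enforce a hard cumulative budget."""

    def __init__(self, budget_usd, max_tokens=512):
        self.budget = budget_usd
        self.max_tokens = max_tokens  # hard cap on completion length
        self.spent = 0.0

    def charge(self, model, prompt_tokens, completion_tokens):
        completion_tokens = min(completion_tokens, self.max_tokens)
        cost = request_cost(model, prompt_tokens, completion_tokens)
        if self.spent + cost > self.budget:
            raise RuntimeError("request would exceed budget")
        self.spent += cost
        return cost

client = BudgetedClient(budget_usd=0.01)
cost = client.charge("mini", prompt_tokens=800, completion_tokens=400)
```

Logging `cost` per request is what makes the spend defensible in a finance review.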
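The validate-and-retry loop behind the structured-output skill can be shown with the standard library alone (the course itself uses Pydantic + Instructor for the schema layer). The model call is a stand-in stub, and the schema and replies are invented for illustration:

```python
import json

# Minimal hand-rolled schema; Pydantic would replace this in the real service.
REQUIRED = {"title": str, "priority": int}

def validate(raw):
    """Parse JSON and check required fields and types; raise on mismatch."""
    data = json.loads(raw)
    for field, ftype in REQUIRED.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    return data

def parse_with_retry(call_model, max_attempts=3):
    """Re-ask the model, feeding the validation error back, until it conforms."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(call_model(last_error))
        except ValueError as err:  # json.JSONDecodeError is a ValueError
            last_error = str(err)
    raise RuntimeError("schema validation failed after retries")

# Stand-in for the LLM: first reply is missing a field, second conforms.
replies = iter(['{"title": "Login broken"}',
                '{"title": "Login broken", "priority": 3}'])

def fake_model(error):
    return next(replies)

ticket = parse_with_retry(fake_model)  # succeeds on the second attempt
```

Retrying with the error message in the prompt is what makes the schema failure self-correcting rather than user-visible.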
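The semantic-cache skill rests on one idea: if a new prompt's embedding is close enough to a cached one, reuse the cached answer instead of paying for a fresh generation. A minimal in-memory sketch with toy hand-written embeddings (a real deployment would use an embedding model and Redis, as in the course):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    """Return a cached answer when a prompt embedding is close enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, emb):
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best is not None and cosine(emb, best[0]) >= self.threshold:
            return best[1]
        return None  # miss: caller falls through to the LLM

    def put(self, emb, answer):
        self.entries.append((emb, answer))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.1], "Paris")
hit = cache.get([0.95, 0.05, 0.1])   # near-duplicate prompt: cache hit
miss = cache.get([0.0, 1.0, 0.0])    # unrelated prompt: miss
```

The threshold is the key tuning knob: too low and you serve stale or wrong answers, too high and the hit rate (and the spend reduction) collapses.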