INTROBLOCK · 01
SRE · 7 MIN PREVIEW
AI for SREs / Platform
LLM observability. Cost / latency monitoring at the gateway. Incident playbooks. Rate limits that bite without breaking UX.
CONCEPTBLOCK · 02
LLMs need new SLOs, not new SLO frameworks
Your SLO toolkit (error budgets, percentile latency, availability targets) still applies; what's new is the set of dimensions to observe: token throughput, cost per request, prompt vs. completion latency split, retries per call, and (sampled) hallucination rate. The trick is wiring these into your existing OTel + Prometheus + Grafana stack so on-call doesn't need a new tab. The OpenLLMetry semantic conventions give you the attribute names; the gateway is the natural enforcement point for cost and rate limits.
TIP: Set both a cost SLO ($/1k requests) and a latency SLO (p95 TTFT, time to first token). They drift in opposite directions when something is wrong; see the rules sketch below.
WATCH OUT: Don't trust per-route token counts from app code. The gateway is the single source of truth; apps can lie or forget.
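A minimal Prometheus rules sketch for those two SLOs, assuming the gateway exports a llm_cost_usd_total counter, a llm_requests_total counter, and a llm_ttft_seconds histogram (hypothetical metric names; substitute whatever your gateway actually emits, and treat the thresholds as placeholders):
YAML
groups:
  - name: llm-slos
    rules:
      # Cost SLO: dollars per 1k requests over the last hour, per model.
      - record: llm:cost_per_1k_requests:1h
        expr: |
          1000 * sum by (model) (rate(llm_cost_usd_total[1h]))
            / sum by (model) (rate(llm_requests_total[1h]))
      - alert: LLMCostSLOBreach
        expr: llm:cost_per_1k_requests:1h > 0.50
        for: 15m
        labels: {severity: page}
        annotations:
          summary: "Cost per 1k requests above $0.50 for {{ $labels.model }}"
      # Latency SLO: p95 time-to-first-token, per model.
      - alert: LLMTTFTSLOBreach
        expr: |
          histogram_quantile(0.95,
            sum by (model, le) (rate(llm_ttft_seconds_bucket[5m]))) > 2
        for: 10m
        labels: {severity: page}
        annotations:
          summary: "p95 TTFT above 2s for {{ $labels.model }}"
Putting the recorded cost series next to the TTFT quantile on one Grafana panel is what makes the opposite-drift pattern from the tip visible at a glance.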
DIAGRAMBLOCK · 03
LLM gateway: observe, throttle, fail over
One gateway. All policy. Nothing the app needs to know.
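There is no single standard schema for gateway policy; the sketch below uses invented key names purely to show the three responsibilities living in one config, outside the app:
YAML
# Hypothetical gateway policy. Every key name here is illustrative,
# not a real product's schema.
observe:
  emit_otlp_spans: true            # one llm span per upstream call
  count_tokens_at_gateway: true    # gateway-counted, never app-reported
throttle:
  rate_limits:
    - match: {tenant: "*", model: "gpt-4o-mini"}
      tokens_per_minute: 200000
      requests_per_minute: 600
    - match: {tenant: "batch-jobs", model: "*"}
      tokens_per_minute: 50000     # cap the noisy internal tenant harder
failover:
  providers:
    - {name: openai, priority: 1}
    - {name: azure-openai, priority: 2}  # same models, separate quota pool
  trigger:
    on_status: [429, 500, 503]
    on_timeout_ms: 10000
The app sends one request to one endpoint; observation, throttling, and failover all happen behind it.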
CODEBLOCK · 04
OpenLLMetry attribute names you should standardise
YAML
# Standard attributes on every llm span
llm.request.model: "gpt-4o-mini"
llm.request.temperature: 0.0
llm.usage.prompt_tokens: 1402
llm.usage.completion_tokens: 318
llm.usage.total_tokens: 1720
llm.response.finish_reason: "stop"
llm.cost_usd: 0.000387
gen_ai.system: "openai"
gen_ai.operation.name: "chat"
These map cleanly to Prometheus histograms (latency, tokens) and counters (cost). Use the gen_ai.* names for vendor-neutral dashboards.
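One concrete way to do that mapping, sketched with the OpenTelemetry Collector's spanmetrics connector (bucket boundaries and dimension choices are assumptions to tune for your traffic):
YAML
# Collector config: turn llm spans into latency/call-count metrics.
receivers:
  otlp:
    protocols:
      grpc: {}
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100ms, 500ms, 1s, 2s, 5s, 15s, 60s]
    dimensions:
      - name: gen_ai.system          # vendor-neutral dashboard key
      - name: gen_ai.operation.name
      - name: llm.request.model
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]       # connector bridges the two pipelines
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheusremotewrite]
Note that spanmetrics only produces duration histograms and call counts; token and cost counters still have to be emitted directly, which is another argument for the gateway owning them.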
CHEATSHEETBLOCK · 05
Five things to remember
01 The gateway is the single source of truth for tokens + cost.
02 Rate-limit by tenant + by model. Per-IP is meaningless for service traffic.
03 Multi-provider failover is an SRE concern, not an app concern.
04 Sample hallucinations with an LLM-as-judge in production.
05 An LLM incident drill belongs in your quarterly tabletop rotation (see the drill card after this list).
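For point 05, a drill card can be as small as this (the format and scenario are invented for illustration):
YAML
# Hypothetical tabletop drill card.
drill: llm-provider-brownout
scenario: >
  Primary provider returns 429 on 40% of requests starting at 14:00.
  Failover engages; p95 TTFT doubles; cost per 1k requests jumps to
  the secondary provider's pricing.
inject:
  - "t+0m: gateway error-rate alert fires"
  - "t+5m: cost-per-1k recording rule crosses its threshold"
expect:
  - on-call confirms failover from gateway dashboards, not app logs
  - tenant rate limits are tightened before the error budget is gone
  - status-page entry posted within 15 minutes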
MINIGAME · RAPIDFIRETFBLOCK · 06
True or false: 6 seconds each
OTel can capture LLM token counts as span attributes.
LESSON COMPLETEBLOCK · 07
LLM-platform mental model: locked.
NEXT: LLM observability with OTel + Grafana
WHAT YOU'LL WALK AWAY WITH
Real skills, real career delta.
Skills you'll gain
- Wire LLM tracing into existing OTel
- Bound cost + latency at the gateway
- Run an LLM incident drill
- LLM observability
- Cost / latency monitoring
- Incident playbooks
- Rate limit design
All covered in the lesson sequence, drop-in ready.
Career & income delta
Career moves
- Lead an AI for SREs / Platform initiative on your team; most orgs have it on the roadmap and few have shipped it.
- Consulting work at $150-300/hr; SREs who have shipped LLM platforms to production are a sought-after specialty in 2026.
- Move from generic IC to a platform/AI-platform team, where AI for SREs / Platform expertise is the entry ticket.
Income impact
- $15-40K bump for senior ICs adding AI for SREs / Platform to their resume.
- Freelance / consulting demand for the same skill: $150-300/hr in 2026.
- Closing enterprise deals often hinges on demonstrating the production patterns from this course.
Market resilience
- AI for SREs / Platform is a durable skill across model and framework consolidations.
- Production guardrails (cost caps, observability, audit, evals) carry forward to whatever the 2027 stack is.
- Core patterns transfer to cloud, on-prem, and hybrid deployments.