Quick Intro · ~7 MIN · MULT

Multi-agent systems


A scannable trailer of the 8-lesson course. Read top to bottom — no clicks needed.

INTRO BLOCK · 01
MULT · 7 MIN PREVIEW

When one agent isn't enough.

The MAST study (NeurIPS 2025) measured a 17.2× error amplification when teams throw N LLMs at a problem with no coordination structure. Done right, multi-agent can cut cost and improve accuracy. Done wrong, it is the most expensive way to build a chatbot. This trailer shows the difference.

CONCEPT BLOCK · 02

The one-line difference

A single agent is one model in a tool-calling loop. A multi-agent system is N specialised agents plus a coordination contract. The contract — router, supervisor, pipeline, swarm, debate, blackboard, or graph — is what makes it a system instead of a chat group. If the only thing connecting your 'agents' is shared chat history, you don't have a multi-agent system. You have a chatbot pretending to be a team.
TIP: Always start single-agent. Promote to multi-agent only when the measured eval lift exceeds the added cost, not because it's trendy.
WATCH OUT: Per the MAST taxonomy, an unstructured 'bag of agents' amplifies errors 17.2× vs. single-agent. Pick a pattern before writing code.
GOTCHA: Agents without recursion caps that call each other will bankrupt you. The first guardrail you write is a depth limit.
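Here is a minimal sketch of that guardrail. It assumes agents are plain Python callables that share a hypothetical Budget object, which is not any framework's API. Every agent-to-agent call goes through call_agent, and that function hard-fails once the nesting depth or the token budget passes its cap. The two toy agents delegate to each other forever, which is exactly the loop the guard has to stop.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent call would exceed the depth or token cap."""


@dataclass
class Budget:
    max_depth: int = 3         # hypothetical cap: at most 3 nested agent calls
    max_tokens: int = 20_000   # hypothetical per-run token budget
    depth: int = 0
    tokens_used: int = 0

    def charge(self, tokens: int) -> None:
        # Record token spend and hard-fail as soon as the budget is exceeded.
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token budget {self.max_tokens} exceeded")


def call_agent(agent, task: str, budget: Budget) -> str:
    """Run agent(task, budget) one level deeper, refusing to pass max_depth."""
    if budget.depth >= budget.max_depth:
        raise BudgetExceeded(f"recursion depth {budget.max_depth} reached")
    budget.depth += 1
    try:
        return agent(task, budget)
    finally:
        budget.depth -= 1  # unwind the depth counter even if the callee raised


# Two hypothetical toy agents that keep handing the task back to each other.
def planner(task: str, budget: Budget) -> str:
    budget.charge(500)  # pretend this LLM call cost 500 tokens
    return call_agent(critic, task, budget)


def critic(task: str, budget: Budget) -> str:
    budget.charge(500)
    return call_agent(planner, task, budget)


if __name__ == "__main__":
    try:
        call_agent(planner, "draft a plan", Budget())
    except BudgetExceeded as err:
        print("stopped:", err)
```

The guard raises an exception rather than returning a partial answer. This is deliberate: a hard failure shows up in traces and evals, while a silently truncated loop does not.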
DIAGRAM BLOCK · 03

Router → 3 specialists

Diagram: USER → ROUTER → exactly one of RESEARCH ("facts?"), CODER ("code?"), or REVIEWER ("ship?").
Router picks ONE specialist per turn — never broadcast. That's the cheapest, most predictable starting pattern.
CODE BLOCK · 04

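A minimal router with two specialists

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def route(question: str) -> str:
    """Ask a cheap model to label the question 'code' or 'research'."""
    r = client.chat.completions.create(
        model="gpt-4o-mini",  # cheap classifier
        messages=[{"role": "system",
                   "content": "Reply 'code' or 'research' only."},
                  {"role": "user", "content": question}])
    label = (r.choices[0].message.content or "").lower()  # content may be None
    return "code" if "code" in label else "research"  # default to research

def run(q, agents):
    # Dispatch to exactly ONE specialist: a dict lookup, never a broadcast.
    return agents[route(q)](q)

# Placeholder specialists so the example runs; swap in your real agents.
def code_agent(q): return f"[code agent] {q}"
def research_agent(q): return f"[research agent] {q}"

agents = {"code": code_agent, "research": research_agent}
print(run("fix the off-by-one in main.py", agents))
```

route() uses a SMALL model as the classifier, and run() dispatches with a one-line dict lookup. The savings come from the cheap classifier, not from running more agents.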
CHEATSHEET BLOCK · 05

The 5 rules every 2026 multi-agent shipper knows

01 · Start single-agent. Split only when the eval lift exceeds the added cost.
02 · Pick a pattern (router, supervisor, pipeline, swarm, debate, blackboard, graph) BEFORE coding.
03 · Use Pydantic-typed messages between agents. Free text between agents is a smell.
04 · Cap recursion depth and per-agent token budget. Hard-fail past the cap.
05 · Eval the team end-to-end (MASEval / Braintrust), not each agent in isolation.
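Rule 03 can be sketched with a hypothetical Handoff schema. The roles and fields are made up for illustration. The rule names Pydantic, but to keep the sketch dependency-free it swaps in a stdlib frozen dataclass with hand-written checks. That gives the same parse-or-reject behaviour at the agent boundary.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal, get_args

Role = Literal["research", "code", "review"]  # hypothetical agent roles


@dataclass(frozen=True)
class Handoff:
    """Hypothetical typed message passed between agents instead of free text."""
    sender: Role
    recipient: Role
    task: str
    max_tokens: int

    def __post_init__(self) -> None:
        # Validate at the boundary so a malformed message fails loudly here,
        # instead of being silently reinterpreted by the next agent.
        roles = get_args(Role)
        if self.sender not in roles or self.recipient not in roles:
            raise ValueError(f"unknown role: {self.sender!r} -> {self.recipient!r}")
        if not isinstance(self.task, str) or not self.task.strip():
            raise ValueError("task must be a non-empty string")
        if not isinstance(self.max_tokens, int) or self.max_tokens <= 0:
            raise ValueError("max_tokens must be a positive int")

    @classmethod
    def from_json(cls, raw: str) -> "Handoff":
        # Missing or unexpected keys raise TypeError from the generated __init__.
        return cls(**json.loads(raw))

    def to_json(self) -> str:
        return json.dumps(asdict(self))


msg = Handoff(sender="research", recipient="code", task="patch main.py", max_tokens=2000)
assert Handoff.from_json(msg.to_json()) == msg  # lossless round trip

try:
    Handoff.from_json('{"sender": "research", "recipient": "ceo", '
                      '"task": "x", "max_tokens": 1}')
except ValueError as err:
    print("rejected:", err)
```

With Pydantic v2, the same schema would subclass BaseModel. Handoff.model_validate_json(raw) would then do the parsing and the Literal/type validation without the hand-written checks.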
MINIGAME · RAPID-FIRE TRUE/FALSE BLOCK · 06

Quick check — true or false?

Multi-agent always beats single-agent on cost.
CONCEPT BLOCK · 07

What you'll ship in the full study

Eight lessons. Seven Docker projects. By the end you'll have:
  • A LangGraph 1.0 supervisor team running locally with Langfuse traces.
  • An OpenAI Agents SDK hand-off triage system you can paste into a real support repo.
  • A Redis-backed blackboard pattern with structured Pydantic handoffs.
  • A pytest team-level eval harness that catches deadlock and role drift in CI.
  • A fully air-gapped local stack on gpt-oss-20b for regulated industries.
Every Docker project is meant to be lifted into your real work, not a demo.
INCLUDED: Each project ships with a Compose YAML file, the expected stdout, and a 'lift to work' note explaining how to drop it into your team's repo.
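Here is a minimal sketch of a team-level eval in the spirit of rule 05. It scores the whole team end to end on golden cases and fails CI when the pass rate or p95 latency regresses. run_team, the golden cases, and the thresholds are hypothetical stand-ins. In a real repo this would be a pytest test, and run_team would call your actual router or supervisor.

```python
import math
import time
from typing import Callable


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def eval_team(run_team: Callable[[str], str],
              golden: list[tuple[str, str]]) -> dict[str, float]:
    """Score the WHOLE team end-to-end on (question, expected substring) pairs."""
    passed, latencies = 0, []
    for question, expected in golden:
        start = time.perf_counter()
        answer = run_team(question)  # router + specialists, not one agent
        latencies.append(time.perf_counter() - start)
        if expected.lower() in answer.lower():
            passed += 1
    return {"pass_rate": passed / len(golden), "p95_s": p95(latencies)}


# Hypothetical stand-in for the real team entry point.
def run_team(question: str) -> str:
    return "Use a router." if "pattern" in question else "Start single-agent."


GOLDEN = [  # hypothetical golden cases
    ("Which pattern is cheapest to start with?", "router"),
    ("Should I start with many agents?", "single-agent"),
]


def test_team_regression() -> None:
    report = eval_team(run_team, GOLDEN)
    assert report["pass_rate"] >= 0.9, report  # hypothetical CI gate
    assert report["p95_s"] < 5.0, report       # hypothetical latency budget


if __name__ == "__main__":
    test_team_regression()
    print(eval_team(run_team, GOLDEN))
```

Catching deadlock as well would need a per-case timeout or turn cap around run_team. A substring match is also only the crudest scorer; graded rubrics or golden traces slot into the same loop.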
LESSON COMPLETE BLOCK · 08

That's the trailer.

NEXT: Lesson 1 · When to split into many agents
WHAT YOU'LL WALK AWAY WITH

Real skills, real career delta.

Skills you'll gain (10)
  • Diagnose when single-agent is failing · Working

    Use the 3-signal test (tool count, cost/latency split, trust boundaries) plus measured eval lift to decide IF multi-agent is worth the cost — before writing any code.

  • Pick a coordination pattern from 7 canonical options · Production

    Router, supervisor (including its hierarchical variant), pipeline, hand-off/swarm, debate, blackboard, graph: recognise each, know the trade-offs, ship the right one.

  • Build a LangGraph 1.0 supervisor team · Production

    Typed StateGraph, conditional edges, durable checkpoints, langgraph-supervisor library, Langfuse integration end-to-end.

  • Implement OpenAI Agents SDK hand-offs · Production

    Triage agent → specialist agents using the Agents SDK hand-off primitive with guardrails, sandboxes, and tracing on by default.

  • Ship a Redis-backed blackboard pattern · Working

    Pydantic schemas + Redis scratchpad with race-condition tests; safe parallel fan-out + fan-in.

  • Wire MCP and A2A for cross-stack interop · Working

    MCP server for tool sharing, A2A signed Agent Cards for cross-framework agent calls — pair them like REST + JWT.

  • Write team-level evals (MASEval / Braintrust) · Production

    Golden traces, regression suite gating CI, metrics: context-reuse rate, contradictory-output rate, decision-sync time, p95 latency.

  • Defend against multi-agent failure modes · Advanced

    Detect and prevent the 14 MAST failure modes and related production failures: deadlock, infinite loops, role drift, prompt-injection cascade, recursion explosion.

  • Per-agent observability + cost attribution · Production

    Tag every LLM call with (trace_id, agent_id, parent_agent_id, tool); Langfuse / Phoenix / Weave dashboards over per-agent spend.

  • Run a fully air-gapped multi-agent stack · Advanced

    gpt-oss-20b via Ollama + smolagents + Letta + Redis + Prometheus/Grafana: the deployment regulated industries actually buy.
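Tagging every LLM call with (trace_id, agent_id, parent_agent_id, tool) is what makes per-agent spend queryable. Here is a minimal stdlib sketch, assuming token counts are already captured per call. The LLMCall record and the per-token prices are hypothetical; they are not Langfuse, Phoenix, or Weave APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMCall:
    """One tagged LLM call: the attribution tuple plus token counts."""
    trace_id: str
    agent_id: str
    parent_agent_id: Optional[str]
    tool: Optional[str]
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates.
PRICE_PER_M = {"input": 0.15, "output": 0.60}


def cost_usd(call: LLMCall) -> float:
    return (call.input_tokens * PRICE_PER_M["input"]
            + call.output_tokens * PRICE_PER_M["output"]) / 1_000_000


def spend_by_agent(calls: list[LLMCall], trace_id: str) -> dict[str, float]:
    """Sum cost per agent_id within one trace."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        if call.trace_id == trace_id:
            totals[call.agent_id] += cost_usd(call)
    return dict(totals)


calls = [  # hypothetical calls from one routed turn
    LLMCall("t1", "router", None, None, 200, 5),
    LLMCall("t1", "coder", "router", "run_tests", 4_000, 800),
    LLMCall("t1", "reviewer", "router", None, 3_000, 300),
]
for agent, usd in sorted(spend_by_agent(calls, "t1").items(), key=lambda kv: -kv[1]):
    print(f"{agent:>8}: ${usd:.6f}")
```

parent_agent_id is what lets a dashboard rebuild the call tree. It also lets you roll a child agent's spend up to the agent that delegated to it.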

Career & income delta

Career moves
  • Title yourself credibly as 'AI agent engineer' or 'agent platform engineer' — the 2026 hiring channel for senior IC roles at $200-400K.
  • Lead a multi-agent platform team — the AI infra team most series-B/C companies are now staffing.
  • Pick up contracting work at $200-400/hr fixing teams whose 'bag of agents' is amplifying errors 17×.
  • Ship the multi-agent feature your CTO has been asking about for 6 months — and own that line item on your perf review.
Income impact
  • $20-50K bump for senior ICs adding production multi-agent to their resume in 2026.
  • $50-150K bump moving from a generic backend role to an agent-platform team.
  • Freelance / consulting rates: $200-400/hr — 'we have a multi-agent system we can't ship' is the most common 2026 inquiry.
  • Enterprise demos / sales-engineering: closing one 6-figure deal per quarter often requires the team-eval harness in this course.
Market resilience
  • Multi-agent is the skill that survives the next foundation-model consolidation: orgs always need someone who knows HOW to ship these systems safely.
  • MCP and A2A are now Linux Foundation standards; protocol fluency is durable across model providers.
  • Observability + eval discipline carries forward to whatever the 2027 framework du jour is.
  • Air-gapped/on-prem deployment skills (gpt-oss + Ollama) remain in demand for any regulated industry, no matter the model market.