CHEATSHEET · 01Multi-agent · master cheatsheet
When to split
- ·Tool count > ~12 with > 10% mis-selection rate
- ·Different cost/latency budgets per sub-task
- ·Different trust boundaries (internet vs internal data)
- ·Different model strengths needed (cheap classifier + expensive synth)
- ·Eval lift > added cost — measure first, split second
The 7 patterns
- ·Router — one classifier, one specialist per turn
- ·Supervisor / Manager-Worker — central orchestrator delegates + aggregates
- ·Hierarchical — supervisors of supervisors (50+ agents only)
- ·Pipeline — fixed DAG, predictable, rigid
- ·Hand-off / Swarm — one active agent, returns next agent
- ·Debate / Consensus — N propose, judge picks (or vote)
- ·Blackboard — Redis scratchpad, agents read/write
- ·Graph (LangGraph) — explicit nodes/edges + conditional cycles
Communication contracts
- ·Pydantic-typed messages between every pair of agents
- ·Per-handoff summary; archive raw transcripts
- ·MCP for tools (the 2026 standard)
- ·A2A for cross-vendor agent calls (signed Agent Cards)
- ·No free text between agents — that's a smell
Production guardrails
- ·Cap recursion depth (typically 5-8)
- ·Per-agent token budget + global trace budget
- ·Sandbox every tool call (Agents SDK / harness / Docker)
- ·Per-agent auth, scoped tool tokens
- ·Persist checkpoints — replay is your debugger
- ·Allow-list handoffs, signed Agent Cards (A2A 0.3)
- ·Detect cross-agent prompt injection at every boundary
CHEATSHEET · 02Framework picks · 2026
Default for production
- ·LangGraph 1.0 (Oct 2025 GA, current 1.0.x). Use for: stateful supervisor teams, durable execution, audit-grade traces.
- ·OpenAI Agents SDK. Use for: TS/JS shops, fastest path from Swarm patterns, hand-offs + sandboxing built in.
- ·Anthropic Claude Agent SDK. Use for: long-horizon harnesses, sub-agent spawning, Claude-only stacks.
Default for fast crews
- ·CrewAI Flows. Use for: role-based teams, biz-process automation, fastest time-to-first-agent (~30 min).
- ·DocuSign report: ~14× less code than LangGraph for equivalent flows (vendor-published, take with salt).
- ·Watch out: ~18% token overhead per published benchmarks.
Default for memory-first or typed
- ·Letta — OS-style tiered memory (core/recall/archival), persistent agents.
- ·Pydantic AI — type-safe end-to-end, Logfire built in, durable execution.
- ·smolagents — code agents (HF), tiny core, great for research-y workflows.
Self-hosted backbone
- ·gpt-oss-120b / gpt-oss-20b (Apache 2.0, Aug 2025). 20b runs on 16GB edge.
- ·Mistral La Plateforme Agents API — EU sovereignty + native MCP.
- ·Ollama in Docker — the standard local workstation runner.
Avoid / migrate
- ·OpenAI Swarm — deprecated March 2025; migrate to Agents SDK.
- ·AutoGen 0.4 — Microsoft moved it to maintenance. New: MS Agent Framework (RC 1.0). Original creators: AG2 fork.
- ·Magentic-One direct — now exposed as an orchestration primitive in MS Agent Framework.
Observability stack
- ·Langfuse (acquired by ClickHouse, Jan 2026) — multi-agent conversation views, OSS, self-host friendly.
- ·LangSmith — best with LangGraph.
- ·Arize Phoenix — OpenInference / OpenTelemetry-native, deep eval.
- ·W&B Weave — preserves parent-child agent traces.
- ·Helicone — gateway-level cost attribution.