AI-native software development — Quick Intro

INTRO · BLOCK 01

AND · 7 MIN PREVIEW

Coding agents stopped being a demo. They ship your PRs now.

SWE-bench Verified top scores reached 87.6% in April 2026 (Claude Opus 4.7 Adaptive). Stripe is merging 1,000+ AI PRs per week. Shopify reports a measured 20% productivity lift across engineering. The teams pulling away aren't the ones using more agents; they're the ones with the right loop, the right rails, and a $-per-merged-PR dashboard. This trailer shows the gap.

CONCEPT · BLOCK 02

The one-line difference

Autocomplete completes a token. An agent runs a loop: read → plan → edit → verify, and won't stop until the tests are green. The 'AI-native developer' isn't someone who types prompts faster — it's someone who has built rails (specs, hooks, MCP servers, eval gates) so the loop converges without hand-holding. If your only agent input is a Slack-style chat, you don't have an AI-native workflow. You have a chatbot pretending to be a teammate.

TIP: Always test the loop end-to-end on a real ticket before optimising any single stage. Most teams over-engineer the prompt and under-engineer the verify step.

WATCH OUT: Auto-merge without a verify gate is the #1 cause of regression cascades. SWE-bench Verified accuracy is *task* accuracy; your repo accuracy is whatever your verify step says it is.

GOTCHA: An agent with bash + write access and no recursion cap can run rm -rf or burn $50 in a single run. The first rail you write is permission scope, not the prompt.
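One way to picture "permission scope first" is a wrapper that refuses any command outside an allow-list and stops after a fixed number of calls. The sketch below is illustrative only: `ALLOWED_CMDS`, `MAX_CALLS`, `is_allowed`, and `guarded_run` are hypothetical names, not part of any agent CLI, and a real setup would lean on the agent's own permission settings and a sandbox rather than a shell function.

```shell
# Hypothetical guard: names and limits are assumptions for illustration.
ALLOWED_CMDS="ls cat grep git npm"   # commands the agent may run
MAX_CALLS=20                         # crude cap on tool calls per run
CALLS=0

# is_allowed CMD -> exit 0 if CMD is on the allow-list, 1 otherwise
is_allowed() {
  for allowed in $ALLOWED_CMDS; do
    [ "$1" = "$allowed" ] && return 0
  done
  return 1
}

# guarded_run CMD [ARGS...] -> runs CMD only if allowed and under the cap
guarded_run() {
  if [ "$CALLS" -ge "$MAX_CALLS" ]; then
    echo "blocked: call cap of $MAX_CALLS reached" >&2
    return 2
  fi
  if ! is_allowed "$1"; then
    echo "blocked: '$1' is not on the allow-list" >&2
    return 1
  fi
  CALLS=$((CALLS + 1))
  "$@"
}
```

With this in place, `guarded_run rm -rf build` is rejected before anything runs, while `guarded_run git status` goes through. Checking only the first word is coarse (an allowed `npm` can still run arbitrary scripts), which is why the sketch complements a sandbox and does not replace one.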

DIAGRAM · BLOCK 03

The agentic dev loop

Read is the cheapest stage and where most teams under-invest. Verify is the expensive one and where the trust comes from. Loop until verify passes, then merge — never merge to chase the loop.
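The "loop until verify passes, then merge" rule can be sketched as a bounded retry. This is a minimal illustration, not a prescribed implementation: `EDIT_CMD`, `VERIFY_CMD`, `MAX_ITERS`, and `run_loop` are hypothetical names, and you would substitute your own agent invocation and test command.

```shell
# Hypothetical sketch: EDIT_CMD and VERIFY_CMD are placeholders you supply.
EDIT_CMD="${EDIT_CMD:-true}"       # e.g. your agent's edit invocation
VERIFY_CMD="${VERIFY_CMD:-true}"   # e.g. "npm test && npm run lint"
MAX_ITERS="${MAX_ITERS:-3}"        # cap so a stuck loop cannot run forever

# run_loop -> exit 0 once verify passes, 1 if the cap is hit first
run_loop() {
  i=1
  while [ "$i" -le "$MAX_ITERS" ]; do
    sh -c "$EDIT_CMD" || return 1          # edit stage failed outright
    if sh -c "$VERIFY_CMD"; then
      echo "verify passed on iteration $i"
      return 0                             # only now is a merge on the table
    fi
    echo "verify failed on iteration $i" >&2
    i=$((i + 1))
  done
  return 1                                 # cap reached: do not merge
}
```

The merge decision hangs on the exit code alone: a caller would run something like `run_loop && echo "ready to merge"`, so a loop that never converges cannot reach the merge step.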

CODE · BLOCK 04

A 12-line agentic loop you can run today

BASH

# Pick any agent CLI: claude, cursor-agent, aider, codex, cline.
# This snippet works with claude-code; substitute as needed.

set -euo pipefail
TICKET="${1:?usage: pass a ticket id, e.g. LIN-1287}"

# 1. READ: let the agent gather context, no edits allowed yet.
claude --permission-mode plan -p "Read the repo and summarise files relevant to $TICKET" > plan.md

# 2. PLAN: produce a checklist before any edits.
claude --permission-mode plan -p "Given plan.md, write a 5-step checklist to close $TICKET" > checklist.md

# 3. EDIT: implement strictly against the checklist.
claude --permission-mode acceptEdits -p "Implement checklist.md. Stop after the last step."

# 4. VERIFY: run the project's test suite and lint.
npm test && npm run lint

# 5. If verify failed, re-enter the loop with the failure as the new ticket.
#    (As the last command, a failed verify makes the script exit non-zero.)

READ and PLAN (steps 1 and 2) run in plan permission mode: no edits, no shell. Only EDIT (step 3) uses acceptEdits. VERIFY (step 4) is your own test suite. The loop is your contract, not the agent's promise.
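Step 5 is left as a comment in the script. One possible way to "re-enter the loop with the failure as the new ticket" is to save the verify output and turn its tail into the next prompt. The sketch below assumes the last lines of the log carry the failure; `next_prompt`, `TAIL_LINES`, and `verify.log` are hypothetical names introduced for illustration.

```shell
# Assumption: the tail of the verify output is enough context for a retry.
TAIL_LINES="${TAIL_LINES:-40}"

# next_prompt TICKET LOGFILE -> prints a follow-up prompt built from the log
next_prompt() {
  printf 'Verify failed for %s. Fix only what the output below reports.\n' "$1"
  printf '%s\n' "--- last $TAIL_LINES lines of verify output ---"
  tail -n "$TAIL_LINES" "$2"
}

# Usage sketch (not executed here), reusing the flags from the script above:
#   if ! (npm test && npm run lint) > verify.log 2>&1; then
#     claude --permission-mode acceptEdits -p "$(next_prompt "$TICKET" verify.log)"
#   fi
```

Keeping the prompt builder separate from the agent call makes it easy to inspect exactly what the agent will be told before spending tokens on another pass.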

CHEATSHEET · BLOCK 05

The 5 rules every 2026 AI-native shipper knows

01. Read → plan → edit → verify. Skip a stage and you'll pay for it in regressions.

02. Project rules live in AGENTS.md / CLAUDE.md. The prompt is the LAST place to put them.

03. Tests are the agent's contract. No tests → no agent loop. (TDD gets WAY easier with agents.)

04. Hooks > prompts for anything you want enforced every time (lint, type, secrets).

05. Track $-per-merged-PR weekly. Below $5 is healthy; above $20 is an architectural problem.
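Rule 05 comes down to one division and two thresholds. A minimal sketch, assuming you already have the week's total agent spend and merged-PR count from your own tooling; `cost_per_pr` and `classify` are hypothetical helper names, and "watch" is an assumed label for the $5 to $20 band that the rule leaves unnamed.

```shell
# cost_per_pr TOTAL_USD MERGED_PRS -> dollars per merged PR, two decimals
cost_per_pr() {
  awk -v usd="$1" -v prs="$2" 'BEGIN {
    if (prs <= 0) { print "n/a"; exit }
    printf "%.2f\n", usd / prs
  }'
}

# classify COST -> healthy (< 5), architectural problem (> 20), else watch
classify() {
  awk -v c="$1" 'BEGIN {
    if (c < 5) print "healthy"
    else if (c > 20) print "architectural problem"
    else print "watch"
  }'
}
```

For example, $1,240 of spend across 310 merged PRs gives `cost_per_pr 1240 310` = `4.00`, which `classify` reports as `healthy`; $900 across 36 merged PRs gives `25.00`, which lands in `architectural problem`.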

MINIGAME · RAPIDFIRETF · BLOCK 06

Quick check — true or false?

Top SWE-bench Verified scores in April 2026 are above 85%.


CONCEPT · BLOCK 07

What you'll ship in the full study

Ten lessons. Six Docker projects. By the end you'll have:
- An `agent-dev-shell` repo with claude-code, cursor-agent, aider, codex, and cline pre-wired against the same fix-the-bug task, so you pick the right agent for the job by measuring.
- A `tdd-agent-loop` container that ships failing-test → green-test cycles end-to-end, ready to drop into your team's monorepo.
- A custom MCP server (`mcp-postgres-pair`) that hands your agent typed read-only Postgres access without leaking the connection string.
- A `pr-review-bot` GitHub Actions pipeline: PR opens → agent reviews → agent fixes → tests pass → human approves.
- An `agent-cost-dashboard` (Helicone + Slack) that surfaces $-per-merged-PR by team, the metric that ends 'should we use Cursor or Claude Code' debates.
Every Docker project is meant to be lifted into your real work, not a demo.

INCLUDED: Each project ships with composeYaml, expectedStdout, and a 'lift to work' note explaining how to drop it into your team's repo on day one.

LESSON COMPLETE · BLOCK 08

That's the trailer.

NEXT: Lesson 1 · The agentic dev loop

WHAT YOU'LL WALK AWAY WITH

Real skills, real career delta.

Skills you'll gain

Diagnose when to leave autocomplete for an agent loop · Working
Use the 3-signal test (ticket length, cross-file edits, test surface) to pick autocomplete vs agent before opening the IDE — and quote the cost difference.
Author project rules (CLAUDE.md / AGENTS.md / cursor.rules) · Production
Write rules that survive 8-hour sessions: build commands, conventions, DO/DO-NOT lists, gotchas, hook config — distilled from your bug history, not copied from a template.
Run plan-then-edit reliably across agent CLIs · Working
Use plan mode in Claude Code, plan-as-edit in Cursor, plan-files in Aider, PRD generation in Devin — recognise when to skip plan and when it's mandatory.
Drive a TDD-with-agent loop that converges · Production
Red-green-refactor in an agent loop: a failing test is the input, the agent edits until green, and the agent stops at green. Includes the 'one bug = one regression test' commit rule.
Dispatch parallel sub-agents for read-only research · Production
Spawn N independent Explore agents on non-overlapping queries; aggregate results back to a primary agent. Cost math: paying 4× the tokens in parallel often beats waiting 4× as long sequentially, at the team level.
Wire MCP servers (Linear, Sentry, Postgres, custom) into agents · Working
Install standard MCP servers, write a custom one in <30 lines with FastMCP, scope read-only vs read-write, pair it with a permission boundary.
Configure agent hooks for lint / type / secrets enforcement · Production
Pre-tool-use, post-edit, post-stop hooks that catch unwanted edits BEFORE they ship. Hooks beat prompts because they don't depend on the model remembering.
Ship PR-review automation with auto-fix loops · Production
CodeRabbit / Greptile / self-hosted reviewer + agent that consumes the review, fixes, re-runs tests. The 'no human-in-loop until tests pass' pattern.
Track $-per-merged-PR and route work by it · Production
Helicone / OpenAI dashboard / Anthropic Console + Slack digest of weekly $/PR by team. Use it to settle 'which agent / which model / which mode' debates with data.
Operate AI-native dev under SOC 2 / GDPR / regulated constraints · Advanced
Permission models, audit logs, secrets scanning preflight, sandboxed exec, allow-listed shell — the rollout shape regulated industries actually buy.
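The hooks skill in the list above (catching unwanted edits before they ship, without relying on the model remembering) can be illustrated with a small secrets check that any hook mechanism could call. This is a sketch under stated assumptions: `scan_secrets` and `SECRET_PATTERN` are hypothetical names, the patterns are a tiny illustrative set and not a complete scanner, and how the check is wired into a particular agent's hook system is tool-specific and not shown.

```shell
# Illustrative patterns only: an AWS-style access key id, a PEM private key
# header, and a long value assigned to something named api_key / secret_key.
SECRET_PATTERN='AKIA[0-9A-Z]{16}|BEGIN [A-Z ]*PRIVATE KEY|(API|api|SECRET|secret)_?(KEY|key)[[:space:]]*[:=][[:space:]]*[A-Za-z0-9/+_-]{20,}'

# scan_secrets -> reads text on stdin; exit 1 if a likely secret is found
scan_secrets() {
  if grep -nE "$SECRET_PATTERN" >&2; then
    echo "blocked: possible secret in the change" >&2
    return 1
  fi
  return 0
}
```

In a plain git pre-commit hook, for instance, the check could run as `git diff --cached | scan_secrets`, so a non-zero exit stops the commit regardless of what the prompt said.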

Career & income delta

Career moves

Title yourself credibly as 'AI-native engineer' or 'developer-experience engineer' — the 2026 hiring channel for senior IC roles at $200-380K base.
Lead the 'developer productivity / DX' team that's getting stood up at every series-B+ company — the AI-tooling org that didn't exist 18 months ago.
Pick up contracting / fractional CTO work at $200-400/hr — 'we have Cursor seats but the team isn't shipping faster' is the most common 2026 inquiry.
Own the 'we ship 5× faster than competitors' narrative on your perf review — backed by a $-per-merged-PR dashboard you actually built.

Income impact

$20-40K bump for senior ICs adding measurable agent-driven shipping velocity to their resume.
$50-100K bump moving from a generic backend role to a developer-experience or AI-platform team.
Freelance / consulting rates: $200-400/hr — the most common 2026 inquiry is 'help us roll out coding agents without breaking prod'.
Enterprise demos / sales-engineering: closing one 6-figure DX-tooling deal per quarter often requires the cost-dashboard + rollout ADR shape in this course.

Market resilience

Coding agents are now table stakes — competence here is the new 'git'. Durable across foundation-model market shifts.
MCP and A2A are now hosted under the Linux Foundation; protocol fluency carries over every time the model du jour changes.
TDD-with-agents discipline transfers to whatever framework launches next — the loop shape is the durable part.
Cost / quality observability beats vibes-based purchasing decisions — and that skill is provider-agnostic by design.
Permission-model + audit-trail expertise (SOC 2 / GDPR / regulated industries) stays in demand regardless of which agent vendor wins the year.