Quick Intro · ~7 MIN · PAI

Physical AI · robotics, world models & VLA


A scannable trailer of the 8-lesson course. Read top to bottom — no clicks needed.

INTROBLOCK · 01
PAI · 7 MIN PREVIEW

Robots got a foundation model.

Physical AI is the 2026 stack — world models, vision-language-action policies, and a deployable runtime. NVIDIA's GTC March 2026 keynote made it official; Hugging Face LeRobot 0.5 made it accessible. This trailer is the 7-minute version of why every backend engineer should care.

CONCEPTBLOCK · 02

The three layers of Physical AI

Stop thinking 'robot ML pipeline'. Start thinking three layers stacked on top of each other.

**1. World models.** Generative video models that predict 'if the robot does X, this is what will happen'. NVIDIA Cosmos Predict 2.5 + Transfer 2.5 + Reason 2 are the references, downloaded 2M+ times by April 2026. Used to generate synthetic episodes and to roll out policies before they touch hardware.

**2. Vision-language-action (VLA) policies.** Models that take pixels plus a language instruction and output joint actions. Pi0.5 (Apache 2.0 + Gemma), OpenVLA-7B (Apache 2.0), NVIDIA GR00T N1.7 (3B, April 2026) — all open releases. The 'transformer moment' for robotics.

**3. Embodied runtime.** ROS2 Jazzy (current LTS), Fast DDS / Cyclone DDS, ONNX/TensorRT for the policy server, Foxglove 2.0 for telemetry, Nav2 for navigation, Isaac ROS GEMs for vision. The deployment shape you actually ship.

The person who knows where Pi0.5 meets a Franka FR3 is the 2026 hire.
DIAGRAMBLOCK · 03

World model · VLA policy · Runtime

[Diagram: WORLD MODEL (synth) + DATASETS (real) → VLA POLICY → ROS2 RUNTIME → ROBOT → FOXGLOVE (telemetry ingest, back into DATASETS)]
Cosmos rolls out synthetic video. LeRobot v3 stores real episodes. The policy fuses both. ROS2 + DDS deploys to hardware. Foxglove telemetry feeds the next dataset.
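The fusion step in the middle of that loop can be sketched in plain Python. This is a hypothetical `mix_batch` helper for illustration — not the LeRobot API — showing the idea of sampling training batches at a fixed synthetic/real ratio:

```python
import random

def mix_batch(synth_episodes, real_episodes, synth_ratio=0.3, batch_size=10, seed=0):
    """Sample a training batch that is `synth_ratio` synthetic, rest real.
    Illustrative only -- episodes here are opaque items, not real tensors."""
    rng = random.Random(seed)
    n_synth = round(batch_size * synth_ratio)
    n_real = batch_size - n_synth
    batch = rng.choices(synth_episodes, k=n_synth) + rng.choices(real_episodes, k=n_real)
    rng.shuffle(batch)  # interleave so the trainer never sees a sorted batch
    return batch

synth = [f"cosmos_ep_{i}" for i in range(100)]  # Cosmos rollouts
real = [f"teleop_ep_{i}" for i in range(40)]    # LeRobot v3 teleop episodes
batch = mix_batch(synth, real, synth_ratio=0.3, batch_size=10)
print(len(batch))  # 10
```

The ratio itself is a tuning knob — the Cosmos lesson in the full study measures sim-only vs hybrid eval lift to pick it.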
CODEBLOCK · 04

Load a Pi0.5 policy and predict an action chunk

PYTHON
 1  # pip install lerobot==0.5.* huggingface_hub
 2  from lerobot.policies.pi0 import Pi0Policy
 3  import numpy as np, torch
 4  from PIL import Image
 5
 6  policy = Pi0Policy.from_pretrained("lerobot/pi05_droid")  # Apache 2.0 + Gemma
 7  policy.eval()  # inference mode, not training
 8  policy.to("cuda")
 9
10  obs = {
11      "observation.images.top": (torch.from_numpy(np.array(Image.open("top.jpg")))
12          .permute(2, 0, 1).unsqueeze(0).cuda() / 255.),
13      "observation.state": torch.zeros(1, 7).cuda(),  # Franka FR3, 7-DoF
14      "task": ["pick up the red block and place it in the bowl"],
15  }
16  with torch.no_grad():
17      chunk = policy.select_action(obs)  # [1, horizon=50, 7]
18  print(chunk[0, 0].cpu().numpy())  # joint targets, t=0
Line 6: `from_pretrained` is the same Hugging Face shape as transformers. Lines 7-8: inference mode + CUDA. Line 14: language conditioning — a free-text instruction. Lines 16-17: action chunks (horizon=50) are the 2026 default — predict a short trajectory, not a single step.
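What do you do with a 50-step chunk? The usual pattern is receding-horizon execution: run only the head of each chunk, then re-query the policy with fresh observations. A minimal stdlib sketch, with a stand-in `dummy` policy in place of `policy.select_action`:

```python
def run_receding_horizon(policy_fn, obs, n_steps=150, replan_every=25):
    """Consume only the first `replan_every` actions of each chunk,
    then re-plan. `policy_fn` stands in for a VLA policy's select_action:
    it maps an observation to a list of per-step action vectors."""
    executed = []
    t = 0
    while t < n_steps:
        chunk = policy_fn(obs)              # e.g. 50 actions of 7 joint targets
        for action in chunk[:replan_every]:  # execute only the head of the chunk
            executed.append(action)
            t += 1
            if t >= n_steps:
                break
        # in a real loop, refresh `obs` from the robot here before re-planning
    return executed

dummy = lambda obs: [[0.0] * 7 for _ in range(50)]  # stand-in 50-step chunk
traj = run_receding_horizon(dummy, obs={})
print(len(traj), len(traj[0]))  # 150 7
```

Replanning every 25 of 50 steps trades latency for reactivity; the right split depends on control rate and policy inference time.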
CHEATSHEETBLOCK · 05

The 5 rules every 2026 Physical AI shipper knows

01. Sim first. Real second. Always a sysID + DR layer between them.
02. Open weights are real now — Pi0.5, OpenVLA, GR00T N1.7. Default to them; commercial APIs are the fallback.
03. Use LeRobotDataset v3. Don't invent your own format — your future self will hate you.
04. ROS2 Jazzy LTS today; Lyrical Luth (May 2026) is next. ROS1 is dead. Fast DDS or Cyclone DDS — never both.
05. Wrap every policy in a safety guardian — rate limit, joint clamp, e-stop. Even in sim.
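Rule 05 fits in a page of Python. A minimal sketch — the class name and joint limits are made up for a generic 7-DoF arm, not taken from any library:

```python
class SafetyGuardian:
    """Filter policy outputs before they reach the arm: clamp joint targets
    to limits, rate-limit per-step deltas, and hold position on e-stop."""

    def __init__(self, low, high, max_delta):
        self.low, self.high, self.max_delta = low, high, max_delta
        self.last = None          # last target actually sent
        self.estopped = False

    def e_stop(self):
        self.estopped = True

    def filter(self, target):
        if self.estopped:
            return self.last      # hold position; never pass new targets
        safe = []
        for i, q in enumerate(target):
            q = min(max(q, self.low[i]), self.high[i])   # joint clamp
            if self.last is not None:                    # rate limit
                delta = q - self.last[i]
                delta = min(max(delta, -self.max_delta), self.max_delta)
                q = self.last[i] + delta
            safe.append(q)
        self.last = safe
        return safe

g = SafetyGuardian(low=[-2.9] * 7, high=[2.9] * 7, max_delta=0.05)
print(g.filter([0.0] * 7))  # passes through
print(g.filter([1.0] * 7))  # rate-limited to 0.05 per joint per step
```

A production guardian adds force/torque thresholds, a watchdog timer, and an audit log — that is the full-study version.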
MINIGAME · RAPIDFIRETFBLOCK · 06

Quick check — true or false?

Physical AI just means 'put ChatGPT on a robot'.
CONCEPTBLOCK · 07

What you'll ship in the full study

Eight lessons. Eight docker projects. By the end you'll have:
  • An Isaac Lab 2.3 training pipeline for a contact-rich pick-and-place task with PPO + domain randomization, exported to ONNX.
  • A Pi0.5 / OpenVLA-OFT / GR00T N1.7 policy server (FastAPI + TensorRT) you can plug into a real arm.
  • A LeRobotDataset v3 teleop rig that turns gamepad/VR input into training-ready data.
  • A sim-to-real evaluator that runs the same policy through Isaac Lab-Arena, LIBERO, and SIMPLER.
  • A ROS2 Jazzy runtime with Foxglove 2.0 telemetry and the policy as an action server.
  • A safety-guardian wrapper (rate limit + joint clamp + e-stop) that's ISO 10218 / 13482 audit-ready.
  • An NVIDIA Cosmos Predict 2.5 rollout sandbox that generates synthetic episodes and folds them into training.
  • An end-to-end on-prem stack you could run inside a regulated industry without external APIs.
Every docker project is meant to be lifted into your real work — not a demo.
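One piece of the policy-server project — the observation contract — can be previewed with stdlib dataclasses. The field names mirror the Pi0.5 example above and are illustrative, not a fixed schema (the full study uses Pydantic):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Wire format a policy server could accept. Shapes are validated
    up front so a malformed request never reaches the GPU."""
    image_top: list   # H x W x 3 image payload (nested lists or base64 in practice)
    state: list       # joint positions; length must equal dof
    task: str         # free-text language instruction
    dof: int = 7

    def __post_init__(self):
        if len(self.state) != self.dof:
            raise ValueError(f"state must have {self.dof} joints, got {len(self.state)}")
        if not self.task.strip():
            raise ValueError("task instruction must be non-empty")

ok = Observation(image_top=[], state=[0.0] * 7, task="pick up the red block")
print(ok.task)  # pick up the red block
```

Rejecting bad shapes at the contract boundary is what keeps a sub-50ms p99 honest: validation errors return instantly instead of crashing inside TensorRT.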
LESSON COMPLETEBLOCK · 08

That's the trailer.

NEXT · Lesson 1 · The three layers of Physical AI
WHAT YOU'LL WALK AWAY WITH

Real skills, real career delta.

Skills you'll gain

10 skills
  • Read VLA / world-model papers without panic · Working

    Decode Pi0.5, OpenVLA, GR00T N1.7, Cosmos Predict/Transfer/Reason, RT-2, Diffusion Policy, RDT — what the architecture is, what dataset, what eval suite, where the gap is.

  • Build & play back a LeRobotDataset v3 · Working

    Record episodes with `lerobot-record`, push to Hugging Face Hub, replay with `lerobot-replay`, segment by language. The 2026 data lingua franca.

  • Train a contact-rich manipulation policy in Isaac Lab 2.3 · Production

    PPO + domain randomization for a Lift / Open-Drawer / Stack task; vectorized sim on a single GPU, Hydra config, Wandb logging, ONNX export, Lab-Arena gating.

  • Deploy a VLA policy server (Pi0.5 / OpenVLA-OFT / GR00T N1.7) · Production

    FastAPI + ONNX/TensorRT in Docker; sub-50ms p99 latency on RTX 4090 / Jetson Thor; horizon=50 action chunks; Pydantic observation contract.

  • Wire a learned policy into ROS2 Jazzy · Production

    rclpy node, action server, JointState publisher, Foxglove 2.0 panel layout, Cyclone DDS QoS profile that doesn't drop frames at 50Hz.

  • Run the standard eval suites (LIBERO, SIMPLER, RoboCasa, Lab-Arena) · Working

    Author the eval config, run on sim, parse the success-rate JSON, gate CI on regressions, report MMRV + Pearson against real rollouts.

  • Domain-randomize + sysID + DAgger closure for sim2real · Production

    DR ranges in Isaac Lab event-manager API; MuJoCo MJX system identification (mass/friction/PD gains); DAgger with real teleop expert; the standard 5-step handoff.

  • Wrap any policy in a safety guardian · Working

    Rate limiter on joint targets, joint-position/velocity clamps, force/torque thresholds, soft e-stop watchdog, ISO 13482-aligned event log; works in sim and real.

  • Generate synthetic episodes with NVIDIA Cosmos · Advanced

    Cosmos Predict 2.5 rollouts; filter with Cosmos Reason 2; merge synth + real in LeRobot v3; quantify sim-only vs hybrid eval lift on RoboCasa or LIBERO.

  • Run a fully on-device runtime on Jetson Thor / Orin · Advanced

    TensorRT NVFP4/FP8 quantization for the VLA policy, CUDA-aware DDS, Foxglove over Wi-Fi telemetry, thermal/power budgeting — the deployment shape humanoid startups buy.
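The CI-gating step from the eval-suite skill above reduces to a small comparison. The JSON layout here is invented for illustration — the actual LIBERO/SIMPLER output formats differ — but the gating logic is the same:

```python
import json

def gate_on_regression(baseline_json, candidate_json, max_drop=0.02):
    """Return (passed, failed_tasks): fail if any task's success rate drops
    more than `max_drop` absolute versus the baseline run."""
    baseline = json.loads(baseline_json)
    candidate = json.loads(candidate_json)
    failures = [
        task for task, rate in baseline.items()
        if candidate.get(task, 0.0) < rate - max_drop
    ]
    return len(failures) == 0, failures

base = json.dumps({"libero_spatial": 0.82, "libero_object": 0.75})
cand = json.dumps({"libero_spatial": 0.81, "libero_object": 0.64})
passed, failed = gate_on_regression(base, cand)
print(passed, failed)  # False ['libero_object']
```

Gating on absolute drop per task (rather than the mean across tasks) catches the common failure mode where one task silently collapses while the average looks fine.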

Career & income delta

Career moves
  • Title yourself credibly as 'Robot Foundation Model Engineer' or 'Embodied AI Engineer' — the 2026 hiring channel for senior IC roles at Figure, 1X, Skild AI, Physical Intelligence, Apptronik, Agility, NVIDIA GEAR/Isaac, Cobot.
  • Lead a Physical AI platform team — the embedded-team-of-2026 inside any company shipping warehouse, last-mile, or domestic robots (Amazon, GXO, BMW, Mercedes have all hired publicly).
  • Pick up contracting work at $300-500/hr fixing teams whose 2023 ROS1 + behavior-cloning stack doesn't transfer to 2026 hardware.
  • Ship the 'we have a foundation policy on our robot' line item your CTO has been promising the board — and own the eval dashboard (Lab-Arena + LIBERO + SIMPLER) that proves it.
Income impact
  • $50-150K bump for senior backend / ML engineers adding production VLA + ROS2 to their resume in 2026.
  • $260-650K total comp for senior IC at humanoid startups (Figure, 1X, Apptronik, Agility) per April 2026 levels.fyi data; staff IC at top labs $650K+.
  • Freelance / consulting rates: $300-500/hr — sim-to-real tuning + safety-guardian work is the most common 2026 inquiry; humanoid startups pay top of band.
  • Enterprise demos / sales-engineering at NVIDIA Isaac partners: closing one 7-figure deal often hinges on the sim-to-real evaluator and the safety guardian — exactly what this course ships.
  • EU bands: typically 50-65% of US — Berlin / Munich / Zurich / London concentrate hiring (DeepMind London, Pi London, NVIDIA Munich, ETH Zurich spinouts). Senior €120-220K total.
Market resilience
  • Physical AI is the skill that survives the next foundation-model consolidation — orgs always need someone who knows how to wire a model into a robot safely.
  • ROS2 + DDS is platform-independent and standardized — your fluency carries across every robot vendor.
  • Open-weights policies (Pi0.5, OpenVLA, GR00T N1.7) are durable across model providers; closed APIs are the fallback, not the default.
  • Sim-to-real expertise outlasts any specific simulator — Isaac Lab, MuJoCo, Genesis all need the same DR + sysID + DAgger discipline.
  • Safety / ISO compliance skills (ISO 13482, 10218, TS 15066, 22166) are mandatory for any commercial deploy and don't expire.