- The VLM mental model · 8m · 12 blocks
- Picking your multimodal model · 10m · 12 blocks
- OCR-free document AI · 10m · 12 blocks
- Multimodal RAG with ColPali · 12m · 12 blocks
- Voice agents — the sub-300ms loop · 10m · 12 blocks
- Long-video understanding · 10m · 12 blocks
- Cross-modal embeddings & search · 10m · 12 blocks
- Eval, hallucination & air-gapped local · 12m · 12 blocks