RAGMOD.RAG-10 · v1.0

Retrieve the right chunk. Re-rank to recall perfection.

10 micro-lessons · ~108 min · Real Docker images


RAG, vector DBs & enterprise search

Retrieval that grounds. Generation that ships. The whole 2026 pipeline.

Vector-database adoption grew 377% YoY (Databricks 2026). 70% of enterprises using GenAI now augment base models with retrieval rather than relying on raw LLMs. RAG is the #1 enterprise AI use case — and most teams still ship it wrong (chunking errors account for ~80% of production-quality issues).
WHAT YOU'LL LEARN
01 · Embeddings — turning text into geometry
02 · Chunking — the silent quality lever
03 · Vector stores compared
04 · Hybrid search & query rewriting
05 · Re-ranking with cross-encoders
06 · Eval — Ragas, TruLens, Phoenix
07 · Advanced patterns — Agentic, GraphRAG, Multimodal, CAG
08 · Production observability + cost
09 · Embedding migration & versioning
10 · Production hardening — PII, prompt injection, multi-tenancy
YOU'LL BE ABLE TO
Ship a hybrid RAG (BM25 + vector + RRF + rerank) that beats your single-vector baseline by ≥ 15 points of Recall@K
Pick embedding model + chunker + vector store + reranker by hard numbers, not vibes
Gate every RAG change in CI with Ragas / DeepEval thresholds (Faithfulness ≥ 0.9, Recall@5 ≥ 0.85)
Run an air-gapped on-prem stack (Ollama + nomic-embed + pgvector) for regulated industries
Migrate embeddings safely with shadow re-index + dual-read + flag-driven cutover
SKILLS YOU'LL GAIN

Real skills, real career delta.

  • Pick the right embedding model from MTEB + cost + privacy · Working

    Choose between text-embedding-3-large (Matryoshka), voyage-3-large, Cohere embed-v4, BGE-M3, Nomic-embed-text-v2-MoE based on language, modality, deployment and budget. Defend the pick with MTEB scores.
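Matryoshka-trained models such as text-embedding-3-large let you keep only a prefix of each vector and re-normalize, trading a little recall for storage and latency. A minimal sketch of that truncation step, in plain Python (the 6-dim toy vector is illustrative only):

```python
import math

def truncate_matryoshka(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of a Matryoshka-trained
    embedding and re-normalize to unit length so cosine similarity
    still behaves."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

# Toy 6-dim "embedding" truncated to 3 dims:
v = truncate_matryoshka([0.5, 0.5, 0.5, 0.5, 0.1, 0.1], 3)
```

With the OpenAI API the same effect is exposed as the `dimensions` request parameter; the helper above just makes the geometry explicit.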

  • Choose & justify a chunking strategy by data shape · Production

    Recursive vs semantic vs late-chunking vs Anthropic contextual retrieval vs parent-document. Code-aware and table-aware splitters for source code and structured docs.
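The recursive strategy is the usual baseline: try the coarsest separator first, then fall back to finer ones for oversize pieces. A self-contained sketch (separator order and `max_len` are illustrative defaults, not the course's exact settings):

```python
def recursive_split(text: str, max_len: int = 200,
                    seps: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Minimal recursive character splitter: split on the coarsest
    separator, recurse with finer ones on pieces still too long."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not seps:
        # Last resort: hard cut at max_len.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks: list[str] = []
    for part in text.split(seps[0]):
        if len(part) > max_len:
            chunks.extend(recursive_split(part, max_len, seps[1:]))
        elif part.strip():
            chunks.append(part)
    return chunks

chunks = recursive_split("para one.\n\n" + "x" * 300, max_len=100)
```

Production splitters add overlap, token-based length, and format awareness on top of this skeleton.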

  • Choose & justify a vector store from 8 production options · Production

    pgvector, Qdrant, Weaviate, Milvus, Pinecone, LanceDB, Vespa, Mongo Atlas — pick by scale, ops model, hybrid support, payload filtering, multi-tenancy. Read VectorDBBench results.

  • Implement hybrid search with RRF on at least two stores · Production

    BM25 + dense vector + RRF (k≈60). Score fusion vs rank fusion. Native hybrid in Qdrant/Weaviate/Pinecone vs hand-rolled with pgvector + tsvector.
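RRF itself is a few lines: each list contributes 1/(k + rank) per document, and the constant k≈60 keeps any single list from dominating. A sketch over two toy rankings:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of
    1 / (k + rank), with rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]   # lexical ranking
dense = ["d1", "d9", "d3"]  # vector ranking
fused = rrf_fuse([bm25, dense])
# "d1" ranks high in both lists, so it tops the fused ranking.
```

Because RRF only consumes ranks, it needs no score normalization across BM25 and cosine scales, which is why it travels well between stores.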

  • Add query rewriting (HyDE, multi-query, decomposition) · Working

    Choose rewriting strategy from query shape — short/under-specified → HyDE; ambiguous → multi-query; complex compound → decomposition + step-back. Measure lift on Recall@K.

  • Re-rank with Cohere/Jina/BGE/ColPali · Production

    Cross-encoder rerank for surgical context. Open-source vs API. ColPali / ColBERT late-interaction for multi-vector retrieval. Trim 50 → 8 to cut token cost AND raise faithfulness.
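The retrieve-wide-then-trim shape looks like this. The default scorer below is a toy token-overlap stand-in so the sketch runs anywhere; in practice you would pass a real cross-encoder's scoring function in its place:

```python
def rerank_trim(query: str, docs: list[str], top_n: int = 8,
                score=None) -> list[str]:
    """Score every (query, doc) pair and keep only the top_n docs.
    `score` stands in for a cross-encoder's predict(); the default
    token-overlap scorer is for illustration only."""
    if score is None:
        q = set(query.lower().split())
        score = lambda d: len(q & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:top_n]

candidates = ["chunking PDFs with a recursive splitter",
              "pricing page",
              "how to chunk PDFs for RAG"]
top = rerank_trim("how to chunk PDFs", candidates, top_n=2)
```

The trim is where both wins come from: fewer tokens reach the LLM, and the tokens that do are the ones the reranker vouched for.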

  • Eval RAG with Ragas (faithfulness, answer relevance, context P/R) · Production

    Build a labelled Q→chunk → answer dataset. Run Ragas + TruLens RAG triad. Gate CI on Recall@5, Faithfulness, p95 latency, $/query.
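The CI gate reduces to comparing eval output against floors and ceilings and failing the build on any violation. A sketch, with the latency and cost budgets as illustrative placeholders (only the faithfulness and Recall@5 floors come from the course's stated thresholds):

```python
def gate_rag_metrics(metrics: dict[str, float]) -> list[str]:
    """Return a list of threshold violations; an empty list means
    the change may ship. Floors are minimums, ceilings are budgets."""
    floors = {"faithfulness": 0.90, "recall_at_5": 0.85}
    ceilings = {"p95_latency_s": 2.0, "cost_per_query_usd": 0.02}  # placeholder budgets
    failures: list[str] = []
    for m, floor in floors.items():
        v = metrics.get(m, 0.0)
        if v < floor:
            failures.append(f"{m}={v:.2f} below {floor}")
    for m, cap in ceilings.items():
        v = metrics.get(m, 0.0)
        if v > cap:
            failures.append(f"{m}={v:.2f} above {cap}")
    return failures

fails = gate_rag_metrics({"faithfulness": 0.92, "recall_at_5": 0.80,
                          "p95_latency_s": 1.1, "cost_per_query_usd": 0.01})
```

Wire the returned list into a non-zero exit code and every retriever, chunker, or prompt change gets evaluated before merge.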

  • Ship Agentic RAG (corrective + self-RAG) · Advanced

    LangGraph 1.0 state machine: retrieve → grade → rewrite/re-retrieve → generate → self-check → loop. Knows when to web-search, when to refuse, when to ask for clarification.

  • Index visually-rich PDFs with ColPali (no OCR) · Advanced

    Multimodal RAG over slides, scanned PDFs, screenshots using ColPali multi-vector embeddings — beats OCR + text retrieval on layout-heavy corpora.

  • Observe + budget RAG in Langfuse / Phoenix · Production

    Trace every stage of a RAG call (embed → search → rerank → LLM). Per-stage latency + token attribution, $/query dashboards, cache-hit rate, negative-answer-rate.

  • Embedding-model migration with versioning + blue/green index · Advanced

    Tag (model_id, dim, chunker_version) on every row. Run shadow-index re-embed, dual-read with feature flag, A/B eval before cutover. Never re-embed in place.
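The dual-read pattern can be sketched as a thin routing layer: serve from the live index, quietly query the shadow index for offline comparison, and flip one flag at cutover. All names below (`EmbeddingVersion`, the stub search functions) are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EmbeddingVersion:
    """The tag carried on every indexed row."""
    model_id: str
    dim: int
    chunker_version: str

def dual_read(query: str,
              indexes: dict[EmbeddingVersion, Callable[[str], list[str]]],
              live: EmbeddingVersion, shadow: EmbeddingVersion,
              cutover: bool = False):
    """Serve from the live index; also hit the shadow index so its
    recall can be A/B-evaluated offline. Flip `cutover` once the
    shadow wins, then retire the old index."""
    primary = shadow if cutover else live
    results = indexes[primary](query)
    shadow_results = None if cutover else indexes[shadow](query)
    return results, shadow_results
```

Because the old index stays intact until the flag flips, rollback is the same one-line change in reverse, which is the whole point of never re-embedding in place.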

  • Defend against retrieval-time prompt injection + PII leaks · Advanced

    Detect instructions hidden in retrieved chunks (OWASP LLM01). PII redaction at ingest, retrieval-time policy filters per tenant, signed retrieval audit trail.
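A first line of defense is a cheap pattern scan over retrieved chunks before they reach the LLM context. The patterns below are illustrative only; a production defense layers a trained classifier and policy filters on top:

```python
import re

# Illustrative instruction-like patterns; not an exhaustive defense.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"disregard the system prompt",
    r"you are now",
]

def flag_injected_chunks(chunks: list[str]) -> list[int]:
    """Return indices of retrieved chunks containing instruction-like
    text (OWASP LLM01) so they can be dropped or quarantined before
    entering the prompt."""
    pat = re.compile("|".join(INJECTION_PATTERNS), re.IGNORECASE)
    return [i for i, c in enumerate(chunks) if pat.search(c)]
```

Flagged chunks should also be logged to the signed retrieval audit trail so you can trace which document tried to steer the model.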

RUNNABLE ON YOUR MACHINE
$ docker pull snap/rag-vector:hello
$ docker run --rm -it snap/rag-vector:hello
QUICK PREVIEW · 7 MIN
VERIFIED ENGINEER REVIEWS
The chunker-bench container finally answered 'what chunker for OUR docs' with numbers, not vibes.
@rag_ravi · VERIFY ON GITHUB
Re-ranking lesson: 4 minutes, +18 Recall@8 on our internal RAG. Highest ROI 5 lines I've ever shipped.
@kofi.infra · VERIFY ON TWITTER
LESSONS · 10
HOURS · ~1.8
LEARNERS · 6,890
THIS WEEK · +19%