RAGMOD.RAG-10 · v1.0

Retrieve the right chunk. Re-rank to recall perfection.

10 micro-lessons · ~108 min · Real Docker images


RAG, vector DBs & enterprise search

Retrieval that grounds. Generation that ships. The whole 2026 pipeline.

Vector-database adoption grew 377% YoY (Databricks 2026). 70% of enterprises using GenAI now augment base models with retrieval rather than relying on raw LLMs. RAG is the #1 enterprise AI use case — and most teams still ship it wrong (chunking errors account for ~80% of production-quality issues).
WHAT YOU'LL LEARN
01 · Embeddings — turning text into geometry
02 · Chunking — the silent quality lever
03 · Vector stores compared
04 · Hybrid search & query rewriting
05 · Re-ranking with cross-encoders
06 · Eval — Ragas, TruLens, Phoenix
07 · Advanced patterns — Agentic, GraphRAG, Multimodal, CAG
08 · Production observability + cost
09 · Embedding migration & versioning
10 · Production hardening — PII, prompt injection, multi-tenancy
YOU'LL BE ABLE TO
Ship a hybrid RAG (BM25 + vector + RRF + rerank) that beats your single-vector baseline by ≥ 15 points of Recall@K
Pick embedding model + chunker + vector store + reranker by hard numbers, not vibes
Gate every RAG change in CI with Ragas / DeepEval thresholds (Faithfulness ≥ 0.9, Recall@5 ≥ 0.85)
Run an air-gapped on-prem stack (Ollama + nomic-embed + pgvector) for regulated industries
Migrate embeddings safely with shadow re-index + dual-read + flag-driven cutover
SKILLS YOU'LL GAIN

Real skills, real career delta.

  • Pick the right embedding model from MTEB + cost + privacy · Working

    Choose between text-embedding-3-large (Matryoshka), voyage-3-large, Cohere embed-v4, BGE-M3, Nomic-embed-text-v2-MoE based on language, modality, deployment and budget. Defend the pick with MTEB scores.
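Matryoshka-trained models such as text-embedding-3-large let you keep only a prefix of each vector and re-normalize, trading a little recall for storage and latency. A minimal sketch of that truncation step, in plain Python (the 6-dim toy vector is illustrative only):

```python
import math

def truncate_matryoshka(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of a Matryoshka-trained
    embedding and re-normalize to unit length so cosine similarity
    still behaves."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

# Toy 6-dim "embedding" truncated to 3 dims:
v = truncate_matryoshka([0.5, 0.5, 0.5, 0.5, 0.1, 0.1], 3)
```

With the OpenAI API the same effect is exposed as the `dimensions` request parameter; the helper above just makes the geometry explicit.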

  • Choose & justify a chunking strategy by data shape · Production

    Recursive vs semantic vs late-chunking vs Anthropic contextual retrieval vs parent-document. Code-aware and table-aware splitters for source code and structured docs.
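The recursive strategy is the usual baseline: try the coarsest separator first, then fall back to finer ones for oversize pieces. A self-contained sketch (separator order and `max_len` are illustrative defaults, not the course's exact settings):

```python
def recursive_split(text: str, max_len: int = 200,
                    seps: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Minimal recursive character splitter: split on the coarsest
    separator, recurse with finer ones on pieces still too long."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not seps:
        # Last resort: hard cut at max_len.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks: list[str] = []
    for part in text.split(seps[0]):
        if len(part) > max_len:
            chunks.extend(recursive_split(part, max_len, seps[1:]))
        elif part.strip():
            chunks.append(part)
    return chunks

chunks = recursive_split("para one.\n\n" + "x" * 300, max_len=100)
```

Production splitters add overlap, token-based length, and format awareness on top of this skeleton.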

  • Choose & justify a vector store from 8 production options · Production

    pgvector, Qdrant, Weaviate, Milvus, Pinecone, LanceDB, Vespa, Mongo Atlas — pick by scale, ops model, hybrid support, payload filtering, multi-tenancy. Read VectorDBBench results.

  • Implement hybrid search with RRF on at least two stores · Production

    BM25 + dense vector + RRF (k≈60). Score fusion vs rank fusion. Native hybrid in Qdrant/Weaviate/Pinecone vs hand-rolled with pgvector + tsvector.
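RRF itself is a few lines: each list contributes 1/(k + rank) per document, and the constant k≈60 keeps any single list from dominating. A sketch over two toy rankings:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of
    1 / (k + rank), with rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]   # lexical ranking
dense = ["d1", "d9", "d3"]  # vector ranking
fused = rrf_fuse([bm25, dense])
# "d1" ranks high in both lists, so it tops the fused ranking.
```

Because RRF only consumes ranks, it needs no score normalization across BM25 and cosine scales, which is why it travels well between stores.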

  • Add query rewriting (HyDE, multi-query, decomposition) · Working

    Choose rewriting strategy from query shape — short/under-specified → HyDE; ambiguous → multi-query; complex compound → decomposition + step-back. Measure lift on Recall@K.

  • Re-rank with Cohere/Jina/BGE/ColPali · Production

    Cross-encoder rerank for surgical context. Open-source vs API. ColPali / ColBERT late-interaction for multi-vector retrieval. Trim 50 → 8 to cut token cost AND raise faithfulness.
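The retrieve-wide-then-trim shape looks like this. The default scorer below is a toy token-overlap stand-in so the sketch runs anywhere; in practice you would pass a real cross-encoder's scoring function in its place:

```python
def rerank_trim(query: str, docs: list[str], top_n: int = 8,
                score=None) -> list[str]:
    """Score every (query, doc) pair and keep only the top_n docs.
    `score` stands in for a cross-encoder's predict(); the default
    token-overlap scorer is for illustration only."""
    if score is None:
        q = set(query.lower().split())
        score = lambda d: len(q & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:top_n]

candidates = ["chunking PDFs with a recursive splitter",
              "pricing page",
              "how to chunk PDFs for RAG"]
top = rerank_trim("how to chunk PDFs", candidates, top_n=2)
```

The trim is where both wins come from: fewer tokens reach the LLM, and the tokens that do are the ones the reranker vouched for.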

  • Eval RAG with Ragas (faithfulness, answer relevance, context P/R) · Production

    Build a labelled Q→chunk → answer dataset. Run Ragas + TruLens RAG triad. Gate CI on Recall@5, Faithfulness, p95 latency, $/query.
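The CI gate reduces to comparing eval output against floors and ceilings and failing the build on any violation. A sketch, with the latency and cost budgets as illustrative placeholders (only the faithfulness and Recall@5 floors come from the course's stated thresholds):

```python
def gate_rag_metrics(metrics: dict[str, float]) -> list[str]:
    """Return a list of threshold violations; an empty list means
    the change may ship. Floors are minimums, ceilings are budgets."""
    floors = {"faithfulness": 0.90, "recall_at_5": 0.85}
    ceilings = {"p95_latency_s": 2.0, "cost_per_query_usd": 0.02}  # placeholder budgets
    failures: list[str] = []
    for m, floor in floors.items():
        v = metrics.get(m, 0.0)
        if v < floor:
            failures.append(f"{m}={v:.2f} below {floor}")
    for m, cap in ceilings.items():
        v = metrics.get(m, 0.0)
        if v > cap:
            failures.append(f"{m}={v:.2f} above {cap}")
    return failures

fails = gate_rag_metrics({"faithfulness": 0.92, "recall_at_5": 0.80,
                          "p95_latency_s": 1.1, "cost_per_query_usd": 0.01})
```

Wire the returned list into a non-zero exit code and every retriever, chunker, or prompt change gets evaluated before merge.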

  • Ship Agentic RAG (corrective + self-RAG) · Advanced

    LangGraph 1.0 state machine: retrieve → grade → rewrite/re-retrieve → generate → self-check → loop. Knows when to web-search, when to refuse, when to ask for clarification.

  • Index visually-rich PDFs with ColPali (no OCR) · Advanced

    Multimodal RAG over slides, scanned PDFs, screenshots using ColPali multi-vector embeddings — beats OCR + text retrieval on layout-heavy corpora.

  • Observe + budget RAG in Langfuse / Phoenix · Production

    Trace every stage of a RAG call (embed → search → rerank → LLM). Per-stage latency + token attribution, $/query dashboards, cache-hit rate, negative-answer-rate.

  • Embedding-model migration with versioning + blue/green index · Advanced

    Tag (model_id, dim, chunker_version) on every row. Run shadow-index re-embed, dual-read with feature flag, A/B eval before cutover. Never re-embed in place.
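The dual-read pattern can be sketched as a thin routing layer: serve from the live index, quietly query the shadow index for offline comparison, and flip one flag at cutover. All names below (`EmbeddingVersion`, the stub search functions) are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EmbeddingVersion:
    """The tag carried on every indexed row."""
    model_id: str
    dim: int
    chunker_version: str

def dual_read(query: str,
              indexes: dict[EmbeddingVersion, Callable[[str], list[str]]],
              live: EmbeddingVersion, shadow: EmbeddingVersion,
              cutover: bool = False):
    """Serve from the live index; also hit the shadow index so its
    recall can be A/B-evaluated offline. Flip `cutover` once the
    shadow wins, then retire the old index."""
    primary = shadow if cutover else live
    results = indexes[primary](query)
    shadow_results = None if cutover else indexes[shadow](query)
    return results, shadow_results
```

Because the old index stays intact until the flag flips, rollback is the same one-line change in reverse, which is the whole point of never re-embedding in place.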

  • Defend against retrieval-time prompt injection + PII leaks · Advanced

    Detect instructions hidden in retrieved chunks (OWASP LLM01). PII redaction at ingest, retrieval-time policy filters per tenant, signed retrieval audit trail.
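A first line of defense is a cheap pattern scan over retrieved chunks before they reach the LLM context. The patterns below are illustrative only; a production defense layers a trained classifier and policy filters on top:

```python
import re

# Illustrative instruction-like patterns; not an exhaustive defense.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"disregard the system prompt",
    r"you are now",
]

def flag_injected_chunks(chunks: list[str]) -> list[int]:
    """Return indices of retrieved chunks containing instruction-like
    text (OWASP LLM01) so they can be dropped or quarantined before
    entering the prompt."""
    pat = re.compile("|".join(INJECTION_PATTERNS), re.IGNORECASE)
    return [i for i, c in enumerate(chunks) if pat.search(c)]
```

Flagged chunks should also be logged to the signed retrieval audit trail so you can trace which document tried to steer the model.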

RUNNABLE ON YOUR MACHINE
$ docker pull snap/rag-vector:hello
$ docker run --rm -it snap/rag-vector:hello
QUICK PREVIEW · 7 MIN
VERIFIED ENGINEER REVIEWS
The chunker-bench container finally answered 'what chunker for OUR docs' with numbers, not vibes.
@rag_ravi · VERIFY ON GITHUB
Re-ranking lesson: 4 minutes, +18 Recall@8 on our internal RAG. Highest ROI 5 lines I've ever shipped.
@kofi.infra · VERIFY ON TWITTER
LESSONS · 10
HOURS · ~1.8
LEARNERS · 6,890
THIS WEEK · +19%