Skills you'll gain
- Pick the right embedding model using MTEB + cost + privacy (Working)
Choose between text-embedding-3-large (Matryoshka), voyage-3-large, Cohere embed-v4, BGE-M3, Nomic-embed-text-v2-MoE based on language, modality, deployment and budget. Defend the pick with MTEB scores.
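Matryoshka-trained models such as text-embedding-3-large are designed so a truncated prefix of the vector remains usable, which makes dimension a cost knob. A minimal sketch of truncate-and-renormalize, with a synthetic list standing in for a real 3072-dim API embedding:

```python
import math

def truncate_matryoshka(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length,
    as Matryoshka-trained embedding models allow."""
    short = vec[:dim]
    norm = math.sqrt(sum(x * x for x in short))
    return [x / norm for x in short]

# Stand-in for a real 3072-dim embedding returned by the API.
full = [math.sin(i) for i in range(3072)]
v256 = truncate_matryoshka(full, 256)
print(len(v256))  # 256
```

Storing 256 dims instead of 3072 cuts index size and search cost roughly 12x; whether the quality trade-off is acceptable is exactly what an MTEB-backed eval should decide.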
- Choose & justify a chunking strategy by data shape (Production)
Recursive vs semantic vs late-chunking vs Anthropic contextual retrieval vs parent-document. Code-aware and table-aware splitters for source code and structured docs.
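The recursive strategy is the baseline the others are measured against: try the coarsest separator first and only fall back to finer ones when a piece is still too big. A self-contained sketch (the 200-char budget and separator list are illustrative, not prescribed by the course):

```python
def recursive_split(text, max_len=200, seps=("\n\n", "\n", ". ", " ")):
    """Greedy recursive character splitting: prefer paragraph breaks,
    then lines, then sentences, then words; hard-cut as a last resort."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not seps:
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = seps[0], seps[1:]
    chunks, buf = [], ""
    for piece in text.split(sep):
        candidate = buf + sep + piece if buf else piece
        if len(candidate) <= max_len:
            buf = candidate                      # pack pieces into the budget
        else:
            if buf:
                chunks.append(buf)
            if len(piece) > max_len:
                buf = ""
                chunks.extend(recursive_split(piece, max_len, rest))
            else:
                buf = piece
    if buf:
        chunks.append(buf)
    return chunks

text = ("Sentence about chunking. " * 20) + "\n\n" + ("Another paragraph. " * 20)
chunks = recursive_split(text)
print(all(len(c) <= 200 for c in chunks))  # True
```

Code-aware and table-aware splitters follow the same shape but swap the separator hierarchy for syntax-aware boundaries (function defs, table rows).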
- Choose & justify a vector store from 8 production options (Production)
pgvector, Qdrant, Weaviate, Milvus, Pinecone, LanceDB, Vespa, Mongo Atlas — pick by scale, ops model, hybrid support, payload filtering, multi-tenancy. Read VectorDBBench results.
- Implement hybrid search with RRF on at least two stores (Production)
BM25 + dense vector + RRF (k≈60). Score fusion vs rank fusion. Native hybrid in Qdrant/Weaviate/Pinecone vs hand-rolled with pgvector + tsvector.
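RRF itself is a few lines: each result list contributes 1/(k + rank) per document, which is what the native hybrid modes compute under the hood. A pure-Python version of the standard formula:

```python
def rrf(rank_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d).
    k=60 is the conventional default; it damps the gap between top ranks."""
    scores = {}
    for ranking in rank_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25  = ["d3", "d1", "d7", "d2"]   # lexical ranking
dense = ["d1", "d4", "d3", "d9"]   # vector ranking
fused = rrf([bm25, dense])
print(fused[:3])  # ['d1', 'd3', 'd4']
```

Because RRF consumes only ranks, it sidesteps the score-calibration problem that makes naive score fusion across BM25 and cosine scores fragile.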
- Add query rewriting (HyDE, multi-query, decomposition) (Working)
Choose rewriting strategy from query shape — short/under-specified → HyDE; ambiguous → multi-query; complex compound → decomposition + step-back. Measure lift on Recall@K.
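The routing itself can start as plain heuristics before graduating to an LLM classifier. A toy router — the thresholds and keyword list below are illustrative assumptions, not from the course:

```python
def choose_rewrite_strategy(query: str) -> str:
    """Map query shape to a rewriting strategy:
    short/under-specified -> HyDE; long compound -> decomposition;
    everything else -> multi-query expansion."""
    words = query.lower().split()
    if len(words) <= 4:
        return "hyde"
    if len(words) > 8 and any(w in words for w in ("and", "versus", "vs", "compare")):
        return "decomposition"
    return "multi-query"

print(choose_rewrite_strategy("pgvector limits"))  # hyde
```

Whatever the router picks, the item's key discipline holds: measure the lift on Recall@K before keeping a rewriting stage.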
- Re-rank with Cohere/Jina/BGE/ColPali (Production)
Cross-encoder rerank for surgical context. Open-source vs API. ColPali / ColBERT late-interaction for multi-vector retrieval. Trim 50 → 8 to cut token cost AND raise faithfulness.
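The trim step is model-agnostic: score every (query, doc) pair, keep the top few. A sketch with a stub lexical scorer standing in for a real cross-encoder such as Cohere Rerank or a BGE model:

```python
def rerank_and_trim(query, docs, score_fn, keep=8):
    """Re-score candidates with a cross-encoder-style pairwise scorer
    and keep only the top `keep` docs for the LLM context."""
    ranked = sorted(docs, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:keep]

# Stub scorer: token overlap. A real cross-encoder returns a relevance logit
# from jointly encoding the query and the document.
def overlap(q, d):
    return len(set(q.lower().split()) & set(d.lower().split()))

docs = [f"doc {i} about topic {i % 5}" for i in range(50)]
top = rerank_and_trim("topic 3", docs, overlap)
print(len(top))  # 8
```

The 50 → 8 trim pays twice: fewer prompt tokens, and fewer irrelevant chunks for the generator to hallucinate from.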
- Eval RAG with Ragas (faithfulness, answer relevance, context P/R) (Production)
Build a labelled Q → chunk → answer dataset. Run Ragas + TruLens RAG triad. Gate CI on Recall@5, Faithfulness, p95 latency, $/query.
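The CI gate reduces to computing the retrieval metric and comparing everything against thresholds. A sketch — the threshold values are illustrative placeholders, not course-mandated budgets:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose gold chunk appears in the top-k results."""
    hits = sum(1 for r, gold in zip(retrieved, relevant) if gold in r[:k])
    return hits / len(relevant)

def gate(recall5, faithfulness, p95_ms, dollars_per_query,
         min_recall=0.85, min_faith=0.90, max_p95=1200, max_cost=0.02):
    """Fail the build if any metric regresses past its budget."""
    return (recall5 >= min_recall and faithfulness >= min_faith
            and p95_ms <= max_p95 and dollars_per_query <= max_cost)

retrieved = [["c1", "c2", "c9"], ["c4", "c3", "c8"]]
relevant  = ["c2", "c7"]
print(recall_at_k(retrieved, relevant))  # 0.5
```

In practice `faithfulness` would come from a Ragas run over the labelled dataset; the gate just turns those numbers into a pass/fail exit code.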
- Ship Agentic RAG (corrective + self-RAG) (Advanced)
LangGraph 1.0 state machine: retrieve → grade → rewrite/re-retrieve → generate → self-check → loop. Knows when to web-search, when to refuse, when to ask for clarification.
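Stripped of the framework, the state machine is a graded loop. A plain-Python stand-in for the LangGraph graph, with stubbed nodes (real ones would call a retriever, a grader LLM, a rewriter LLM, and a generator):

```python
def agentic_rag(question, retrieve, grade, rewrite, generate, max_loops=3):
    """Corrective/self-RAG control loop: retrieve -> grade -> rewrite &
    re-retrieve -> generate; refuse once the loop budget is spent."""
    q = question
    for _ in range(max_loops):
        docs = retrieve(q)
        if grade(q, docs):          # self-check: is the context usable?
            return generate(q, docs)
        q = rewrite(q)              # corrective step: reformulate and retry
    return "Not enough grounded context to answer."

# Stubbed nodes over a toy corpus.
corpus = {"hybrid search": "Use BM25 + dense vectors fused with RRF."}
answer = agentic_rag(
    "Hybrid Search",
    retrieve=lambda q: [corpus[q]] if q in corpus else [],
    grade=lambda q, docs: bool(docs),
    rewrite=lambda q: q.lower(),
    generate=lambda q, docs: docs[0],
)
print(answer)
```

The refusal branch is the point: an agentic pipeline that cannot say "I don't know" after exhausting its loop budget will generate ungrounded answers instead.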
- Index visually-rich PDFs with ColPali (no OCR) (Advanced)
Multimodal RAG over slides, scanned PDFs, screenshots using ColPali multi-vector embeddings — beats OCR + text retrieval on layout-heavy corpora.
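The scoring behind multi-vector retrieval is the ColBERT/ColPali MaxSim operator: for each query-token vector, take its best match over all document patch vectors, then sum. A tiny pure-Python version with 2-d toy embeddings:

```python
def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the maximum
    dot product against any document patch embedding."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

q     = [[1.0, 0.0], [0.0, 1.0]]   # two query-token embeddings
doc_a = [[0.9, 0.1], [0.2, 0.8]]   # page with patches matching both tokens
doc_b = [[0.5, 0.5], [0.4, 0.6]]   # page with diffuse, weaker matches
print(maxsim(q, doc_a) > maxsim(q, doc_b))  # True
```

Because each page is stored as many patch vectors rather than one pooled vector, layout-heavy signals (a number inside a table cell, a label on a chart) survive into retrieval — which is why this can beat OCR + single-vector text retrieval.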
- Observe + budget RAG in Langfuse / Phoenix (Production)
Trace every stage of a RAG call (embed → search → rerank → LLM). Per-stage latency + token attribution, $/query dashboards, cache-hit rate, negative-answer-rate.
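Per-stage attribution starts with wrapping each stage in a timed span — which is essentially what Langfuse/Phoenix SDKs do before exporting. A minimal homemade sketch (stdlib only; a real tracer would also record tokens and cost per span):

```python
import time
from contextlib import contextmanager

class Trace:
    """Collects per-stage wall-clock durations for one RAG call."""
    def __init__(self):
        self.spans = {}  # stage name -> elapsed seconds

    @contextmanager
    def stage(self, name):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            self.spans[name] = time.perf_counter() - t0

trace = Trace()
with trace.stage("embed"):
    time.sleep(0.01)   # stand-in for the embedding call
with trace.stage("search"):
    time.sleep(0.01)   # stand-in for the vector search
print(sorted(trace.spans))  # ['embed', 'search']
```

With token counts added per span, $/query is just a weighted sum over the trace — which is what makes the dashboards on this line possible.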
- Embedding-model migration with versioning + blue/green index (Advanced)
Tag (model_id, dim, chunker_version) on every row. Run shadow-index re-embed, dual-read with feature flag, A/B eval before cutover. Never re-embed in place.
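The dual-read step can be sketched as: always serve the old index, shadow-query the new one, and compare — flipping the flag only after the A/B eval passes. Function names and the Jaccard comparison below are illustrative:

```python
def dual_read(query, old_index, new_index, compare, use_new=False):
    """Blue/green read path: query both indexes, log the agreement,
    and serve the new index only behind the feature flag."""
    old_hits = old_index(query)
    new_hits = new_index(query)      # shadow read: evaluated, not served
    agreement = compare(old_hits, new_hits)
    served = new_hits if use_new else old_hits
    return served, agreement

old = lambda q: ["a", "b", "c"]      # rows tagged with the old model_id/dim
new = lambda q: ["a", "c", "d"]      # shadow index, re-embedded corpus
jaccard = lambda x, y: len(set(x) & set(y)) / len(set(x) | set(y))

served, agree = dual_read("q", old, new, jaccard)
print(served, round(agree, 2))  # ['a', 'b', 'c'] 0.5
```

The (model_id, dim, chunker_version) tag on every row is what makes this safe: a query can never accidentally mix vectors produced by incompatible embedding spaces.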
- Defend retrieval-time prompt injection + PII (Advanced)
Detect instructions hidden in retrieved chunks (OWASP LLM01). PII redaction at ingest, retrieval-time policy filters per tenant, signed retrieval audit trail.
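The ingest-time PII half can be sketched with typed placeholder redaction. The regexes below are deliberately naive illustrations — production redaction needs a dedicated PII detector, and the injection half needs semantic screening, not patterns:

```python
import re

# Illustrative patterns only; real PII detection covers far more types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace PII spans with typed placeholders before chunks are indexed,
    so sensitive values never enter the vector store or the audit trail."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
# Contact [EMAIL], SSN [SSN].
```

Redacting at ingest (rather than at generation) means a prompt-injected retrieval can never exfiltrate the raw values — they were never stored.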