RAG vs Fine-tuning vs Hybrid bench harness
Same task, three implementations (prompt / RAG / SFT). One CSV your ADR can cite.
# docker-compose.yml — rag-vs-ft-bench
services:
  qdrant:
    image: qdrant/qdrant:v1.13.0
    ports: ["6333:6333"]
    volumes:
      - ./qdrant-data:/qdrant/storage
  vllm:
    image: vllm/vllm-openai:latest   # pin a release tag for reproducible benchmark runs
    # adapter path must be the in-container mount point, not the host-relative path
    command: --model Qwen/Qwen2.5-7B-Instruct --enable-lora --lora-modules legal=/app/adapters/legal --max-loras 4 --port 8000
    volumes:
      - ./adapters:/app/adapters:ro
    ports: ["8000:8000"]
    deploy:
      resources:
        reservations:
          devices: [{ capabilities: ["gpu"] }]
  bench:
    image: python:3.12-slim
    working_dir: /app
    depends_on: [qdrant, vllm]
    volumes:
      - ./src:/app/src:ro
      - ./golden:/golden:ro
      - ./out:/out
      - ./requirements.txt:/app/requirements.txt:ro
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY:?}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:?}   # assumed: the judge model is called via the Anthropic API
      QDRANT_URL: http://qdrant:6333
      VLLM_URL: http://vllm:8000/v1
      JUDGE_MODEL: ${JUDGE_MODEL:-claude-opus-4-1}
      TASK: ${TASK:-legal-clause-classify}
    command: bash -c "pip install -q -r requirements.txt && python -m src.bench --task /golden/$${TASK}.jsonl --out /out/decision-report.csv"
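Everything above is plumbing; the design point is that all three arms hit the same vLLM endpoint. With --enable-lora and --lora-modules legal=..., vLLM's OpenAI-compatible server routes any request whose model field is "legal" through the adapter, while the base model name serves the prompt and RAG arms. A minimal sketch of that routing, assuming the standard openai client (the classify helper and the prompts are illustrative, not the actual src.bench internals):

# three_arms.py — illustrative sketch, not the real src.bench
import os
from openai import OpenAI

llm = OpenAI(base_url=os.environ.get("VLLM_URL", "http://localhost:8000/v1"),
             api_key="unused")  # vLLM does not check the key

BASE = "Qwen/Qwen2.5-7B-Instruct"  # serves arm 1 (prompt) and arm 2 (RAG)
LORA = "legal"                     # adapter name registered via --lora-modules

def classify(clause: str, model: str, context: str = "") -> str:
    """One chat call per arm; only the model name and optional context differ."""
    prompt = f"{context}\n\nClassify this clause:\n{clause}".strip()
    resp = llm.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # greedy decoding keeps the three-way comparison fair
    )
    return resp.choices[0].message.content

clause = "Either party may terminate upon 30 days' written notice."
retrieved = "(top-k passages fetched from Qdrant would go here)"

prompt_answer = classify(clause, BASE)                  # arm 1: prompt only
rag_answer = classify(clause, BASE, context=retrieved)  # arm 2: RAG
sft_answer = classify(clause, LORA)                     # arm 3: LoRA adapter

The SFT arm is just a model-name swap on the same server, which is what makes the side-by-side CSV cheap to produce: one GPU, one endpoint, three conditions.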
Use this as the evidence pack for any 'should we fine-tune?' debate. Drop YOUR golden set into /golden, set TASK to the filename (without the .jsonl extension), and share /out/decision-report.csv with eng and product. That closes the debate without a 90-minute meeting. For new domains, copy the JSONL schema and add 50-200 hand-labelled items.
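The golden-set format isn't spelled out above, so treat the fields below as an assumption, one JSON object per line with an id, the input text, and a gold label, and adjust them to whatever src.bench actually parses. A quick sanity check before burning GPU time:

# golden_check.py — validate a golden set before a run
# (REQUIRED is an assumed schema, not the harness's documented format)
import collections
import json
import sys

REQUIRED = {"id", "input", "label"}  # assumed fields: unique id, clause text, gold class

def load(path: str) -> list[dict]:
    with open(path) as f:
        items = [json.loads(line) for line in f if line.strip()]
    for n, item in enumerate(items, start=1):
        missing = REQUIRED - item.keys()
        if missing:
            sys.exit(f"{path}:{n}: missing fields {sorted(missing)}")
    return items

items = load("golden/legal-clause-classify.jsonl")
print(f"{len(items)} items; labels:", dict(collections.Counter(x["label"] for x in items)))

Label balance matters as much as count: a heavily skewed golden set will make all three arms look alike, so check the printed distribution before trusting the CSV.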