AIDQ Course

AI-ready data quality

Lessons: 8 modules
Total: 86 min (full study)
Quick: 7 min (trailer)
Projects: 8 Docker labs

dq-gate-pipeline · DuckDB + dbt + dbt-expectations + Soda + Airflow

Fail the build on any contract breach or DQ check failure. The first AI-ready CI gate every team should adopt.

snap/ai-ready-data:dq-gate
Repo · ai-ready-dq-gate
$ git clone https://github.com/snap-dev/ai-ready-dq-gate.git
docker-compose.yml
services:
  duckdb:
    image: ghcr.io/duckdb/duckdb:latest
    volumes: ["./warehouse:/data"]
    command: tail -f /dev/null

  dbt:
    image: ghcr.io/dbt-labs/dbt-duckdb:1.10.0
    working_dir: /usr/app
    depends_on: [duckdb]
    volumes:
      - ./dbt:/usr/app  # writable: dbt deps and dbt build write dbt_packages/, target/ and logs/ here
      - ./warehouse:/data
      - ./profiles.yml:/root/.dbt/profiles.yml:ro
    command: >-
      sh -c "dbt deps --quiet && dbt build --fail-fast --vars '{contract_enforced: true}'"

  soda:
    image: sodadata/soda-core:3
    depends_on:
      dbt:
        condition: service_completed_successfully  # scan only after dbt build has finished
    volumes:
      - ./soda:/sodacl:ro
      - ./warehouse:/data
    command: >-
      sh -c "soda scan -d duckdb_warehouse -c /sodacl/configuration.yml -srf /data/report.json /sodacl/checks.yml &&
             jq '.summary' /data/report.json"
Run
~/ai-ready-dq-gate · zsh
$ docker compose up --abort-on-container-exit
dbt build green; Soda scan: 14/14 passed; exit 0.
What you'll observe
dbt fails the build on any model with `contract: enforced: true` whose schema drifted
dbt-expectations runs the representativeness/uniqueness checks
Soda scan emits structured JSON; non-zero exit on any failure
Injecting 5% null emails into the seed CSV reproduces a hard CI failure
report.json captured to ./warehouse for PR attachment
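
The null-injection experiment above can be reproduced with a short script. This is a minimal sketch, not part of the lab repo: the column name `email` and the helpers `inject_nulls`, `null_rate`, and `rewrite_seed` are assumptions for illustration, and it assumes empty CSV cells are loaded as NULL when the seed is built.

```python
import csv
import random


def inject_nulls(rows, column, fraction, seed=0):
    """Return a copy of rows with `column` blanked in round(len(rows) * fraction) of them."""
    rng = random.Random(seed)  # fixed seed so the CI failure is reproducible
    n_to_blank = round(len(rows) * fraction)
    targets = set(rng.sample(range(len(rows)), n_to_blank))
    return [
        {**row, column: ""} if i in targets else dict(row)
        for i, row in enumerate(rows)
    ]


def null_rate(rows, column):
    """Fraction of rows whose `column` is empty (assumed to load as NULL)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if not row[column]) / len(rows)


def rewrite_seed(path, column="email", fraction=0.05, seed=0):
    """Blank `fraction` of `column` in the CSV at `path`, in place; return the new null rate."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    corrupted = inject_nulls(rows, column, fraction, seed)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(corrupted)
    return null_rate(corrupted, column)
```

Point `rewrite_seed` at a copy of the seed file (the path depends on your checkout), rerun the compose stack, and the not-null checks should turn the run red. Restore the original CSV afterwards.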
Lift this to your work

Replace DuckDB with Snowflake, BigQuery, or Postgres by switching the target in profiles.yml, the dbt adapter image, and the data source in Soda's configuration.yml; replace the seed CSVs with your real source tables; and promote `dbt build` to a required CI step. The dbt-expectations + Soda combination gives you 'shift-left' DQ: bad rows die in PR review, not in the customer's RAG answer.
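
One way to make the gate a required CI step outside Docker Compose is a small wrapper that runs each stage in order and returns the first non-zero exit code. This is a minimal sketch under stated assumptions: the CI runner has `dbt` and `soda` on PATH, the `run_gate` helper and `GATE_STEPS` list are hypothetical, and the relative `soda/` paths are guesses at the repository layout. The commands mirror the ones in the compose file.

```python
import subprocess
import sys


def run_gate(steps):
    """Run each (name, argv) step in order; return the first non-zero exit code, else 0."""
    for name, argv in steps:
        print(f"[gate] {name}: {' '.join(argv)}", flush=True)
        try:
            code = subprocess.run(argv).returncode
        except FileNotFoundError:
            code = 127  # conventional shell code for "command not found"
        if code != 0:
            print(f"[gate] {name} failed with exit code {code}", flush=True)
            return code  # stop at the first failure so later stages never run
    return 0


# Hypothetical step list; adjust paths and data source names to your project.
GATE_STEPS = [
    ("dbt deps", ["dbt", "deps"]),
    ("dbt build", ["dbt", "build", "--fail-fast",
                   "--vars", "{contract_enforced: true}"]),
    ("soda scan", ["soda", "scan", "-d", "duckdb_warehouse",
                   "-c", "soda/configuration.yml", "soda/checks.yml"]),
]
```

Calling `sys.exit(run_gate(GATE_STEPS))` from the CI entrypoint hands the failing stage's exit code to the CI system, which is what lets a branch protection rule block the merge.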