CHEATSHEET · 01 · Data contracts · operations cheatsheet
ODCS v3.1.0 contract anatomy
- ·apiVersion: v3.1.0 and kind: DataContract at root; id, version, status required (the dataContractSpecification key belongs to the separate datacontract.com spec, not ODCS)
- ·servers: [{type: 'kafka'|'postgres'|'bigquery'|'s3', ...}] defines where data lives
- ·schema: [{name, properties: [{name, logicalType, physicalType, required, classification, description, ...}]}] — logicalType is portable, physicalType engine-specific
- ·quality: per-check type ('text'|'sql'|'custom'); custom checks delegate to an engine such as Soda or Great Expectations
- ·slaProperties: [{property, value, unit, element}] — e.g. freshness, completeness, latency thresholds tied to specific columns
- ·models: [{name, version, deprecationDate, ...}] for dbt model versioning
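A minimal contract tying the blocks above together, as an ODCS-style YAML sketch (all names, hosts, and threshold values are illustrative; check the spec for the exact key set):

```yaml
apiVersion: v3.1.0            # ODCS spec version
kind: DataContract
id: orders-contract           # stable identifier
version: 1.0.0                # contract version, bump on change
status: active
description:
  purpose: Curated orders for analytics consumers
servers:
  - server: prod
    type: kafka
    host: broker:9092
schema:
  - name: orders
    properties:
      - name: order_id
        logicalType: string
        required: true
      - name: email
        logicalType: string
        classification: pii   # drives masking / access policy
slaProperties:
  - property: freshness
    value: 2
    unit: h
```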
dbt 1.9 contract enforcement
- ·contract: {enforced: true} in model config; build fails if the compiled model's columns or types drift from the declared contract
- ·columns: [{name, data_type, description, constraints: [not_null, ...]}]
- ·dbt-checkpoint 2.x pre-commit hook: dbt-check-model-columns-exist
- ·dbt model versions: versions: [{v: 2}, {v: 1, deprecation_date: '2025-06-01'}] plus latest_version: 2
- ·dbt build --select model_name exits non-zero on contract violation
- ·dbt docs generate includes contract status; dbt parse validates YAML syntax
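The dbt side of the same contract, as a schema.yml sketch (model and column names illustrative):

```yaml
models:
  - name: orders
    config:
      contract:
        enforced: true          # build fails on column/type drift
    columns:
      - name: order_id
        data_type: varchar
        constraints:
          - type: not_null      # enforced in the warehouse where supported
      - name: email
        data_type: varchar
```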
Kafka Schema Registry compatibility (v7.8+)
- ·BACKWARD: new schema reads old data; safe to delete fields or add fields with defaults
- ·FORWARD: old schema reads new data; safe to add fields or delete fields that had defaults
- ·FULL: both directions; only add or delete fields that carry default values
- ·NONE: no compatibility check; dev only — under the other modes, incompatible registration is rejected with HTTP 409
- ·curl -X POST -H 'Content-Type: application/vnd.schemaregistry.v1+json' http://registry:8081/subjects/{subject}-value/versions -d '{"schema": "..."}'
- ·Protobuf field numbers are immutable — reuse breaks the wire format; Avro matches fields by name, so a rename without an alias breaks BACKWARD + FORWARD
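For Avro record fields, the BACKWARD/FORWARD rules above reduce to a name-and-default check. A minimal sketch (deliberately simplified: it ignores type promotion, aliases, and nested records, which the real resolution rules also cover):

```python
def backward_compatible(old_fields, new_fields):
    """New (reader) schema can decode old (writer) data: every field in
    the new schema must exist in the old schema by name, or carry a default."""
    old_names = {f["name"] for f in old_fields}
    return all(f["name"] in old_names or "default" in f for f in new_fields)

def forward_compatible(old_fields, new_fields):
    """Old (reader) schema can decode new (writer) data: the symmetric check."""
    return backward_compatible(new_fields, old_fields)

# Deleting a field is BACKWARD-safe, but breaks FORWARD unless it had a default;
# adding a field with a default is safe in both directions (FULL).
v1 = [{"name": "id"}, {"name": "email"}]
v2 = [{"name": "id"}]
```

FULL compatibility is simply both checks passing at once, which is why it pins you to add/delete-with-default changes only.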
Breaking-change CI gates
- ·buf breaking --against .git#branch=main for Protobuf; FILE + WIRE_JSON rules
- ·dbt-checkpoint dbt-check-model-columns-exist in .pre-commit-config.yaml
- ·datacontract-cli diff v1.yaml v2.yaml shows breaking vs non-breaking changes
- ·GitHub Actions: run breaking checks on every PR; fail if exit code != 0
- ·Schema Registry HTTP 409 Conflict blocks incompatible schema registration
- ·sqlfluff parse + dbt parse catch syntax errors before schema validation
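The gates above wired into one workflow, as a sketch (repo paths, contract filenames, and pinned action versions are placeholders; the pip package datacontract-cli installs a `datacontract` binary):

```yaml
name: contract-gates
on: [pull_request]
jobs:
  breaking:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0                 # buf needs main's history for --against
      - uses: bufbuild/buf-setup-action@v1
      - run: buf breaking --against '.git#branch=origin/main'
      - run: pip install datacontract-cli
      - run: datacontract lint contracts/orders.yaml
      - run: |
          git show origin/main:contracts/orders.yaml > /tmp/old.yaml
          datacontract diff /tmp/old.yaml contracts/orders.yaml
```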
Value-level enforcement (GX Core 1.x + Soda Core 3.x)
- ·GX file-backed context: context = gx.get_context(mode='file', project_root_dir='gx'); no DB credentials in code
- ·GX Checkpoint: checkpoint.run() returns CheckpointResult; .success == False fails DAG
- ·Soda SodaCL checks.yml: freshness(updated_at) < 2h, missing_percent(order_id) < 1%, missing_count(email) = 0
- ·Soda scan = Scan(); scan.add_sodacl_yaml_file('checks.yml'); scan.execute()
- ·GX Data Docs: context.build_data_docs() generates HTML; commit to repo or S3
- ·Airflow: downstream tasks default to trigger_rule='all_success', so they are skipped when the GX/Soda task fails; give cleanup/alert tasks trigger_rule='all_done' so they still run
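The SodaCL thresholds above as a checks.yml sketch (dataset and column names illustrative):

```yaml
checks for orders:
  - freshness(updated_at) < 2h        # SLA: data at most 2h old
  - missing_percent(order_id) < 1%    # completeness threshold
  - missing_count(email) = 0          # hard requirement, any null fails
  - row_count > 0                     # dataset is not empty
```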
Contract-driven code generation + discoverability
- ·datacontract-cli 0.10.x export --format dbt contract.yaml > schema.yml
- ·datacontract-cli export --format avro contract.yaml > schema.avsc
- ·datacontract-cli export --format great-expectations contract.yaml > suite.json
- ·CI job: regenerate all three; fail if committed artifacts differ from generated
- ·Backstage: register contract YAML as a Location; catalog-info.yaml references it
- ·datacontract-cli publish pushes the contract to a contract registry (e.g. Data Mesh Manager); Backstage picks it up via the registered Location
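The regenerate-and-diff gate from this block as CI steps (paths illustrative; assumes generated artifacts are committed under generated/):

```yaml
- run: datacontract-cli export --format dbt contract.yaml > generated/schema.yml
- run: datacontract-cli export --format avro contract.yaml > generated/schema.avsc
- run: datacontract-cli export --format great-expectations contract.yaml > generated/suite.json
- run: git diff --exit-code generated/   # fail if committed artifacts drifted from the contract
```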
CHEATSHEET · 02 · Data contracts · 2 AM debugging cheatsheet
Contract validation failures
- ·datacontract-cli lint <file.yaml> → check ODCS v3.1.0 syntax, required fields, enum values
- ·datacontract-cli diff v1.yaml v2.yaml → pinpoint breaking changes (removed fields, type shifts)
- ·dbt build --select <model> → enforced: true models exit non-zero if contracted columns missing
- ·dbt-checkpoint hook → pre-commit blocks commits with schema violations before dbt runs
- ·buf breaking --against .git#branch=main → gRPC field number reuse, wire-type changes caught
Schema Registry / Kafka compatibility
- ·curl -X GET http://localhost:8081/subjects/<topic>-value/versions → list all registered schemas
- ·curl -X POST -H 'Content-Type: application/vnd.schemaregistry.v1+json' http://localhost:8081/compatibility/subjects/<topic>-value/versions/latest -d '{"schema":"..."}' → test compatibility before register
- ·BACKWARD mode: new schema must read old data; blocks adding a required field (no default) and incompatible type changes; deletes are fine
- ·FORWARD mode: old schema must read new data; blocks deleting a field that lacks a default; adds are fine
- ·FULL mode: both BACKWARD + FORWARD; safest but most restrictive; use for shared dimensions
dbt contract enforcement
- ·enforced: true in contract block → dbt build fails if column missing or type mismatch
- ·version: 2 + deprecation_date: '2025-06-01' → signals v1 end-of-life; dbt logs warning to consumers
- ·dbt build --select state:modified+ → test only changed models + downstream; catches contract breaks early
- ·dbt parse → validates contract YAML syntax before build; fails fast on typos in column names
- ·dbt docs generate → contract metadata appears in dbt Cloud lineage; consumers see SLOs, owners
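Model versioning from the block above as YAML (names and dates illustrative):

```yaml
models:
  - name: orders
    latest_version: 2
    versions:
      - v: 2
      - v: 1
        deprecation_date: 2025-06-01   # dbt warns consumers still selecting v1
```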
Quality gate failures (GX / Soda)
- ·GX Checkpoint: run the Checkpoint wrapping your expectation suite; CheckpointResult.success == False → fail the DAG
- ·GX Expectation: expect_column_values_to_not_be_null → catches 40% nulls in revenue before dashboard
- ·Soda SodaCL: freshness(updated_at) < 2h, missing_percent(revenue) < 1% → SLA breach blocks downstream tasks
- ·Soda scan → exit code 1 if any check fails; wire to an Airflow on_failure_callback or a dbt post-hook
- ·GX Data Docs HTML → inspect failed expectations; shows row counts, null %, distribution by value
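The common pattern behind all four bullets: collapse the tool's result into pass/fail per check and raise, so the orchestrator marks the task failed and downstream is blocked. A generic sketch (the dict shape is an assumption for illustration, not the GX or Soda API; populate it from CheckpointResult or scan outcomes):

```python
def quality_gate(results: dict) -> None:
    """Raise if any named check failed, failing the surrounding Airflow task.

    `results` maps check name -> passed (bool). Build it from
    CheckpointResult.success per validation (GX) or per-check outcomes (Soda).
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    if failed:
        raise RuntimeError(f"quality gate failed: {', '.join(failed)}")
```

Raising (rather than returning False) matters: Airflow only fails a task on an exception or non-zero exit, so a swallowed failure silently lets bad data through.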
CI/CD contract gates
- ·GitHub Actions: datacontract-cli lint on every PR → fail if ODCS YAML invalid; pair with diff against main to catch breaking changes
- ·buf breaking in CI → block PR if a .proto field number is reused, a wire type changes, or a field is deleted without being reserved
- ·dbt-checkpoint pre-commit → run before git commit; blocks schema changes locally, no CI wait
- ·Schema Registry HTTP 409 → compatibility check failed; inspect error body for which field/type broke
- ·dbt build --fail-fast → stop on first contract violation; speeds up feedback loop in CI
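The pre-commit wiring from the block above as a .pre-commit-config.yaml sketch (the rev tag is a placeholder; hook ids vary by dbt-checkpoint version, so confirm against the repo's hook list):

```yaml
repos:
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v2.0.1                           # pin an actual release tag
    hooks:
      - id: check-model-has-all-columns   # model columns match the catalog
```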
Contract discoverability & debugging
- ·Backstage: register the ODCS YAML via a Location in catalog-info.yaml → contract appears as an entity in the catalog
- ·datacontract-cli export --format dbt → regenerate schema.yml from ODCS; diff against committed
- ·datacontract-cli export --format avro → regenerate .avsc from ODCS; validate against Kafka schema
- ·dbt meta: contract_owner, sla_freshness_hours → searchable in dbt Cloud; link to runbook
- ·grep -r 'deprecated: true' dbt/models/ → find all deprecated models; audit consumer migrations
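Registering contracts in Backstage per the first bullet, as a Location entity sketch (org, repo, and entity names illustrative):

```yaml
apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: data-contracts
spec:
  type: url
  targets:
    - https://github.com/acme/contracts/blob/main/orders.yaml
```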