Catch breakage at PR, not in production.
8 micro-lessons · ~78 min · Real Docker images
Data Contracts
Schema-as-API for data teams — breakage caught at PR time, not in production.
Real skills, real career delta.
Skills you'll gain (10)
- ODCS v3.1.0 contract authoring · Production
Write complete ODCS v3.1.0 YAML contracts covering schema fields, field-level and model-level quality rules, executable SLAs, server definitions, and ownership metadata. Lint contracts with datacontract-cli and validate against the ODCS JSON Schema.
- datacontract-cli: lint, test, changelog, export · Production
Run datacontract-cli lint to check spec compliance, test to validate against live data sources, changelog to diff two contract versions and classify changes as BREAKING, NON-BREAKING, or DEPRECATION, and export to generate Avro, JSON Schema, Pydantic, dbt models, and SodaCL.
- Three-layer contract enforcement (spec → dbt → Soda/GX) · Production
Wire declarative ODCS YAML in Git as the spec layer, dbt model contracts (contract: enforced: true) as the build-time gate, and Soda Core or Great Expectations checks as the runtime enforcement layer. Each layer catches a distinct failure class.
- Avro schema evolution and Confluent Schema Registry compatibility modes · Production
Configure BACKWARD, FORWARD, and FULL compatibility modes in Confluent Schema Registry for Avro subjects. Manage schema versions, register new schemas via confluent-kafka-python, and apply intra-topic declarative migration rules for breaking-change cutover.
- Confluent Schema Registry CEL condition and transform rules · Production
Attach CEL-based condition rules and transform rules to Avro/Protobuf Schema Registry subjects to enforce field-level contract constraints at message-produce time. Route contract violations to a DLQ topic using the built-in DLQ action.
- CDC-based entity contracts with Debezium and Kafka · Production
Build a Postgres → Debezium → Kafka → Schema Registry pipeline where entity-level contracts are enforced via CEL rules. Inspect DLQ topics for contract violations and validate the full stack with a Docker Compose integration test harness.
- GitHub Actions contract gate with breaking-change PR bot · Production
Write a GitHub Actions workflow that runs datacontract-cli changelog between the base-branch and PR-branch contracts, classifies each change, posts a structured PR comment, and fails CI on unannounced breaking changes, so destructive schema changes cannot be merged.
- Airflow DAG circuit-breaker for batch contract enforcement · Production
Configure an Airflow DAG where a datacontract-cli test task runs upstream of dbt transformation tasks. On contract failure, the DAG halts immediately and fires a Slack alert, preventing bad data from reaching BI dashboards or ML feature stores.
- Producer-side fixture generation from ODCS contract exports · Working
Export an ODCS contract to Pydantic models and Avro schemas via datacontract-cli, then use Faker with the generated models to produce synthetic fixture payloads that are structurally guaranteed to satisfy the contract for integration testing.
- Downstream replay harness for schema-version migration (WAP pattern) · Working
Replay a recorded Kafka message set through a v1→v2 intra-topic schema migration using Confluent Schema Registry transform rules, then validate all migrated messages against the v2 ODCS contract. Implement the Write-Audit-Publish pattern in pure Python for lake-side assets.
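
The Write-Audit-Publish pattern named in the last skill can be sketched in pure Python roughly as follows. This is a minimal illustration, not the course's implementation. The REQUIRED_FIELDS dict, the audit helper, and the single-JSON-file layout are all assumptions made for the example. A real audit step would validate against the ODCS contract instead of a hand-written dict.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical stand-in for a contract: required fields and their Python types.
REQUIRED_FIELDS = {"order_id": int, "amount": float}


def audit(rows):
    """Return a list of violation messages; an empty list means the batch passes."""
    violations = []
    for i, row in enumerate(rows):
        for name, expected in REQUIRED_FIELDS.items():
            if name not in row:
                violations.append(f"row {i}: missing {name}")
            elif not isinstance(row[name], expected):
                violations.append(f"row {i}: {name} is not {expected.__name__}")
    return violations


def write_audit_publish(rows, published_path):
    """Write to staging, audit the staged file, publish only if the audit passes."""
    published_path = Path(published_path)

    # Write: stage next to the target so the final rename stays on one filesystem.
    fd, staging_name = tempfile.mkstemp(dir=published_path.parent, suffix=".staging")
    with os.fdopen(fd, "w") as fh:
        json.dump(rows, fh)

    # Audit: re-read what was actually written, not the in-memory rows.
    with open(staging_name) as fh:
        violations = audit(json.load(fh))
    if violations:
        os.remove(staging_name)  # consumers never see the rejected batch
        return violations

    # Publish: os.replace swaps the staged file into place in a single step.
    os.replace(staging_name, published_path)
    return []
```

Calling write_audit_publish(rows, "orders.json") returns an empty list when the batch is published and a list of violation messages when it is rejected. On rejection, the previously published file is left untouched.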