DCNT · Course

Data contracts for data platforms

Lessons: 8 modules
Total: 80m full study
Quick: 7m trailer
Projects: 8 docker labs

hello-data-contract · ODCS v3.1 lint and validate in 60 seconds

Author a minimal but complete ODCS v3.1 contract YAML covering the fundamentals, schema, quality, slaProperties, and servers sections, run datacontract lint on every save, and deliberately introduce an error in each section to read and understand the exact validation messages.

snap/data-contracts:hello
Stack
datacontract-cli@0.10.x · odcs@3.1.0 · python@3.12
Real-world use

Every data platform team eventually accumulates undocumented datasets described only in a stale Confluence page or a Slack thread. When a new engineer joins or a downstream team onboards a dataset, there is no authoritative source of truth for column types, nullability, freshness guarantees, or the owning team. An ODCS v3.1 contract YAML solves this by encoding schema, quality rules, SLO targets, and server location in a single machine-readable file that can be linted in CI, exported to dbt or Great Expectations, and linked from a data catalog. The hello-data-contract project gives any team a validated, lint-clean template they can adapt for any internal dataset in under an hour, replacing the undocumented page that nobody updates.
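
As a rough illustration of what "a single machine-readable file" buys you, the sketch below checks an already-parsed contract for the five sections and reports which real-world question is left unanswered when one is missing. The key names mirror the wording of the build plan; where each one actually lives in an ODCS v3.1 file (top level or nested under a schema object) is an assumption to verify against the spec, and the `EXAMPLE` contract and its values are hypothetical. This is not a substitute for datacontract lint.

```python
# Section -> question mapping, taken from the README step of the build plan.
SECTION_QUESTIONS = {
    "fundamentals": "Who owns this?",
    "schema": "What columns exist?",
    "quality": "What quality rules apply?",
    "slaProperties": "What are the SLO targets?",
    "servers": "Where does the data live?",
}

# Field names as listed in the build plan; verify against the ODCS v3.1 spec.
FUNDAMENTAL_KEYS = ("id", "version", "status", "owner", "domain")


def has_section(contract: dict, section: str) -> bool:
    """Return True if the parsed contract contains the given section."""
    if section == "fundamentals":
        # Assumption: fundamentals are plain top-level keys, not a nested block.
        return all(key in contract for key in FUNDAMENTAL_KEYS)
    if section == "quality":
        # Assumption: rules may sit at the top level or under a schema object.
        tables = contract.get("schema") or []
        return bool(contract.get("quality")) or any(t.get("quality") for t in tables)
    return bool(contract.get(section))


def unanswered_questions(contract: dict) -> dict[str, str]:
    """Map each missing section to the question it would have answered."""
    return {s: q for s, q in SECTION_QUESTIONS.items() if not has_section(contract, s)}


# Hypothetical contract, already parsed from YAML into a dict (servers omitted).
EXAMPLE = {
    "id": "hello-data-contract",
    "version": "1.0.0",
    "status": "draft",
    "owner": "data-platform",
    "domain": "sales",
    "schema": [{"name": "orders", "quality": [{"type": "sql"}]}],
    "slaProperties": [{"property": "latency", "value": 4, "unit": "h"}],
}

for section, question in unanswered_questions(EXAMPLE).items():
    print(f"missing {section}: {question}")
# prints: missing servers: Where does the data live?
```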

Portfolio value

A hiring manager reviewing a data engineering or analytics engineering portfolio wants evidence that the candidate understands governance tooling beyond writing SQL tests. Producing a lint-clean ODCS v3.1 contract with all five sections, together with a documented set of intentional errors and their exact CLI error messages, signals that the candidate has read the spec, understands the validation model, and can onboard a team to contract authoring from scratch. It also demonstrates familiarity with datacontract-cli, an emerging standard CLI for the ODCS ecosystem, and positions the candidate as someone who can establish contract hygiene practices rather than just consume them.

Builds on lessons
Lesson 1
Build plan
  1. Create a contract.yaml file with all five ODCS v3.1 top-level sections: fundamentals (id, version, status, owner, domain), schema (one table with at least four columns including a primary key and a nullable field), quality (one SQL-based rule and one predefined rule), slaProperties (freshness and completeness targets with numeric thresholds), and servers (one entry with type, host, and database).
  2. Install datacontract-cli 0.10.x and run datacontract lint contract.yaml against the complete file, confirm zero errors, and record the clean output.
  3. Introduce one intentional error in each of the five sections — for example a missing required fundamentals field, an invalid column type in schema, a malformed SQL expression in quality, a non-numeric threshold in slaProperties, and an unsupported server type — then run lint after each change and document the exact error message and line reference produced by the CLI.
  4. Fix all intentional errors, add a pre-commit hook that calls a shell script or Makefile target running datacontract lint on every commit, and verify the hook blocks a commit when a deliberate error is reintroduced.
  5. Write a one-page README that maps each ODCS section to the real-world question it answers (who owns this, what columns exist, what quality rules apply, what are the SLO targets, where does the data live) and explains which datacontract-cli export targets the contract can feed downstream.
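
Steps 2 to 4 of the plan can be sketched as a small Python wrapper that a pre-commit hook or Makefile target might call. It shells out to `datacontract lint <file>` (the command from step 2), captures the output so it can be recorded, and returns a non-zero status when any file fails. The sketch assumes the CLI exits non-zero on a failed lint, which is worth confirming for your installed version; the function names `run_lint` and `main` and the file name `lint_hook.py` are hypothetical.

```python
import subprocess
from pathlib import Path


def run_lint(contract: Path, command=("datacontract", "lint")) -> tuple[int, str]:
    """Run the lint command on one file and return (exit code, combined output)."""
    try:
        proc = subprocess.run(
            [*command, str(contract)],
            capture_output=True,
            text=True,
            check=False,  # inspect the exit code ourselves instead of raising
        )
    except FileNotFoundError:
        # The CLI is not installed or not on PATH.
        return 127, f"command not found: {command[0]}"
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def main(paths: list[str], command=("datacontract", "lint")) -> int:
    """Lint every given contract; return 1 if any failed, else 0."""
    targets = [Path(p) for p in paths] or [Path("contract.yaml")]
    failed = False
    for target in targets:
        code, output = run_lint(target, command)
        print(f"--- {target} (exit {code})")
        print(output)
        # Assumption: a non-zero exit code means the lint did not pass.
        failed = failed or code != 0
    return 1 if failed else 0
```

To use it as a hook, one option is to save the block as `lint_hook.py`, end the file with `raise SystemExit(main(sys.argv[1:]))` (plus `import sys`), and invoke it from `.git/hooks/pre-commit`; Git aborts the commit when the hook exits non-zero. Redirecting the printed output to a file is a simple way to collect the exact messages for the error log in step 3.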