INTRO · BLOCK 01
LAKE · 7 MIN PREVIEW
Lakehouse architecture & open tables
Iceberg, Delta, Hudi. Time travel, MERGE, compaction. Object storage that behaves like a database — without the religious wars.
CONCEPT · BLOCK 02
What a lakehouse actually is
A lakehouse is object storage (Parquet on S3/GCS/ADLS) plus a transactional metadata layer (Iceberg / Delta / Hudi) that gives you ACID, schema evolution, time travel, and fast point-in-time reads. The data files are open and portable; the metadata is what makes them feel like a warehouse. You get warehouse semantics on lake economics — no double-loading, no proprietary storage, no vendor lock-in for the bytes themselves.
TIP: Choose the table format first, the query engine second. The format is your ten-year decision.
WATCH OUT: Small files on object storage are a query killer. Compaction is a first-class concern, not a chore.
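To make the metadata layer and the small-files warning concrete, here is a minimal sketch in Spark SQL. It assumes the warehouse.events Iceberg table used in this lesson and a Spark session with Iceberg's SQL extensions enabled. The catalog name my_catalog and the 32 MB threshold are placeholders chosen for illustration, not values from the lesson.

```sql
-- Every commit creates a snapshot; the snapshots metadata table lists them.
SELECT snapshot_id, committed_at, operation
FROM warehouse.events.snapshots
ORDER BY committed_at;

-- Count data files below an illustrative 32 MB threshold.
SELECT count(*)                AS small_files,
       sum(file_size_in_bytes) AS small_bytes
FROM warehouse.events.files
WHERE file_size_in_bytes < 32 * 1024 * 1024;

-- Compact small files with Iceberg's rewrite_data_files procedure.
-- my_catalog is a hypothetical catalog name; substitute your own.
CALL my_catalog.system.rewrite_data_files(table => 'warehouse.events');
```

If the second query keeps returning a large count after streaming or frequent small inserts, that is usually the signal to run the compaction call on a schedule rather than ad hoc.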
DIAGRAM · BLOCK 03
Storage / metadata / engine — three layers
One copy of the data. Many engines. The metadata layer is the magic.
CODE · BLOCK 04
Iceberg in SQL — feels like a table, behaves like a lake
SQL
CREATE TABLE warehouse.events (
  event_id    bigint,
  user_id     bigint,
  occurred_at timestamp,
  payload     string
)
USING iceberg
PARTITIONED BY (days(occurred_at));

INSERT INTO warehouse.events VALUES
  (1, 42, current_timestamp(), 'click');

-- Time travel: read the table as of a past timestamp.
-- The timestamp must not predate the table's first snapshot.
SELECT count(*) FROM warehouse.events
  TIMESTAMP AS OF '2026-04-26 00:00:00';
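Same DDL most engineers know. The PARTITIONED BY (days(...)) clause is a hidden partition transform: Iceberg derives the daily partition value from occurred_at, so there is no separate partition column to maintain and queries filter on the timestamp itself.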
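As a sketch of what hidden partitioning and time travel look like from the query side, assuming the warehouse.events table above and Spark 3.3 or later with Iceberg: the date range and the snapshot id below are placeholders, not values from the lesson.

```sql
-- Filter on the timestamp column directly. Iceberg maps the range onto
-- the day partitions, so only matching partitions are scanned.
SELECT count(*)
FROM warehouse.events
WHERE occurred_at >= TIMESTAMP '2026-04-26 00:00:00'
  AND occurred_at <  TIMESTAMP '2026-04-27 00:00:00';

-- Time travel by snapshot instead of wall-clock time.
-- 1234567890 is a placeholder; look up a real id in warehouse.events.snapshots.
SELECT count(*)
FROM warehouse.events VERSION AS OF 1234567890;
```

Pinning a snapshot id tends to be the safer choice for reproducible reports, since a timestamp resolves to whichever snapshot was current at that moment.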
CHEATSHEET · BLOCK 05
Five things to remember
01 Open table formats (Iceberg/Delta/Hudi) give you ACID + time travel on object storage.
02 Hidden partitioning beats user-managed partition columns. Use it.
03 MERGE INTO is the upsert primitive. Idempotent CDC depends on it.
04 Small files kill performance. Schedule compaction like a backup job.
05 Don't dual-write to lake AND warehouse. Pick one source of truth.
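Item 03 can be sketched as follows in Spark SQL with Iceberg. The staging table warehouse.events_changes is hypothetical: it is assumed to have the same columns as warehouse.events and at most one row per event_id, since a MERGE fails when several source rows match the same target row.

```sql
-- Upsert a batch of changes keyed on event_id.
-- Replaying the same batch leaves the table in the same state,
-- which is what makes the CDC load idempotent.
MERGE INTO warehouse.events AS t
USING warehouse.events_changes AS s
ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET
  t.user_id     = s.user_id,
  t.occurred_at = s.occurred_at,
  t.payload     = s.payload
WHEN NOT MATCHED THEN INSERT (event_id, user_id, occurred_at, payload)
  VALUES (s.event_id, s.user_id, s.occurred_at, s.payload);
```

If the change feed can contain several versions of one event in a batch, deduplicate it to the latest version per event_id before merging.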
MINIGAME · RAPIDFIRETF · BLOCK 06
True or false: 6 seconds each
Iceberg tables can be read by multiple engines without copying data.
LESSON COMPLETE · BLOCK 07
Lakehouse mental model: locked.
NEXT: Hello Iceberg: your first lakehouse table
WHAT YOU'LL WALK AWAY WITH
Real skills, real career delta.
Skills you'll gain (9)
Outcomes from completing the course:
- Pick a table format by trade-off · Working
- Wire MERGE / time-travel safely · Working
- Stream into the lakehouse without dual-write hell · Working
Covered in the lesson sequence:
- Lake vs warehouse vs lakehouse · Working
- Iceberg fundamentals · Working
- Delta Lake patterns · Working
- Time travel & MERGE · Working
- Compaction & vacuum · Working
- Streaming into the lakehouse · Working
Career & income delta
Career moves
- Lead a lakehouse and open-table-format initiative on your team; most orgs have it on the roadmap and few have shipped it.
- Consulting work at $150-300/hr: having shipped a lakehouse to production is a sought-after specialty in 2026.
- Move from generic IC to a platform or AI-platform team where lakehouse and open-table-format expertise is the entry ticket.
Income impact
- $15-40K bump for senior ICs adding lakehouse architecture and open table formats to their resume.
- Freelance / consulting demand for the same skill: $150-300/hr in 2026.
- Closing enterprise deals often hinges on demonstrating the production patterns from this course.
Market resilience
- Lakehouse architecture & open tables is a durable skill across model and framework consolidations.
- Production guardrails (cost caps, observability, audit, evals) carry forward to whatever the 2027 stack is.
- Core patterns transfer to cloud, on-prem, and hybrid deployments.