STRMMOD.STRM-09 · v1.0

Pipelines for real workloads, not demos.

9 micro-lessons · ~90 min · Real Docker images


Real-time data streaming

Stream-first design that holds up under load.

89% of 4,175 IT leaders see streaming platforms as critical. Kafka 4.0 removed ZooKeeper. Flink 2.x added disaggregated state. Confluent Tableflow GA exposed Kafka topics as Iceberg tables. The stack consolidated; the discipline didn't.
WHAT YOU'LL LEARN
01 · Kafka vs Redpanda vs Pulsar
02 · Stream-first design (outbox, partition keys)
03 · Producers, consumers & ordering
04 · Event time & watermarks
05 · Stateful Flink + RocksDB / ForSt
06 · Exactly-once semantics
07 · CDC + Iceberg lakehouse
08 · Schema evolution + CI gate
09 · Production observability & rollout
YOU'LL BE ABLE TO
Pick Kafka / Redpanda / Pulsar by trade-off and ship a producer/consumer that survives kill -9.
Build stateful Flink jobs over event-time with watermarks, RocksDB / ForSt state, and incremental checkpoints.
Run Postgres CDC into an Iceberg lakehouse end-to-end — no nightly batch job.
SKILLS YOU'LL GAIN

Real skills, real career delta.

  • Pick Kafka / Redpanda / Pulsar by trade-off · Working

    Place all three on the ops-cost vs ecosystem vs multi-tenancy axes; defend the choice in a design review without resorting to vendor decks.

  • Design stream-first systems · Production

    Identify when the log should be the source of truth (vs polling/batch), pick a partitioning key from business invariants, and avoid the 'batch with Kafka stapled on' anti-pattern.
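
    A minimal sketch of why the partition key matters: all events sharing a key land on one partition, so per-key ordering is preserved. The hash below is illustrative (Kafka's default partitioner uses murmur2, not md5), and the partition count is an assumption.

    ```python
    # Sketch: a stable key -> partition mapping. Every event for one
    # order lands on the same partition, preserving per-order ordering.
    # md5 stands in for Kafka's murmur2 partitioner here.
    import hashlib

    NUM_PARTITIONS = 6  # assumed topic config

    def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions

    events = [("order-42", "created"), ("order-7", "created"), ("order-42", "paid")]
    placed = [(partition_for(key), key, payload) for key, payload in events]
    ```

    Pick the key from a business invariant ("all events for one order must be ordered"), not from whatever field happens to be handy.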

  • Build durable producers and consumers · Production

    Idempotent producer + acks=all + manual commit + read_committed — the four-line discipline that turns a demo into a service.
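
    The four settings as confluent-kafka-style config dicts (key names follow librdkafka; the broker address and group id are placeholders):

    ```python
    # The four-line discipline, spelled out as configuration.
    producer_conf = {
        "bootstrap.servers": "localhost:9092",  # placeholder broker
        "enable.idempotence": True,  # broker dedupes producer retries
        "acks": "all",               # wait for all in-sync replicas
    }
    consumer_conf = {
        "bootstrap.servers": "localhost:9092",
        "group.id": "orders-svc",            # placeholder group
        "enable.auto.commit": False,         # commit manually, after processing
        "isolation.level": "read_committed", # skip aborted transactional writes
    }
    ```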

  • Reason about event time and watermarks · Production

    Distinguish event/ingest/processing time; configure watermark strategy with bounded out-of-orderness; route late events to side-outputs instead of dropping them.
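
    A conceptual model of that watermark strategy, in plain Python (the 5-second bound is an assumed configuration; Flink's actual operator is more involved):

    ```python
    # Bounded out-of-orderness: the watermark trails the max event time
    # seen by BOUND seconds. Events older than the watermark go to a
    # "late" side output instead of being silently dropped.
    BOUND = 5  # assumed out-of-orderness bound, in seconds

    def run(events):
        max_event_time = float("-inf")
        on_time, late = [], []
        for ts, payload in events:
            watermark = max_event_time - BOUND
            if ts < watermark:
                late.append((ts, payload))      # side output
            else:
                on_time.append((ts, payload))
                max_event_time = max(max_event_time, ts)
        return on_time, late

    on_time, late = run([(100, "a"), (103, "b"), (96, "c"), (90, "d")])
    ```

    Event 96 is only 7 seconds behind 103 but lands beyond the 5-second bound, so it routes to the side output rather than corrupting a closed window.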

  • Implement stateful Flink jobs · Production

    Write tumbling/hopping/session window aggregates in Flink SQL with RocksDB state, checkpointing, and graceful rescaling — the daily bread of cross-team stream processing.
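
    The core of a tumbling-window aggregate is just timestamp bucketing. A pure-Python sketch of what `TUMBLE(..., INTERVAL '1' MINUTE)` computes (window size assumed):

    ```python
    # Tumbling windows: each event belongs to exactly one fixed-size
    # bucket, keyed by the window's start timestamp.
    from collections import defaultdict

    WINDOW_MS = 60_000  # assumed 1-minute windows

    def window_start(ts_ms: int) -> int:
        return ts_ms - (ts_ms % WINDOW_MS)

    def tumbling_sum(events):
        totals = defaultdict(int)
        for ts_ms, amount in events:
            totals[window_start(ts_ms)] += amount
        return dict(totals)

    totals = tumbling_sum([(10_000, 5), (59_999, 7), (60_000, 1)])
    ```

    Flink adds what this sketch omits: the state backend holding open windows, watermarks deciding when a window closes, and checkpoints making the state recoverable.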

  • Ship exactly-once pipelines · Advanced

    Wire a transactional producer + read_committed consumer + idempotent sink, understand the two-phase commit cost, and explain why exactly-once is per-pipeline (not per-system).
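
    The sink side of that wiring, sketched: an idempotent sink applies each (key, offset) once, so at-least-once redelivery upstream still yields exactly-once effects. Class and field names are illustrative.

    ```python
    # Idempotent sink sketch: redelivered records are recognized by
    # (key, offset) and applied only once.
    class IdempotentSink:
        def __init__(self):
            self.applied = set()  # (key, offset) pairs already written
            self.store = {}

        def write(self, key, offset, value):
            if (key, offset) in self.applied:
                return False      # duplicate delivery after a crash: skip
            self.store[key] = value
            self.applied.add((key, offset))
            return True

    sink = IdempotentSink()
    sink.write("order-42", 0, "paid")
    sink.write("order-42", 0, "paid")  # replayed record, applied once
    ```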

  • Stream Postgres CDC into a lakehouse · Production

    Run Debezium 2.x against Postgres, land into Kafka topics with Avro schemas, expose as Iceberg tables via Tableflow / Iceberg sink — production medallion in a docker compose.

  • Govern schemas across teams · Production

    Configure backward/forward/full compatibility per topic, set CI gates that fail breaking changes before they merge, document the upgrade dance for every Avro/Protobuf change.
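
    A toy version of the rule a BACKWARD-mode registry check enforces, heavily simplified: a new schema breaks old records if it adds a required field without a default. Schemas are modeled as name-to-has-default maps.

    ```python
    # Simplified backward-compatibility gate: new readers must still
    # decode old records, so every newly added field needs a default.
    def backward_compatible(old_fields, new_fields):
        """old_fields / new_fields: {field_name: has_default}"""
        for name, has_default in new_fields.items():
            if name not in old_fields and not has_default:
                return False  # new required field: old records unreadable
        return True

    ok = backward_compatible({"id": False}, {"id": False, "discount": True})
    bad = backward_compatible({"id": False}, {"id": False, "tier": False})
    ```

    The CI gate is exactly this check run against the registered schema before merge, so breaking changes fail in review instead of in production.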

  • Observe streaming systems in production · Production

    Define RED metrics + lag SLOs, instrument with OTel, alert on rebalance storms and DLQ growth, and maintain a runbook every on-call can execute at 03:00.
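
    The lag SLO itself is simple arithmetic: per-partition lag is the log end offset minus the committed offset, alerted against a budget. A sketch (the budget and offsets are made-up numbers):

    ```python
    # Consumer lag per partition = log end offset - committed offset.
    # Alert when any partition exceeds the lag budget.
    LAG_SLO = 1_000  # assumed budget: max records behind

    def lag_report(end_offsets, committed):
        lags = {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}
        breaches = {p: lag for p, lag in lags.items() if lag > LAG_SLO}
        return lags, breaches

    lags, breaches = lag_report(
        end_offsets={0: 5_400, 1: 900},
        committed={0: 3_000, 1: 100},
    )
    ```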

  • Run a streaming production rollout · Advanced

    Sequence the rollout — shadow → dual-write → cutover → backfill — with quotas, rate limits, and a kill switch; document the ADR that lets the next team replicate the playbook.

RUNNABLE ON YOUR MACHINE
$ docker pull snap/streaming:hello
$ docker run --rm -it snap/streaming:hello
QUICK PREVIEW · 7 MIN
VERIFIED ENGINEER REVIEWS
"Best 'event time vs processing time' explanation, period."
— @stream_sage · VERIFY ON GITHUB
"We rebuilt our pipeline after the watermark lesson."
— @devops_jules · VERIFY ON GITHUB
LESSONS · 9
HOURS · ~1.5
LEARNERS · 3,640
THIS WEEK · +17%