Quick Intro · 7 MIN · STRM

Real-time data streaming


A scannable trailer of the 9-lesson course. Read top to bottom — no clicks needed.


Stream-first design that holds under load.

89% of 4,175 IT leaders in Confluent's 2025 Data Streaming Report rate streaming as critical — but most teams are still bolting it onto a batch architecture and wondering why it hurts. This trailer shows the difference between a streaming product and a batch product with Kafka stapled on.


The one-line difference

A streaming platform is a durable, ordered, partitioned log that producers append to and consumers read from at their own pace. It is not a queue, not a pub-sub, not a database — it is a log. Once you internalise the log, every partition / consumer-group / offset / watermark question answers itself. If the only thing your 'streaming' system does is move events into Postgres every 10 minutes via a cron job, you don't have a streaming system. You have batch with extra steps.
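The log abstraction in that paragraph can be modeled in a few lines of dependency-free Python. This is an illustrative sketch, not Kafka's implementation; the class and consumer names are made up:

```python
class Log:
    """A minimal append-only log: ordered, offset-addressed, multi-reader."""

    def __init__(self):
        self.entries = []  # the durable, ordered record of events

    def append(self, event):
        self.entries.append(event)
        return len(self.entries) - 1  # offset of the new entry

    def read(self, offset):
        return self.entries[offset]


log = Log()
for e in ["order-created", "order-paid", "order-shipped"]:
    log.append(e)

# Two independent consumers track their own offsets; neither affects the other.
offsets = {"invoice": 0, "fraud": 0}
invoice_batch = []
while offsets["invoice"] < len(log.entries):
    invoice_batch.append(log.read(offsets["invoice"]))
    offsets["invoice"] += 1  # the fraud consumer is still at offset 0

assert invoice_batch == ["order-created", "order-paid", "order-shipped"]
assert offsets["fraud"] == 0  # unread, unaffected
```

The point of the toy: reads never mutate the log, so any number of consumers can replay from any offset at any pace. Every real feature (partitions, groups, watermarks) is a refinement of this picture.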
TIP: Pick the platform LAST. Pick the partitioning key, time semantics, and consumer guarantees FIRST — those are 95% of the design.
WATCH OUT: Confluent's 2024 incident review reported that 9 of the top 10 production outages traced back to consumer-group rebalance storms triggered by a single misconfigured client. Streaming is sharp.
GOTCHA: enable.auto.commit=true is the most expensive default footgun in distributed systems. We turn it off in Lesson 3 and never speak of it again.
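The footgun is about when the offset is recorded relative to processing. A toy simulation, where the hypothetical `crash_at` index stands in for a consumer dying mid-stream (real auto-commit is periodic and runs in the background, which only widens the loss window):

```python
def run_consumer(messages, commit_before_processing, crash_at):
    """Simulate a consumer that crashes at index `crash_at`.
    Returns (processed, committed_offset): what a restart would resume from."""
    processed, committed = [], 0
    for i, msg in enumerate(messages):
        if commit_before_processing:
            committed = i + 1  # roughly what auto-commit does
        if i == crash_at:
            return processed, committed  # crash before handling this message
        processed.append(msg)
        if not commit_before_processing:
            committed = i + 1  # manual commit AFTER processing
    return processed, committed


msgs = ["m0", "m1", "m2"]

# Auto-commit style: offset 2 is committed but m1 was never processed.
# A restart resumes at m2, so m1 is silently lost.
done, offset = run_consumer(msgs, commit_before_processing=True, crash_at=1)
assert done == ["m0"] and offset == 2

# Commit after processing: the restart replays m1. At-least-once, nothing lost.
done, offset = run_consumer(msgs, commit_before_processing=False, crash_at=1)
assert done == ["m0"] and offset == 1
```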

Producer → topic partitions → consumer groups

PRODUCER → [ PART 0 | PART 1 | PART 2 ] → consumer groups: INVOICE · FRAUD · BI/ICEBERG
ONE topic. Many consumer groups. Each group reads at its own pace, with its own offsets, without affecting any other consumer.
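The "same entity, same partition" routing in the diagram comes from keyed partitioning. A plain-Python sketch — Kafka's default partitioner uses murmur2, so the sha256 here is only an illustrative stand-in; the invariant (stable hash of the key, mod partition count) is the same:

```python
import hashlib

NUM_PARTITIONS = 3


def partition_for(key: str) -> int:
    # Stand-in for Kafka's keyed partitioner: any stable hash works for the
    # illustration. Same key -> same hash -> same partition, every time.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


# Every event for order-7 lands on one partition, so per-order processing
# sees its events in the order they were produced.
p7 = partition_for("order-7")
assert all(partition_for("order-7") == p7 for _ in range(100))
assert 0 <= partition_for("order-8") < NUM_PARTITIONS
```

This is also why rule 02 below says the key comes from business invariants: ordering only holds within a partition, so the key must name the entity whose ordering you care about.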

A correct producer + consumer in one screen

PYTHON
from confluent_kafka import Producer, Consumer
import json

p = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,  # broker dedups producer retries
    "acks": "all",               # wait for full ISR replication
    "linger.ms": 5,              # batch up to 5 ms
})
p.produce("orders", key="order-7", value=json.dumps({"id": 7, "amt": 99.0}))
p.flush()

c = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "invoice-processor",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,         # commit AFTER processing
    "isolation.level": "read_committed",
})
c.subscribe(["orders"])
try:
    while True:
        msg = c.poll(1.0)
        if msg is None:
            continue                     # poll timeout, keep waiting
        if msg.error():
            raise RuntimeError(msg.error())
        process(json.loads(msg.value())) # your handler
        c.commit(message=msg)            # replay-safe: only after success
finally:
    c.close()
enable.idempotence + acks=all is the modern producer default. enable.auto.commit is off, and read_committed pairs with transactional producers (Lesson 6). The commit runs only after process() succeeds, so a crash replays the message instead of losing it.
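Commit-after-processing gives at-least-once delivery, which means replays will happen; the other half of the discipline is a sink that absorbs them. A minimal sketch, where a set stands in for what is, in production, a unique constraint or an upsert:

```python
class IdempotentSink:
    """Dedup by a stable message key so replays have no double effect."""

    def __init__(self):
        self.seen = set()
        self.rows = []

    def write(self, msg_id, payload):
        if msg_id in self.seen:
            return False  # replayed message: skip, nothing double-counted
        self.seen.add(msg_id)
        self.rows.append(payload)
        return True


sink = IdempotentSink()
events = [("order-7", 99.0), ("order-8", 12.5), ("order-7", 99.0)]  # last is a replay
for msg_id, amt in events:
    sink.write(msg_id, amt)

assert sink.rows == [99.0, 12.5]  # the replay was absorbed
```

At-least-once delivery plus an idempotent sink is the cheapest route to effectively-once results; the transactional machinery in Lesson 6 is for when the sink can't be made idempotent.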

The 6 rules every 2026 streaming shipper knows

01 · The log is the source of truth. Caches and DBs are derivations.
02 · Pick the partitioning key from your business invariants — same entity = same partition = ordered.
03 · Idempotent producer + acks=all + manual commit. Always.
04 · Event time, never processing time. Wall clocks are not honest.
05 · Schema Registry is non-negotiable. Untyped JSON on a bus is technical debt with a return address.
06 · Backfill = replay. Make consumers idempotent or you'll discover this the hard way.
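Rule 04 hinges on watermarks. Here is a plain-Python model of a bounded out-of-orderness watermark in the style Flink uses; the timestamps and the lateness bound are illustrative numbers, not defaults:

```python
def watermark(max_event_time_seen, max_out_of_orderness):
    """Bounded out-of-orderness: 'no event older than this is still expected'."""
    return max_event_time_seen - max_out_of_orderness


# (event_time, payload): note 97 and 101 arrive after newer events
events = [(100, "a"), (105, "b"), (97, "c"), (112, "d"), (101, "e")]
MAX_LATENESS = 5

max_seen = 0
on_time, late = [], []
for event_time, payload in events:
    max_seen = max(max_seen, event_time)
    if event_time < watermark(max_seen, MAX_LATENESS):
        late.append(payload)   # route to a side output, don't silently drop
    else:
        on_time.append(payload)

assert on_time == ["a", "b", "d"]
assert late == ["c", "e"]
```

The watermark advances only with the maximum event time seen, never with the wall clock, which is exactly why processing time is "not honest": it tells you when you looked, not when things happened.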

Quick check — true or false?

Kafka, Redpanda, and Pulsar all speak the same producer/consumer wire protocol.

What you'll ship in the full study

Nine lessons. Eight docker projects. By the end you will have:
  • A Redpanda single-broker compose stack with a correct Python producer/consumer (lift-to-work for any new service).
  • A Postgres → Debezium → Kafka → console CDC tail you can point at your real OLTP database.
  • A Flink SQL tumbling/hopping/session window job over event-time with watermarks.
  • An exactly-once pipeline with transactional producer + idempotent sink + read_committed.
  • A backfill replayer that re-emits from offset 0 into a parallel consumer group without reprocessing the live workload.
  • A Karapace Schema Registry + CI compatibility gate that breaks the build on a backward-incompatible schema change.
  • A Tableflow / Iceberg sink that turns a Kafka topic into a live lakehouse table.
  • A full observability stack (OTel + Prometheus + Grafana) over a producer/consumer/Flink job, including consumer-lag alerts.
Every docker project is meant to be lifted into your real work — not a demo.
INCLUDED: Each project ships with composeYaml, expectedOutcome, and a 'lift to work' note explaining how to drop it into your team's repo.

That's the trailer.

NEXT: Lesson 1 · Kafka vs Redpanda vs Pulsar
WHAT YOU'LL WALK AWAY WITH

Real skills, real career delta.

Skills you'll gain

  • Pick Kafka / Redpanda / Pulsar by trade-off · Working

    Place all three on the ops-cost vs ecosystem vs multi-tenancy axes; defend the choice in a design review without resorting to vendor decks.

  • Design stream-first systems · Production

    Identify when the log should be the source of truth (vs polling/batch), pick a partitioning key from business invariants, and avoid the 'batch with Kafka stapled on' anti-pattern.

  • Build durable producers and consumers · Production

    Idempotent producer + acks=all + manual commit + read_committed — the four-line discipline that turns a demo into a service.

  • Reason about event time and watermarks · Production

    Distinguish event/ingest/processing time; configure watermark strategy with bounded out-of-orderness; route late events to side-outputs instead of dropping them.

  • Implement stateful Flink jobs · Production

    Write tumbling/hopping/session window aggregates in Flink SQL with RocksDB state, checkpointing, and graceful rescaling — the daily bread of cross-team stream processing.

  • Ship exactly-once pipelines · Advanced

    Wire a transactional producer + read_committed consumer + idempotent sink, understand the two-phase commit cost, and explain why exactly-once is per-pipeline (not per-system).

  • Stream Postgres CDC into a lakehouse · Production

    Run Debezium 2.x against Postgres, land into Kafka topics with Avro schemas, expose as Iceberg tables via Tableflow / Iceberg sink — production medallion in a docker compose.

  • Govern schemas across teams · Production

    Configure backward/forward/full compatibility per topic, set CI gates that fail breaking changes before they merge, document the upgrade dance for every Avro/Protobuf change.

  • Observe streaming systems in production · Production

    Define RED metrics + lag SLOs, instrument with OTel, alert on rebalance storms and DLQ growth, and maintain a runbook every on-call can execute at 03:00.

  • Run a streaming production rollout · Advanced

    Sequence the rollout — shadow → dual-write → cutover → backfill — with quotas, rate limits, and a kill switch; document the ADR that lets the next team replicate the playbook.
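The lag SLO behind the observability skill reduces to simple arithmetic per partition: head offset minus committed offset. A sketch with made-up offsets (partition numbering and values are illustrative):

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: how far the group trails the head of the log."""
    return {p: end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets}


end = {0: 1_500, 1: 1_498, 2: 9_900}        # head of the log, per partition
committed = {0: 1_500, 1: 1_490, 2: 4_000}  # where the group has committed to

lag = consumer_lag(end, committed)
assert lag == {0: 0, 1: 8, 2: 5_900}
# Partition 2 is the alerting story: summed lag hides a single hot partition,
# so put the SLO on max-per-partition lag, not the total.
```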

Career & income delta

Career moves
  • Title yourself credibly as a 'streaming engineer' or 'data platform engineer' — the 2026 hiring channel for senior IC roles at $200-360K in US/EU markets.
  • Lead a streaming initiative on your team — most enterprise roadmaps have a 'real-time' line item that nobody owns; that ownership is the staff-promo lever.
  • Pick up consulting work at $200-400/hr — the most common 2026 inquiry is 'we have Kafka but it's slow / lossy / costing too much'.
  • Move from generic backend role to platform / data-platform team where streaming expertise is the entry ticket and the path to staff/principal.
Income impact
  • $25-50K bump for senior backend ICs adding production streaming to their resume in 2026.
  • $60-150K bump moving from a generic role to a data-platform / streaming-platform team at a series-B+ company.
  • Freelance / consulting rates: $200-400/hr — Debezium + Flink SQL + exactly-once is the rate-bumping triple play.
  • Enterprise sales engineering: closing one 6-figure analytics deal per quarter often requires demonstrating the CDC → Iceberg path live.
Market resilience
  • The log abstraction is durable — every framework and platform consolidation in the last 12 years has reinforced it, not replaced it.
  • The Kafka wire protocol is the de facto interop standard; investments transfer across Kafka, Redpanda, WarpStream, AutoMQ, and Confluent Cloud.
  • CDC + Iceberg is the cross-vendor lakehouse pattern (Snowflake, Databricks, Trino, BigQuery all read it natively) — protocol fluency outlives any single vendor.
  • Production discipline (lag SLO, schema CI, exactly-once, observability) carries forward to whatever the 2027 stream stack is — the tools change, the discipline doesn't.