CHEATSHEET · 01 · Streaming · master cheatsheet
Mental model
- ·Topic = ordered, partitioned, durable log
- ·Partition = the unit of parallelism per consumer group
- ·Offset = your bookmark; commits are how you mark 'done'
- ·Consumer group = a parallel scan of the topic; each partition is assigned to at most one consumer in the group
- ·Tombstone (key, null) = delete record for compacted topics
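The whole mental model fits in a few lines of illustrative Python (toy names, not any real client API): a list is the log, its index is the offset, and compaction keeps the last value per key while `(key, None)` tombstones delete:

```python
# Toy log model -- illustrative only, not a Kafka client API.

def append(log, key, value):
    """Append returns the record's offset: the consumer's bookmark."""
    log.append((key, value))
    return len(log) - 1

def compact(log):
    """Keep only the latest record per key; (key, None) tombstones delete."""
    latest = {}
    for key, value in log:        # later offsets win
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]

log = []
append(log, "user-1", "v1")
append(log, "user-2", "v1")
append(log, "user-1", "v2")   # supersedes user-1's first value
append(log, "user-2", None)   # tombstone: user-2 is gone after compaction

assert compact(log) == [("user-1", "v2")]
```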
Producer commitments
- ·enable.idempotence=true (default in Kafka 3.x+) — dedup retries
- ·acks=all — wait for full ISR replication before ACK
- ·Pair acks=all with min.insync.replicas≥2 in prod
- ·Use keys for ordering invariants — same entity → same partition
- ·Transactions for cross-topic atomic writes (Lesson 6)
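The producer bullets above can be sketched as a config dict. The property names are standard Kafka producer settings; the broker address is a placeholder, and `partition_for` is an illustrative stand-in for the client's real partitioner:

```python
# Durability-first producer configuration (standard Kafka property names;
# "broker:9092" is a placeholder address).
producer_config = {
    "bootstrap.servers": "broker:9092",
    "enable.idempotence": True,   # broker dedups retried batches
    "acks": "all",                # wait for the full ISR before ACK
    # min.insync.replicas>=2 is a topic/broker setting, not a client one:
    # set it on the topic so acks=all actually means two durable copies.
}

def partition_for(key: bytes, num_partitions: int) -> int:
    # Keyed records route deterministically: same key -> same partition,
    # which is what preserves per-entity ordering. Real clients use
    # murmur2; Python's hash() stands in here for illustration only.
    return hash(key) % num_partitions
```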
Consumer commitments
- ·enable.auto.commit=false — commit AFTER processing
- ·isolation.level=read_committed when transactions are in play
- ·Tune max.poll.interval.ms to 2-3× your worst-case process() time
- ·Process idempotently — replay is normal, not exceptional
- ·kill -9 should never lose data; design for it
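A minimal commit-after-processing loop. The `consumer` object is assumed to expose `poll()`/`commit()` in the style of confluent-kafka; this shows the loop shape, not a real client:

```python
def run_loop(consumer, handler, max_polls):
    """Process first, commit after: a crash between the two steps replays
    the record on restart, so handler must be idempotent (replay is
    normal, not exceptional)."""
    for _ in range(max_polls):
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue            # nothing this poll
        handler(msg)            # 1. do the work
        consumer.commit(msg)    # 2. only then move the bookmark
```

With auto-commit off, a kill -9 anywhere in this loop loses at most the commit, never the data: the uncommitted record is simply redelivered.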
Time semantics
- ·Event time = when the event happened in the world
- ·Ingest time = when the broker received it
- ·Processing time = when your operator saw it (lies on rebalance / replay)
- ·Watermarks = 'I will not see events earlier than this' — the operator's promise
- ·Late events = events past the watermark; route to a side-output, never drop silently
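A watermark is just bookkeeping. A toy tracker (all names illustrative) that advances the watermark as max-event-time-seen minus an allowed lateness, and flags events at or behind it as late:

```python
def make_tracker(allowed_lateness):
    """Toy watermark tracker: watermark = max event time seen - lateness."""
    state = {"max_event_time": None}

    def watermark():
        m = state["max_event_time"]
        return None if m is None else m - allowed_lateness

    def observe(event_time):
        """Return 'on_time' or 'late'; late events go to a side output."""
        wm = watermark()
        if wm is not None and event_time <= wm:
            return "late"       # past the promise -- never drop silently
        m = state["max_event_time"]
        state["max_event_time"] = event_time if m is None else max(m, event_time)
        return "on_time"

    return observe, watermark

observe, watermark = make_tracker(allowed_lateness=5)
assert observe(100) == "on_time"
assert watermark() == 95        # "I will not see events earlier than 95"
assert observe(90) == "late"    # 90 <= 95: route to side output
assert observe(120) == "on_time"
assert watermark() == 115
```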
State semantics
- ·Stateless first; stateful only when you must aggregate or join
- ·RocksDB-backed state stores (Flink, Kafka Streams) survive restarts
- ·Checkpoints persist state — incremental, async, exactly-once-on-restore
- ·Bound your state — TTL, windows, or an LRU eviction
- ·State size is the new memory pressure; budget it like RAM
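Bounding state with a TTL can be sketched as a small store with lazy eviction on read plus a periodic sweep (illustrative only, not the RocksDB/Flink/Kafka Streams API):

```python
import time

class TTLStore:
    """Toy bounded state store: every key expires ttl_seconds after write."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._data = {}             # key -> (value, written_at)

    def put(self, key, value):
        self._data[key] = (value, self.clock())

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if self.clock() - written_at > self.ttl:
            del self._data[key]     # lazy eviction on read
            return None
        return value

    def sweep(self):
        """Evict everything past TTL; run periodically to bound state size."""
        now = self.clock()
        for k in [k for k, (_, t) in self._data.items() if now - t > self.ttl]:
            del self._data[k]
```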
Production guardrails
- ·Quotas at the broker — protect tenants from each other
- ·Consumer-lag SLO + page on lag > P99 baseline × 3
- ·Schema Registry compatibility CI gate — block backward-incompatible PRs
- ·DLQ for poison messages — never drop, always quarantine
- ·Backfill is a deploy event — practice it before you need it
- ·Tag every producer/consumer with service + version for tracing
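The lag SLO above is one comparison: page when current lag exceeds 3× the P99 of a recent baseline window. A sketch with an illustrative nearest-rank percentile helper (function names are not from any monitoring library):

```python
import math

def p99(samples):
    """Nearest-rank 99th percentile of a non-empty baseline window."""
    ordered = sorted(samples)
    idx = math.ceil(0.99 * len(ordered)) - 1
    return ordered[idx]

def should_page(current_lag, baseline_lags, multiplier=3):
    """Page when lag blows past multiplier x the P99 baseline."""
    return current_lag > multiplier * p99(baseline_lags)

baseline = [100, 120, 110, 130, 90, 115, 105, 125, 95, 135]  # P99 = 135
assert not should_page(300, baseline)   # 300 <= 3 * 135
assert should_page(500, baseline)       # 500 >  3 * 135
```

In practice the baseline comes from your metrics store (e.g. the Kafka exporter lag series mentioned under Observability) rather than a hardcoded list.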
CHEATSHEET · 02 · Platform & framework picks · 2026
Default broker for new builds
- ·Apache Kafka 4.x — standard, JVM, KRaft only (ZooKeeper removed in 4.0). Use when ecosystem breadth matters.
- ·Redpanda — single-binary C++, Kafka API compatible, lower ops cost. Use for small teams without JVM expertise.
- ·Apache Pulsar 3.x/4.x — tiered storage native, multi-tenancy first-class. Use when SaaS-grade tenant isolation dominates.
Storage-decoupled brokers
- ·Confluent WarpStream / Kora — stateless brokers backed by S3, ~10× cheaper at high throughput.
- ·AutoMQ — open-source stateless Kafka on S3.
- ·Use when egress + replication costs dominate your bill — typical for global, write-heavy workloads.
Stream processing
- ·Apache Flink 2.x — SQL + DataStream, disaggregated state (ForSt), materialized tables (FLIP-435), exactly-once. Cross-team production default.
- ·Kafka Streams — JVM library, in-app, no separate cluster. Use for service-local processing only.
- ·RisingWave / Materialize — streaming SQL databases. Use when you want a Postgres-shaped face on a stream.
- ·ksqlDB — being deprecated in favour of Flink SQL on Confluent Cloud. Migrate.
Schema Registry
- ·Confluent Schema Registry — the de facto standard. Community-licensed core is free; advanced features are paid.
- ·Karapace — Aiven's open-source SR (Apache 2.0). Drop-in replacement, lighter footprint.
- ·Apicurio Registry 3.x — Red Hat's. Avro/Protobuf/JSON Schema/AsyncAPI, integrates with Kafka.
CDC + Lakehouse
- ·Debezium 2.x — tails the transaction logs of Postgres / MySQL / MongoDB / SQL Server. Engine + Server modes.
- ·Confluent Tableflow (GA 2025) — exposes Kafka topics as native Iceberg tables.
- ·Apache Iceberg + Kafka Connect S3 sink — DIY lakehouse stream landing.
- ·Estuary Flow / Decodable — managed CDC + transforms when you don't want to run Debezium yourself.
Observability
- ·OpenTelemetry — instrument producers/consumers/Flink jobs with kafka.* spans.
- ·Prometheus + Grafana — RED metrics + consumer lag from JMX/Kafka exporter.
- ·Confluent Health+ / Lenses / Conduktor — operator-grade dashboards.
- ·Streamdal — runtime data observability (in-flight payload validation).
Avoid / migrate
- ·ZooKeeper-based Kafka (3.x and below) — KRaft is the only supported mode in 4.x.
- ·ksqlDB — Confluent's own roadmap points to Flink SQL.
- ·Custom JSON-on-the-bus without a Schema Registry — every regret in streaming starts here.
- ·Spark Structured Streaming for sub-second latency — Flink wins clearly here.