Insights / Architecture · 2023-06-13 · 6 min read

Streaming is not a default

Streaming is exciting, and for a narrow set of problems it's the only right answer. For most analytics, it's a large bill and an operational burden bought to solve a latency problem nobody had.

It's tempting to reach for streaming because it feels modern and powerful. But streaming trades simplicity for latency, and most analytical workloads don't need the latency. Before committing to always-on infrastructure, ask what the freshness requirement actually is, stated in business terms rather than as an aspiration.

Ask what decision needs the freshness

If a number informs a daily decision, hourly or daily batch is plenty, and far cheaper to build and run. Streaming earns its complexity when a decision genuinely depends on seconds-old data: fraud blocking, live operational dashboards, real-time personalization. Those are real — and they're a minority.

The right question isn't "could this be real-time?" It's "what decision gets worse if this is an hour old?" Often the honest answer is none.
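
One way to make that question concrete is to write down, per decision, how stale the data can be before the decision gets worse, and let that number pick the pipeline. The sketch below is illustrative only: the function name and both thresholds are assumptions, loosely following the sub-minute and hourly lines drawn in this article, and should be tuned to your own platform.

```python
from datetime import timedelta

# Hypothetical thresholds, not a standard; adjust to your own costs.
STREAMING_BELOW = timedelta(minutes=1)
INCREMENTAL_BELOW = timedelta(hours=1)


def pipeline_mode(max_tolerable_staleness: timedelta) -> str:
    """Return the simplest pipeline style that meets a staleness budget."""
    if max_tolerable_staleness < STREAMING_BELOW:
        return "streaming"
    if max_tolerable_staleness < INCREMENTAL_BELOW:
        return "incremental batch"
    return "hourly or daily batch"


# Example decisions and the staleness each can tolerate (made-up values).
decisions = {
    "block a fraudulent payment": timedelta(seconds=5),
    "refresh a support queue view": timedelta(minutes=15),
    "review weekly revenue": timedelta(days=1),
}
for name, budget in decisions.items():
    print(f"{name}: {pipeline_mode(budget)}")
```

If most rows in a table like this land in the last bucket, that is a reasonable signal the platform as a whole does not need streaming.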

The hidden costs

Streaming systems run continuously, so they can fail at any hour, not just inside a scheduled run window the way a batch job does. Exactly-once semantics, out-of-order events, and state management are genuinely hard, and the operational load lands on a team that may have adopted streaming for a workload that required none of it.

A reasonable default

Start batch. Make it incremental and frequent if you need fresher data. Reach for streaming when a specific decision provably needs sub-minute latency — and then scope it to just that decision, rather than rebuilding the whole platform around a requirement most of it doesn't share.
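
As a rough illustration of "incremental and frequent", a batch job can remember a high-water mark and process only the events that arrived after it on each run. This is a minimal sketch under assumptions: the function name, the `(event_time, amount)` tuple shape, and the in-memory list standing in for a real table are all hypothetical.

```python
from datetime import datetime


def run_incremental_batch(events, watermark):
    """Process only events newer than the previous run's watermark.

    `events` is an iterable of (event_time, amount) tuples. Returns the
    total for this run and the new watermark to persist for the next run.
    """
    new = [(t, a) for t, a in events if t > watermark]
    total = sum(a for _, a in new)
    # Keep the old watermark if nothing new arrived.
    new_watermark = max((t for t, _ in new), default=watermark)
    return total, new_watermark


events = [
    (datetime(2023, 6, 13, 9, 0), 10),
    (datetime(2023, 6, 13, 9, 20), 5),
    (datetime(2023, 6, 13, 9, 40), 7),
]
last_run = datetime(2023, 6, 13, 9, 10)
total, last_run = run_incremental_batch(events, last_run)
print(total, last_run)
```

Scheduling a job like this every few minutes gets data that is minutes old without any always-on infrastructure. Note that this simple version skips events that arrive late with a timestamp at or before the watermark; handling those is part of the complexity that streaming frameworks take on.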

