Make your pipeline safe to run twice
If running a job a second time corrupts your data, every retry is a gamble and every backfill is a held breath. Idempotency removes the gamble for almost no cost.
A pipeline step is idempotent when running it twice produces the same result as running it once. This sounds academic until 2am, when a job half-finishes, the orchestrator retries it, and you have to decide whether re-running it will double your revenue numbers. With an idempotent step, that decision is trivial: just run it again.
The pattern that fails
The default many pipelines fall into is append-on-run:
INSERT INTO orders_daily
SELECT * FROM staging WHERE date = '2026-03-01';
Run it once, you get one day of orders. Run it twice — because of a retry, a manual re-trigger, or a backfill that overlaps — and you get that day twice. Nothing errors. The numbers are simply wrong, and they stay wrong until someone notices the daily total looks high.
The pattern that's safe
Make the write define its own scope and replace it. The simplest version is delete-then-insert inside a transaction, keyed on the partition you're producing:
BEGIN;
-- Clear whatever an earlier run wrote for this partition.
DELETE FROM orders_daily WHERE date = '2026-03-01';
-- Rebuild the partition from staging.
INSERT INTO orders_daily
SELECT * FROM staging WHERE date = '2026-03-01';
COMMIT;
Now the job owns the 2026-03-01 partition completely. Run it once, twice, or ten times, and the partition ends up in the same correct state. On warehouses that support it, an atomic partition overwrite does the same job more efficiently. MERGE can too, provided it is keyed on a unique row identifier and also removes target rows that no longer appear in the source; a plain upsert leaves those stale rows behind. Either way the principle is identical: the unit of work is a partition you fully replace, not rows you append.
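To see the convergence for yourself, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (order_id, date, amount), the sample rows, and the load_partition helper are assumptions made up for illustration; only the DELETE and INSERT statements come from the pattern above.

```python
import sqlite3


def load_partition(conn: sqlite3.Connection, day: str) -> None:
    """Replace the orders_daily rows for `day` with the current staging rows."""
    # sqlite3 opens a transaction implicitly before the DELETE; the connection
    # context manager commits on success and rolls back if anything raises.
    with conn:
        conn.execute("DELETE FROM orders_daily WHERE date = ?", (day,))
        conn.execute(
            "INSERT INTO orders_daily SELECT * FROM staging WHERE date = ?",
            (day,),
        )


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (order_id INTEGER, date TEXT, amount REAL)")
conn.execute("CREATE TABLE orders_daily (order_id INTEGER, date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [(1, "2026-03-01", 10.0), (2, "2026-03-01", 15.0), (3, "2026-03-02", 7.5)],
)
conn.commit()

# Simulate a first run, a retry, and a manual re-trigger.
for _ in range(3):
    load_partition(conn, "2026-03-01")

count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders_daily WHERE date = ?",
    ("2026-03-01",),
).fetchone()
print(count, total)
```

Three runs leave two rows totalling 25.0 for the day, the same as one run. Swap the body of load_partition for the bare INSERT and the same loop leaves six rows.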
Design every batch step to own a partition and overwrite it. Then a retry is never a risk, and a backfill is just "run the same job for older dates."
What this buys you
- Free retries. The orchestrator can re-run a failed task without anyone reasoning about side effects.
- Trivial backfills. Reprocessing March is a loop over dates calling the same idempotent step — no special path, no reconciliation.
- Safe deploys. After a logic fix, you re-run the affected partitions and the output converges on the corrected result.
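The backfill case can be sketched in a few lines. This assumes date partitions; the backfill and days_between helpers and the run_daily_step stand-in (a dict overwrite in place of a real warehouse write) are hypothetical names used only to show the shape of the loop.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def days_between(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def backfill(step: Callable[[str], None], start: date, end: date) -> int:
    """Call the same per-partition step for each day; return how many ran."""
    processed = 0
    for day in days_between(start, end):
        step(day.isoformat())
        processed += 1
    return processed


# Stand-in for a real idempotent step: it overwrites its partition's entry.
warehouse = {}


def run_daily_step(day: str) -> None:
    warehouse[day] = f"orders for {day}"  # overwrite, never append


n = backfill(run_daily_step, date(2026, 3, 1), date(2026, 3, 31))
# An overlapping second backfill is harmless: it rewrites the same partitions.
backfill(run_daily_step, date(2026, 3, 10), date(2026, 3, 20))
print(n, len(warehouse))
```

The backfill has no code path of its own. It calls the same function the daily schedule calls, which is why an overlapping range needs no reconciliation.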
The one rule
Pick a partition key (usually a date, sometimes a date plus a tenant) and make sure every write is scoped to it and replaces it atomically. Avoid blind appends, and avoid updates that depend on the current contents of the target table. If a step can't be made idempotent (calling a non-idempotent external API, say), isolate it behind a dedupe key: record a unique key for each call and skip any call whose key is already recorded, so the rest of the pipeline stays safe.
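One way to put a dedupe key in front of a non-idempotent call is sketched below, again with sqlite3. The sent_calls table, the call_once helper, and the key format are assumptions for illustration, and the email sender is a stand-in for whatever external call you need to guard.

```python
import sqlite3
from typing import Callable


def call_once(
    conn: sqlite3.Connection, dedupe_key: str, action: Callable[[], None]
) -> bool:
    """Run `action` only if `dedupe_key` is not recorded yet. Return True if it ran."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO sent_calls (dedupe_key) VALUES (?)",
            (dedupe_key,),
        )
        if cur.rowcount == 0:
            return False  # key already recorded by an earlier run: skip the call
        action()  # if this raises, the key insert is rolled back with it
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sent_calls (dedupe_key TEXT PRIMARY KEY)")

sent = []


def send_invoice_email() -> None:
    sent.append("invoice-2026-03-01")  # stand-in for the external API call


key = "invoice-email:2026-03-01"  # hypothetical format: step name plus partition
first = call_once(conn, key, send_invoice_email)
second = call_once(conn, key, send_invoice_email)  # simulated retry
print(first, second, len(sent))
```

This narrows the risk without removing it: a crash after the external call returns but before the commit would still let a retry call it again. If the external API accepts an idempotency key of its own, passing the same key through closes that gap.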
It's the cheapest reliability work there is, and it's the first thing we look for when we read a new team's pipelines.