Tests you'll still trust in a year
Most data tests start strict and end ignored. The ones that survive are few, specific, and tied to something a human actually cares about.
It's easy to add data tests. It's hard to add tests people still trust twelve months later. The failure pattern is familiar: a team enables hundreds of assertions, they fire constantly for reasons nobody acts on, and the whole suite gets mentally filed under "noise."
Test the contract, not the data's mood
A good test encodes a promise: this column is never null, this key is unique, this total can't be negative. A bad test encodes an expectation that drifts. "Daily revenue is between X and Y" holds until the business grows, and then the band becomes a nuisance. Prefer invariants that are structurally true over thresholds that depend on the weather: growth, seasonality, a one-off promotion.
- Uniqueness and not-null on keys: these almost never produce false alarms.
- Referential checks: every order references a real customer.
- Accepted values: status is one of a known set, which catches new or misspelled values introduced upstream.
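To make the three invariants concrete, here is a minimal sketch in plain Python. It assumes rows arrive as lists of dicts. The table names, column names, and sample data (`orders`, `customers`, `order_id`, `status`) are hypothetical and exist only for illustration. In practice these checks would more likely live in whatever testing tool your pipeline already uses.

```python
from collections import Counter


def check_not_null(rows, column):
    """Return the rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def check_unique(rows, column):
    """Return the values of `column` that appear more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def check_references(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key matches no parent key.

    A missing or None foreign key is also reported as a violation.
    """
    parent_keys = {r[pk] for r in parent_rows}
    return [r for r in child_rows if r.get(fk) not in parent_keys]


def check_accepted_values(rows, column, allowed):
    """Return rows whose `column` value is outside the allowed set."""
    return [r for r in rows if r.get(column) not in allowed]


# Hypothetical sample data: one duplicate key, one orphan, one unknown status.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "status": "shipped"},
    {"order_id": 11, "customer_id": 2, "status": "pending"},
    {"order_id": 11, "customer_id": 3, "status": "teleported"},
]

missing_order_ids = check_not_null(orders, "order_id")
duplicate_order_ids = check_unique(orders, "order_id")
orphan_orders = check_references(orders, "customer_id", customers, "customer_id")
bad_status = check_accepted_values(
    orders, "status", {"pending", "shipped", "cancelled"}
)
```

Each function returns the offending rows or values rather than a bare pass/fail, so a failure points straight at what broke the promise.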
If a failing test doesn't change what someone does, it shouldn't fail loudly. Demote it to a report or delete it.
Severity is a feature
Not every check deserves to block a pipeline. Split tests into ones that halt the run (a broken primary key) and ones that merely warn (a distribution looks odd). Reserving the loud channel for genuinely actionable failures is what keeps the team reading it a year from now.
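One way to sketch the halt-versus-warn split is a small runner that tags every check with a severity. Everything named here (`Severity`, `Check`, `PipelineHalted`, `run_checks`) is hypothetical, not the API of any particular tool, and the lambdas stand in for real checks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Severity(Enum):
    ERROR = "error"  # halts the run
    WARN = "warn"    # reported, but the run continues


@dataclass
class Check:
    name: str
    severity: Severity
    run: Callable[[], bool]  # returns True when the check passes


class PipelineHalted(Exception):
    """Raised when at least one ERROR-severity check fails."""

    def __init__(self, blocking: List[str], warned: List[str]):
        super().__init__("blocking checks failed: " + ", ".join(blocking))
        self.blocking = blocking
        self.warned = warned


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check; return names of failed warnings, raise on failed errors."""
    blocking: List[str] = []
    warned: List[str] = []
    for check in checks:
        if check.run():
            continue
        if check.severity is Severity.ERROR:
            blocking.append(check.name)
        else:
            warned.append(check.name)
    if blocking:
        raise PipelineHalted(blocking, warned)
    return warned


# Hypothetical suite: the key check passes, the distribution check fails quietly.
checks = [
    Check("orders.order_id is unique", Severity.ERROR, lambda: True),
    Check("daily row count looks typical", Severity.WARN, lambda: False),
]
warned = run_checks(checks)
```

The runner evaluates all checks before raising, so a halted run still reports every failure at once instead of stopping at the first.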