A buyer once asked us to drive override rate to zero. We refused. A zero override rate is not a sign that the system works; it is a sign that nobody is checking it anymore.
What override rate actually measures
Override rate is the share of outputs from an AI-assisted step that the human rewrites, replaces, or discards before the work is shipped. It is not the refusal rate (the system declined), and it is not the error rate (the output was wrong in a way somebody noticed later). It is the working rate of disagreement between the model and the person responsible for the result.
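A minimal sketch of that definition, assuming each reviewed output is logged with a single outcome label. The `Outcome` names and the choice to leave refusals out of the denominator are illustrative assumptions, not a description of any particular logging schema.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical outcome labels; a real log will have its own vocabulary.
    ACCEPTED = "accepted"    # shipped as the system produced it
    REWRITTEN = "rewritten"  # human edited the output before shipping
    REPLACED = "replaced"    # human substituted their own work
    DISCARDED = "discarded"  # human threw the output away
    REFUSED = "refused"      # system declined, so there was nothing to override


OVERRIDES = {Outcome.REWRITTEN, Outcome.REPLACED, Outcome.DISCARDED}


def override_rate(outcomes):
    """Share of produced outputs the human rewrote, replaced, or discarded.

    Assumption: refusals are excluded from the denominator because no output
    existed. Returns None when there is nothing to measure.
    """
    produced = [o for o in outcomes if o is not Outcome.REFUSED]
    if not produced:
        return None
    overridden = sum(1 for o in produced if o in OVERRIDES)
    return overridden / len(produced)


sample = (
    [Outcome.ACCEPTED] * 6
    + [Outcome.REWRITTEN, Outcome.REPLACED, Outcome.DISCARDED]
    + [Outcome.REFUSED] * 2
)
rate = override_rate(sample)  # 3 overrides out of 9 produced outputs
```

Errors noticed after shipping do not appear here at all, which is the point of keeping override rate separate from error rate.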
We collect it on every engagement we run, alongside the other five receipts. It sits in the middle of the dashboard for a reason: it is the single line most diagnostic of whether the rest of the numbers are honest. (See /the-six-receipts-explained for the full receipt set.)
Why a non-zero number is a feature
A working human-in-the-loop arrangement has the human checking, correcting, and occasionally rejecting what the system produces. That is the whole design. If override rate drops to zero, one of three things has happened, and only one of them is good news.
The first possibility is the boring, healthy one: the task class is genuinely well-matched to the system, and the gate is correctly tuned so the human only sees the cases that need attention. In that case the override series is not zero across the whole dashboard: it is zero on the easy slice and non-zero on the hard slice. That is a tuned gate doing its job. (Class C: this is how we configure the routing.)
The second possibility is that the human has stopped reading. "Approve all" behavior, the review equivalent of straight-lining a survey, shows up as override rate collapsing to zero on every slice at once, often overnight, including slices that previously had real disagreement. That is the failure mode every honest operator watches for, because it is the one that produces the worst downstream surprises.
The third possibility is that the system has been quietly rerouted around the human entirely. With no human in the loop there is nobody to override anything, so the rate reads zero by construction. That sounds obvious, and yet it is the most common silent regression we find when we audit somebody else's pipeline.
So we do not chase a low override rate. We chase a stable, explainable one, slice by slice, with the gates documented.
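The three possibilities can be told apart, roughly, from per-slice data. The sketch below is a heuristic under stated assumptions: `previous` and `current` are hypothetical dictionaries mapping slice name to override rate for two consecutive periods, and `reviewed` maps slice name to the number of outputs a human actually reviewed in the current period. The labels it returns are prompts for investigation, not verdicts.

```python
def diagnose(previous, current, reviewed):
    """Classify the current per-slice override picture.

    previous, current: dict of slice name -> override rate (0.0 to 1.0)
    reviewed: dict of slice name -> count of outputs a human reviewed
    All three inputs are hypothetical shapes chosen for this sketch.
    """
    # No human reviews recorded anywhere: the loop may have been bypassed.
    if all(reviewed.get(s, 0) == 0 for s in current):
        return "human_bypassed"

    all_zero_now = all(rate == 0.0 for rate in current.values())
    had_disagreement = any(previous.get(s, 0.0) > 0.0 for s in current)

    # Every slice at zero at once, including slices that used to disagree.
    if all_zero_now and had_disagreement:
        return "approve_all_suspected"

    # Zero on some slices and non-zero on others fits a tuned gate.
    if not all_zero_now:
        return "tuned_gate"

    # All zero now and all zero before: the data cannot distinguish the cases.
    return "inconclusive"


healthy = diagnose(
    previous={"easy": 0.0, "hard": 0.20},
    current={"easy": 0.0, "hard": 0.15},
    reviewed={"easy": 10, "hard": 40},
)
collapsed = diagnose(
    previous={"easy": 0.0, "hard": 0.20},
    current={"easy": 0.0, "hard": 0.0},
    reviewed={"easy": 10, "hard": 40},
)
bypassed = diagnose(
    previous={"easy": 0.0, "hard": 0.20},
    current={"easy": 0.0, "hard": 0.0},
    reviewed={"easy": 0, "hard": 0},
)
```

The bypass check runs first on purpose: a zero rate with zero reviews says nothing about whether anyone is reading.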
How we tune the gates
The override series is read against the rest of the dashboard. If override rate is high and cost-per-useful-task is also high, the prompt scaffolding or model choice is wrong and the team is doing the work twice. If override rate is high but cost-per-useful-task is low, the system is producing cheap drafts the human polishes — sometimes that is exactly the intended design, and the receipt sheet says so.
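A small sketch of that cross-reading, assuming the two "high" thresholds are supplied by whoever owns the dashboard. The function name, the thresholds, and the return labels are all hypothetical; the text above only describes the two high-override cases, so the sketch declines to interpret anything else.

```python
def read_override_against_cost(
    override_rate, cost_per_useful_task, *, override_high, cost_high
):
    """Interpret a high override rate in light of cost-per-useful-task.

    override_high and cost_high are caller-chosen cut-offs (assumptions);
    nothing in this sketch says what the right values are.
    """
    if override_rate < override_high:
        # The reading described above only covers high-override cases.
        return "not_covered"
    if cost_per_useful_task >= cost_high:
        # High override and high cost: the work is being done twice.
        return "rework_check_scaffolding_or_model"
    # High override but low cost: cheap drafts that the human polishes.
    return "cheap_drafts_confirm_intended"


double_work = read_override_against_cost(
    0.45, 3.20, override_high=0.30, cost_high=2.00
)
drafts = read_override_against_cost(
    0.45, 0.40, override_high=0.30, cost_high=2.00
)
```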
When we change a gate — loosen a guardrail, add a verification step, re-route a task class — we publish the override-rate delta the change produced, alongside the falsifier we wrote before we ran the experiment. The discipline is documented in /transparency-architecture-overview. The numbers update; the falsifier sheet updates with them. If a change moves the line in the wrong direction, we say so on the page rather than quietly reverting and pretending the experiment never ran.
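One way to keep a gate change and its falsifier together is a single record that is written before the experiment and completed after it. The sketch below is illustrative only: the `GateChange` fields are hypothetical, and it assumes, for this example, that a rise in override rate beyond a pre-declared bound is what counts as the wrong direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateChange:
    # All field names are assumptions made for this sketch.
    description: str             # what was changed
    falsifier: str               # prose statement written before the run
    max_acceptable_delta: float  # numeric form of the falsifier
    rate_before: float           # override rate before the change
    rate_after: float            # override rate after the change

    @property
    def delta(self):
        """Change in override rate produced by the gate change."""
        return self.rate_after - self.rate_before

    @property
    def falsified(self):
        """True when the observed delta exceeds the pre-declared bound."""
        return self.delta > self.max_acceptable_delta

    def report(self):
        """One-line summary suitable for publishing either outcome."""
        verdict = "falsified" if self.falsified else "not falsified"
        return f"{self.description}: delta {self.delta:+.3f} ({verdict})"


change = GateChange(
    description="added verification step on hard slice",
    falsifier="override rate on the hard slice rises at all",
    max_acceptable_delta=0.0,
    rate_before=0.10,
    rate_after=0.12,
)
line = change.report()
```

Because the record is frozen and the verdict is derived from the stored numbers, a result that went the wrong way is reported in the same format as one that went the right way.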
What the field tells us about the human step
The broader literature on human-AI collaboration treats the human verification step as load-bearing, not optional, and treats hiding that step as a failure mode rather than a sign of maturity. (Class E: this is the consensus framing across the human-in-the-loop research community we draw on.) The receipt-first posture is our operational expression of that framing: if the human is in the loop, the loop should be visible on the dashboard, including the line that says how often the human disagreed.
What to ask a vendor
If a vendor reports a near-zero override rate on a non-trivial task class, ask three questions. How is override defined here — rewrite, replace, discard, or only outright rejection? What slice is this number computed over — every task, only the ones routed to humans, only the ones the human opened? And when did the number last move, in either direction, and what caused the move?
The answers tell you whether the gate is tuned, whether the human is reading, and whether anyone is watching the line at all. A vendor who cannot answer all three is not yet selling a measurable product.
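The second question matters because the same set of overrides produces very different rates depending on the denominator. A short sketch, assuming a hypothetical `Task` record with three boolean flags, shows the spread:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical per-task flags; real pipelines will log these differently.
    routed_to_human: bool
    opened: bool
    overridden: bool


def rates_by_denominator(tasks):
    """Override rate computed over three different slices of the same tasks."""
    def rate(subset):
        subset = list(subset)
        if not subset:
            return None
        return sum(t.overridden for t in subset) / len(subset)

    return {
        "all_tasks": rate(tasks),
        "routed_to_human": rate(t for t in tasks if t.routed_to_human),
        "opened_by_human": rate(t for t in tasks if t.opened),
    }


# 100 tasks: 20 routed to a human, 10 of those opened, 4 of those overridden.
tasks = (
    [Task(True, True, True)] * 4
    + [Task(True, True, False)] * 6
    + [Task(True, False, False)] * 10
    + [Task(False, False, False)] * 80
)
rates = rates_by_denominator(tasks)
```

In this made-up data the rate is 4% over all tasks, 20% over routed tasks, and 40% over opened tasks, so a single headline number is uninterpretable until the slice is stated.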
The full receipt set is at /the-six-receipts-explained. The broader discipline that produced it is at /measurement-honesty-for-ai-projects. The architecture that publishes the numbers is at /transparency-architecture-overview. If you want this discipline applied to an engagement of your own, the entry point is /workshop.
