Ask a vendor for their accuracy number and you'll get one. Ask for their refusal rate and you'll usually get a pause. That pause is the whole post.
What refusal rate actually is
Refusal rate is the share of inputs where a system declines to answer — or routes to a human — instead of guessing. It is the visible price the system pays for not being willing to be wrong in silence. A model that never refuses isn't confident; it's just unaccountable.
We track it on every workflow we ship (Class C: configured into the gate, written to the ledger on every run). Two numbers go on the receipt: how often the system answered, and how often it stopped and said "I'm not sure — escalate." If the second number is zero, the first number is a story, not a measurement.
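As a rough illustration of those two receipt numbers, here is a minimal Python sketch. The `LedgerEntry` shape, the `outcome` labels, and the `receipt` function are hypothetical names invented for this example, not the actual ledger format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """One gate decision. Field names are hypothetical."""
    run_id: str
    outcome: str  # "answered" or "refused"


def receipt(entries):
    """Return the two receipt numbers as fractions of all runs."""
    total = len(entries)
    if total == 0:
        # No runs means nothing was measured; do not report a 0% refusal rate.
        return {"runs": 0, "answer_rate": None, "refusal_rate": None}
    refused = sum(1 for e in entries if e.outcome == "refused")
    return {
        "runs": total,
        "answer_rate": (total - refused) / total,
        "refusal_rate": refused / total,
    }


ledger = [
    LedgerEntry("r1", "answered"),
    LedgerEntry("r2", "answered"),
    LedgerEntry("r3", "refused"),
    LedgerEntry("r4", "answered"),
]
print(receipt(ledger))
```

An empty ledger returns `None` rather than zero on purpose: "no refusals" and "no measurements" are different claims, and only one of them belongs on a receipt.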
Why it predicts failure better than accuracy
Accuracy on a clean test set tells you how a system behaves when the world looks like the test set. Production never looks like the test set. The interesting question is what happens at the edges — the malformed input, the question the model has never seen, the request that's slightly outside the contract.
A system with a healthy refusal rate has a behavior at the edge: it stops. A system with a zero refusal rate has a behavior too — it confabulates, and you find out weeks later when a customer or an auditor notices. By then the cost is no longer the wrong answer; it's the trust you spent defending the wrong answer.
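One way to picture "it stops" is a gate that sits in front of the model. The sketch below is an assumption-heavy illustration: the `gate` function, the `GateResult` type, the 0.8 threshold, and the idea that the model returns a usable confidence score are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class GateResult:
    answered: bool
    answer: Optional[str]
    reason: str


def gate(text: str,
         model: Callable[[str], Tuple[str, float]],
         min_confidence: float = 0.8) -> GateResult:
    """Answer only when the input is well-formed and the model is confident."""
    # Edge 1: malformed input never reaches the model.
    if not text or not text.strip():
        return GateResult(False, None, "malformed input: escalate")
    answer, confidence = model(text)
    # Edge 2: the model produced something but is not sure enough (assumed score).
    if confidence < min_confidence:
        return GateResult(False, None, "low confidence: escalate")
    return GateResult(True, answer, "answered")


def toy_model(text: str) -> Tuple[str, float]:
    """Stand-in for a real model: confident on one known question only."""
    known = {"invoice total?": ("$120.00", 0.97)}
    return known.get(text, ("best guess", 0.30))


print(gate("invoice total?", toy_model))
print(gate("   ", toy_model))
print(gate("something the model has never seen", toy_model))
```

Without the two early returns, the third call would hand back "best guess" as if it were an answer. That is the zero-refusal behavior described above.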
The pattern compounds. Anthropic's recent work on emergent misalignment is worth reading on this point: models that learned to cheat on narrow training tasks did not keep the bad behavior narrow. It generalized into broader misbehavior, including sabotage (Class E). Themesis covers the results in AI Misalignment: Anthropic's Studies and More. The takeaway we apply, by analogy: a system that won't refuse small things is a system that won't refuse large things either. Refusal isn't friction; it's the brake pedal.
Why almost no vendor publishes it
Three reasons, in descending order of honesty:
- It looks worse on a slide. "98% accurate" beats "82% accurate, 11% refused, 7% wrong" on a sales deck, even though the second breakdown is the one you can actually run a business on.
- It requires a real gate. To report refusal rate you need a path where the system can say no, and a downstream process — usually a human — that handles the no. Most vendor demos don't have that path because building it is unglamorous.
- It commits you to a falsifier. Once refusal rate is on the receipt, you've promised a number that can move in the wrong direction. That's a real commitment, and most vendors avoid real commitments.
We publish ours. So should anyone asking you to bet a budget on their output.
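To see what the three-way breakdown from the slide example buys you, here is a small worked calculation in Python. The function and key names are hypothetical; the inputs are the illustrative 82 / 11 / 7 split from the list above, not measured results.

```python
def slide_breakdown(correct, refused, wrong):
    """Turn raw counts into the numbers a buyer can actually plan around."""
    total = correct + refused + wrong
    answered = correct + wrong
    return {
        # Share of inputs handed to a human instead of guessed at.
        "refusal_rate": refused / total,
        # Share of inputs that got a wrong answer with no warning.
        "silent_error_rate": wrong / total,
        # How trustworthy an answer is, given that one was issued.
        "accuracy_when_answered": correct / answered if answered else None,
    }


honest = slide_breakdown(correct=82, refused=11, wrong=7)
print({name: round(value, 3) for name, value in honest.items()})
```

On those counts, accuracy among the items the system chose to answer is 82 out of 89, or about 92%, and the 11% of refused items tells you how much human review capacity to budget. A bare "98% accurate" gives you neither figure.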
What to ask for, in plain language
When you're evaluating a workflow — ours, anyone's — three questions surface most of what matters:
- What's the refusal rate over the last 30 days, broken down by task type?
- What happens to a refused item — who sees it, how fast, and what's the resolution time?
- When refusal rate moved last quarter, did anyone notice, and what did they change?
A vendor who can answer all three is operating a system. A vendor who can only answer the first is operating a demo. A vendor who can't answer any of them is operating on your trust without a receipt.
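If a vendor hands you raw run data, the first two questions can be answered with very little code. The sketch below assumes a hypothetical `Run` record with a task type, a timestamp, a refusal flag, and an optional resolution time. Real ledgers will differ, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Run:
    task_type: str
    at: datetime
    refused: bool
    resolved_at: Optional[datetime] = None  # set when a human closes a refused item


def refusal_by_task(runs, now, window_days=30):
    """Refusal rate per task type over the trailing window."""
    cutoff = now - timedelta(days=window_days)
    counts = defaultdict(lambda: [0, 0])  # task_type -> [refused, total]
    for r in runs:
        if r.at >= cutoff:
            counts[r.task_type][1] += 1
            if r.refused:
                counts[r.task_type][0] += 1
    return {t: refused / total for t, (refused, total) in counts.items()}


def median_resolution_hours(runs):
    """Median hours from refusal to human resolution, or None if none resolved."""
    hours = [(r.resolved_at - r.at).total_seconds() / 3600
             for r in runs if r.refused and r.resolved_at is not None]
    return median(hours) if hours else None


now = datetime(2025, 6, 30)
runs = [
    Run("invoice", now - timedelta(days=2), False),
    Run("invoice", now - timedelta(days=3), True,
        now - timedelta(days=3) + timedelta(hours=4)),
    Run("contract", now - timedelta(days=10), True,
        now - timedelta(days=10) + timedelta(hours=2)),
    Run("contract", now - timedelta(days=45), False),  # outside the 30-day window
]
print(refusal_by_task(runs, now))
print(median_resolution_hours(runs))
```

The third question, whether anyone noticed when the rate moved, is not something code can answer. It needs a person on the other end of the number.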
The honest version
Refusal rate is not a vanity metric and it's not a humility flex. It's the line item that proves the system has a "don't know" state and that the "don't know" state has somewhere to go. Every gate we run on Solution Wright work writes that number to the ledger on every call (Class C). It's not the only number — but it's the one that, if missing, tells you the rest are decorative.
- The six receipts, explained — the standard set we publish on every workflow.
- Measurement honesty for AI projects — the parent piece for this cluster.
- Work with us in the workshop — bring a workflow; we'll show you what its refusal rate looks like before we touch it.
