"Trust me" is the most expensive sentence in a vendor relationship. By the time you find out it was wrong, you have paid for the wrong, the rework, and the meeting where everyone agrees who to blame.

A receipts standard is the cheap alternative. Below are three failure modes we see in AI engagements, and the specific receipt that catches each on day one rather than day ninety.

Failure mode 1: the silent model swap

A vendor builds a feature against one model and ships it. A few weeks later the underlying model is upgraded, deprecated, or quietly routed to a cheaper variant. Nothing in your dashboards changes. The feature still "works" — until a user notices it now does something subtly different, or refuses something it used to accept, or hallucinates a field it used to fill correctly. You did not approve the change. You only know because a customer complained.

The receipt that catches it. A configuration entry, pinned in the ledger, that names the exact model identifier, the provider, the routing path, and the date the pin was set (Class C). Any change to that line is a ledger event with a human name attached. If the provider deprecates the pinned model, that is an entry too — written the day the notice landed, not the day the feature breaks.

This is boring discipline. It is the difference between "we shipped a feature" and "we shipped a feature you can run a year from now without surprises."
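As a rough illustration of the pin described above, here is a minimal sketch in Python. The field names, the example identifiers, and the `pin_changes` helper are all assumptions made for this example, not a prescribed schema; the only thing taken from the text is what the entry has to name (model identifier, provider, routing path, pin date, and a human).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelPin:
    """One pinned-model ledger entry (field names are illustrative)."""
    model_id: str      # exact model identifier, never a floating alias
    provider: str
    routing_path: str  # e.g. direct API versus a gateway or proxy
    pinned_on: date
    pinned_by: str     # the human name attached to the pin


def pin_changes(pin: ModelPin, observed: dict) -> list[str]:
    """Return one readable line per watched field that no longer matches the pin."""
    watched = ("model_id", "provider", "routing_path")
    return [
        f"{name}: pinned {getattr(pin, name)!r}, observed {observed.get(name)!r}"
        for name in watched
        if observed.get(name) != getattr(pin, name)
    ]


# Hypothetical values throughout; substitute whatever your deployment reports.
pin = ModelPin(
    model_id="example-model-2024-01-15",
    provider="example-provider",
    routing_path="direct",
    pinned_on=date(2024, 1, 20),
    pinned_by="J. Doe",
)
observed = {
    "model_id": "example-model-2024-06-01",
    "provider": "example-provider",
    "routing_path": "direct",
}

changes = pin_changes(pin, observed)
for line in changes:
    print(line)
```

Each line this prints is a candidate ledger event. The check itself is trivial; the discipline is in running it on a schedule and requiring a named person to sign off on every mismatch.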

Failure mode 2: scope creep dressed as helpfulness

The second pattern is friendlier and therefore more dangerous. The vendor finishes the agreed-upon slice, has time on the clock, and starts "improving" things you did not ask for. A prompt gets reworded. A fallback gets added. A retry policy gets tweaked.

Each change looks like a favor. The sum total is a system you cannot reason about anymore, because the vendor's memory of "what we did this sprint" no longer matches the code.

The receipt that catches it. The ledger entry rule: every change to a production prompt, fallback, retry, or routing rule is a one-line entry with the date, the human, the file path, and a falsifier (Class C). If a change does not earn a ledger entry, it does not ship. If it does, you can read the diff of your own system in plain English, week over week.

Version control is necessary but not sufficient. Git tells you what changed; the ledger tells you why and what would prove the change was wrong.
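The "no entry, no ship" rule can be approximated with a small gate, sketched below in Python. The `LedgerEntry` fields mirror the four items named above (date, human, file path, falsifier); the file paths, the example falsifier, and the `unledgered` helper are hypothetical, and how you obtain the list of changed files (for example from your version control diff) is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LedgerEntry:
    on: date
    who: str
    path: str
    falsifier: str  # what observation would prove this change was wrong


def unledgered(changed_paths: list[str], ledger: list[LedgerEntry]) -> list[str]:
    """Changed files that lack a complete ledger entry.

    An entry only counts if it names a human and states a falsifier.
    A non-empty result means the change does not ship.
    """
    covered = {e.path for e in ledger if e.who.strip() and e.falsifier.strip()}
    return [p for p in changed_paths if p not in covered]


# Hypothetical ledger and change set for illustration.
ledger = [
    LedgerEntry(
        on=date(2024, 3, 4),
        who="J. Doe",
        path="prompts/summarize.txt",
        falsifier="Median summary length on the canonical sample exceeds 120 words.",
    ),
]
changed = ["prompts/summarize.txt", "config/retry.yaml"]

missing = unledgered(changed, ledger)
print(missing)  # ['config/retry.yaml']
```

In this sketch the retry tweak is blocked until someone writes down who made it and what would show it was a mistake, which is exactly the information a diff alone does not carry.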

Failure mode 3: refusal-rate drift

The third pattern is the quietest. An AI feature gradually starts refusing requests it used to handle — or, more rarely, accepting requests it used to refuse. The vendor did not "change" anything. The provider's safety tuning shifted. Your users see fewer answers and more deflections. Your support inbox absorbs the difference.

Most teams discover this only after a quarter of weird tickets. By then the trend is hard to recover from cleanly.

The receipt that catches it. A tracked sample of canonical requests, replayed on a regular cadence, with the refusal-rate and response-shape recorded against a baseline (Class C). When the rate drifts beyond the agreed band, the ledger flags it that week. The drift is then either accepted, in writing, or remediated, in writing. There is no third option called "we will keep an eye on it."

This is not exotic infrastructure — a small list of inputs, a scheduled replay, a column of numbers. The reason most engagements do not have it is that the numbers, once visible, make spin harder.
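To make "a column of numbers" concrete, here is a minimal Python sketch of the comparison step. It assumes the replay has already happened and each canonical request has been labeled refused or not; the baseline, the band width, and the function names are illustrative assumptions, and the response-shape check mentioned above is not covered.

```python
def refusal_rate(outcomes: list[bool]) -> float:
    """Share of replayed requests that were refused (True means refused)."""
    if not outcomes:
        raise ValueError("no replay outcomes to score")
    return sum(outcomes) / len(outcomes)


def drift_status(baseline: float, current: float, band: float) -> str:
    """Compare this replay's rate to the baseline, in either direction."""
    delta = current - baseline
    if abs(delta) <= band:
        return "within band"
    return "drift: more refusals" if delta > 0 else "drift: fewer refusals"


# Hypothetical numbers: a 5% baseline, an agreed band of 5 percentage points,
# and this week's replay of 20 canonical requests with 3 refusals.
baseline_rate = 0.05
this_week = [True] * 3 + [False] * 17

rate = refusal_rate(this_week)
status = drift_status(baseline_rate, rate, band=0.05)
print(f"{rate:.0%} refused, {status}")
```

The band is checked in both directions on purpose, since the text treats newly accepted requests as drift too. Whatever the status, the outcome that matters is the written decision that follows it.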

Why these three, in this order

These three are not arbitrary. They are the failure modes where catching the problem late costs the most relative to catching it early. The work to install each receipt is small. The cleanup without the receipt is large, slow, and politically expensive.

Alicia Maren's summary of recent Anthropic work — AI Misalignment: Anthropic's Studies and More — describes how small training-time signals can escalate into behaviors nobody signed up for. The operational lesson for buyers and operators of AI features is the same one we apply here: governance gates and recorded approvals are a baseline, not a stretch goal (Class E).

What we will not promise

A receipts standard does not prevent every failure. What it does is collapse the time between when a failure starts and when you find out — usually from a quarter to a day. That compression is most of the value, because almost every preventable mess in this market is a slow problem nobody caught early.

Trust Receipts Beat 'Trust Me': Three Failure Modes The Receipts Catch
