A falsifier you never check is the same shape as a promise you never keep.

That is the part of the discipline most vendors skip. Attaching a falsifier to a claim is the easy half — it costs one sentence. The hard half is running the falsifier on a schedule and publishing whatever it returned, including the answers that make us look bad. So we do the hard half, on purpose, in public.

Here is the rule. For every non-trivial claim we make on this site, and for every claim we make inside a client engagement, the falsifier has one of three running states: survived, failed, or untested. The state gets a date next to it. That date is the last time the falsifier was actually exercised against real data — not the day the claim was written, not the day the contract was signed, the day the test was run. (Class C)

Survived means we ran the test, the observation that would have killed the claim did not occur, and we have the log to show it. The claim keeps its current evidence class.

Failed means we ran the test and the disqualifying observation did occur. The claim is wrong, or it was right and is no longer right, or it was always partial and we just found the edge. In any of those cases the claim comes down — or the claim is rewritten narrower, with the failure linked from the new claim as the reason it is narrower. We do not edit the old claim in place; we mark it failed and leave it visible. The receipt of a claim retracting is itself a receipt.

Untested means the claim is still on the page but the falsifier has not been exercised inside the freshness window we promised for it. An untested claim is not a lie, but it is not load-bearing either, and we mark it so nobody quotes it as if it were. A claim that stays untested past its freshness window long enough gets downgraded automatically, with the downgrade visible.

Why bother with the failed column at all, when nobody else ships it?

Because the alternative is the industry default — claims that quietly stop being true and never get corrected, because the only person who would correct them is the one whose reputation depends on them staying up. Karl Popper drew the line in 1934 (Class E): a claim is scientific only if there is an observation that, if it occurred, would force you to give it up. The trick the field plays on itself is to attach a falsifier that nobody is running, so the giving-up step never arrives. Publishing the status flips that — the falsifier is not a rhetorical move, it is a piece of operational telemetry, and you can see when it last reported.

Two concrete examples of how this looks on a live engagement.

We claimed, in a recent client build, that the data pipeline emitted a human-readable change log on every deploy. Status: survived, last exercised the day before this post (Class C). The falsifier is mechanical — the change log file either exists at the expected path with a fresh timestamp or it does not.

We also claimed, on the same engagement, that the cost-per-decision metric would stay under a stated ceiling under normal load. Status, three weeks in: failed. The metric crossed the ceiling on a Tuesday. The claim is now rewritten to name the load regime under which the ceiling holds, with the failed run linked as the reason for the narrowing. Nobody got fired. The receipt of the narrowing is, for our money, more useful to the client than a survived run would have been, because it surfaced a real edge in their workload that they did not previously have language for.

That is the part we want prospective clients to understand before they sign anything. Working with us, you will see some failed statuses on your own engagement. That is by design. The presence of a failed status is how you know the test was real. The absence of any failed status, anywhere in a long-running engagement, is either luck or theater, and we have not been lucky often enough to bet on luck.

The posture is set out in /measurement-honesty-for-ai-projects, the mechanism — falsifier attached to each claim — is in /falsifier-on-every-claim, and the things we will not promise even when asked are listed at /standard/what-we-do-not-claim.

If you want this applied to a project of your own, bring it to the /workshop. We will write the falsifier on the same page as the claim, agree on the freshness window, and set the schedule for who runs the test and when. The first published status will appear before the first invoice.

/workshop — bring a claim, leave with a test and a schedule.
/measurement-honesty-for-ai-projects — the broader posture.
/falsifier-on-every-claim — how the test attaches.
/standard/what-we-do-not-claim — the claims we will not make.

Publishing Falsifier Status: Survived, Failed, Or Untested

Bring this into a working session.