A clean engagement does not start with a wall of confirmed facts. It starts with a wall of guesses, and then, week by week, the guesses get replaced. What you want to watch is the shape of that replacement.

The healthy curve, in one paragraph

In week one, most of what we are saying about your situation is unverified (U). We have not measured anything in your environment yet; we are running on what you told us and what the existing artefacts imply. By week four, the mix should have shifted: a meaningful slice of U should have upgraded to A (we measured it), B (we read the code), or C (we wired it and the wiring is real today), or — honestly — dropped because we could not stand the claim up. By week twelve, the dashboard should be mostly A and C, with a thin band of E for outside work, an F tag on every load-bearing claim, and a very small residual U we have not yet been able to retire. (Class C — this is the shape we configure the engagement to produce.)

If the mix never moves, something is wrong. If it moves the wrong way, with more U at week eight than at week four, something is seriously wrong.
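One way to make "is the mix moving" concrete is to track the U share from snapshot to snapshot. The sketch below is illustrative only: the snapshot format (week number mapped to a list of primary tags), the function names, and the numbers are all made up for the example and are not how any particular dashboard stores its data.

```python
from collections import Counter


def u_share(tags):
    """Fraction of a week's claims that are still tagged U."""
    counts = Counter(tags)
    total = sum(counts.values())
    return counts["U"] / total if total else 0.0


def u_trend(snapshots):
    """Label each consecutive pair of snapshots by how the U share moved."""
    weeks = sorted(snapshots)
    labels = {}
    for prev, cur in zip(weeks, weeks[1:]):
        before, after = u_share(snapshots[prev]), u_share(snapshots[cur])
        if after < before:
            labels[(prev, cur)] = "shrinking"  # the healthy direction
        elif after == before:
            labels[(prev, cur)] = "frozen"     # the mix never moved
        else:
            labels[(prev, cur)] = "growing"    # the mix moved the wrong way
    return labels


# Made-up snapshots: week number -> primary tag of each claim on the page.
snapshots = {
    1: ["U"] * 8 + ["B"] * 2,
    4: ["U"] * 5 + ["A"] * 2 + ["B"] * 2 + ["C"],
    8: ["U"] * 6 + ["A"] * 2 + ["B"] * 2,
}
trend = u_trend(snapshots)
```

In this invented data the U share falls between weeks one and four and rises again by week eight, which is the "more U at week eight than at week four" case.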

Why each class is supposed to behave a particular way

The six tags do not all trend in the same direction over time, and confusing them is the easiest way to misread a dashboard.

A (empirical, in session) should grow. Every week we run the system in your environment, more things get observed. If A is flat across weeks four through eight, we are not actually running the work.
B (code or inspection) should grow early and then plateau. Most of the code-level claims are made in the first month while we are reading the build. After that, B grows only when new code lands.
C (configuration or integration) should grow as the integration surfaces are wired up, and then stay stable as long as the wiring stays stable. A sudden drop in C usually means a credential rotated, a webhook URL moved, or a route was removed and nobody updated the page.
E (expert citation) should be present from week one and grow slowly. We cite outside work where it earns the citation. A citation-heavy dashboard with no A is a tell — somebody is hiding behind other people's papers.
F (falsifier present) should track A and C. Every load-bearing claim is supposed to ship with the test that would prove it wrong. F is not a separate body of work; it is a tax on the other classes.
U (unverified) should shrink. The whole point of the engagement is to retire U claims, by promoting the ones that turn out true and dropping the ones that do not. (Class C.)

That, in six bullets, is what we mean when we say the mix tells you the story.
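The six expectations can be written down as a small lookup and checked against a window of weekly counts. This is a minimal sketch under our own assumptions: the `ACCEPTABLE` table, the first-versus-last definition of "direction", and the counts are hypothetical, and "F tracks A and C" is reduced here to "F moves the same way as A plus C".

```python
# Directions each tag may take over a window, per the six expectations above.
ACCEPTABLE = {
    "A": {"up"},            # should grow
    "B": {"up", "flat"},    # grows early, then plateaus
    "C": {"up", "flat"},    # grows, then stays stable
    "E": {"up", "flat"},    # present from week one, grows slowly
    "U": {"down"},          # should shrink
}


def direction(series):
    """Compare the first and last counts in a window."""
    first, last = series[0], series[-1]
    if last > first:
        return "up"
    if last < first:
        return "down"
    return "flat"


def misbehaving(counts):
    """Return the tags whose direction over the window breaks expectations."""
    flagged = [tag for tag, ok in ACCEPTABLE.items()
               if direction(counts[tag]) not in ok]
    # F is a tax on the other classes, so compare it to A and C combined.
    a_plus_c = [a + c for a, c in zip(counts["A"], counts["C"])]
    if direction(counts["F"]) != direction(a_plus_c):
        flagged.append("F")
    return flagged


# Made-up counts for weeks four through eight.
window = {"A": [3, 3], "B": [4, 4], "C": [2, 2],
          "E": [1, 1], "F": [3, 3], "U": [4, 6]}
flagged = misbehaving(window)
```

With these invented counts, A is flat and U is rising, so both are flagged; a flat B, C, or E is tolerated because those classes are allowed to plateau.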

Three failure shapes we have learned to spot

The mix can break in characteristic ways. We have seen each of these enough times to give them names.

The frozen-U dashboard. U does not shrink. Weeks roll past and the same claims sit unverified. This is usually not laziness — it is something more boring: nobody scheduled the measurement that would upgrade them. The fix is to put a date on every U on the page. If a date cannot be agreed, the claim probably should not be on the page.
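Putting a date on every U is easy to check mechanically. The sketch below assumes a hypothetical `Claim` record with a `retire_by` field; the field name, the example claims, and the dates are ours, invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Claim:
    text: str
    tag: str                          # one of A, B, C, E, F, U
    retire_by: Optional[date] = None  # agreed date for the upgrading measurement


def undated(claims):
    """U claims with no date: nobody has scheduled the measurement."""
    return [c for c in claims if c.tag == "U" and c.retire_by is None]


def overdue(claims, today):
    """U claims whose agreed date has already passed."""
    return [c for c in claims
            if c.tag == "U" and c.retire_by is not None and c.retire_by < today]


# Made-up claims for illustration.
claims = [
    Claim("p95 latency is under 200 ms", "U", date(2025, 3, 1)),
    Claim("the batch job is idempotent", "U"),
    Claim("a retry path exists", "B"),
]
needs_date = undated(claims)
late = overdue(claims, today=date(2025, 3, 15))
```

Anything returned by `undated` is a candidate for the conversation described above: agree a date, or take the claim off the page.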

The all-A illusion. A grows fast, U falls fast, and the engagement still feels stuck. Look at C. If C is not growing alongside A, the A claims are about runs we did once and never wired into anything durable. Empirical-once is not integrated. The dashboard looks great and the work does not compound.
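As a rough check for this shape, the three series can be compared over the same window. This is a sketch, not a definition: the function name and the first-versus-last comparison are our assumptions, and the counts are invented.

```python
def all_a_illusion(a, c, u):
    """True when A rises and U falls across the window but C does not rise."""
    return a[-1] > a[0] and u[-1] < u[0] and c[-1] <= c[0]


# Made-up weekly counts: A climbs, U drops, C never moves.
suspicious = all_a_illusion(a=[2, 6, 11], c=[1, 1, 1], u=[12, 7, 3])

# Same A and U, but C grows alongside A.
compounding = all_a_illusion(a=[2, 6, 11], c=[1, 3, 6], u=[12, 7, 3])
```

The first call flags the window; the second does not, because the empirical results are being wired into something durable.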

The citation drift. E balloons. Every paragraph becomes a reference to outside work. This is the shape of a vendor — or, sometimes, of us on a bad week — reaching for borrowed credibility. Citation is not endorsement and is not a substitute for receipts. Ask, for every new E, what A or C it is paired with.
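The pairing question can be asked of every E on the page at once. The sketch below assumes each claim is a dictionary with an `id`, a `tag`, and an optional `paired_with` list of other claim ids; that shape is hypothetical and the claims are made up.

```python
def unpaired_citations(claims):
    """Return ids of E claims not paired with any A or C claim."""
    tag_of = {c["id"]: c["tag"] for c in claims}
    return [
        c["id"]
        for c in claims
        if c["tag"] == "E"
        and not any(tag_of.get(p) in {"A", "C"} for p in c.get("paired_with", []))
    ]


# Made-up claims for illustration.
claims = [
    {"id": "a1", "tag": "A"},
    {"id": "e1", "tag": "E", "paired_with": ["a1"]},  # backed by a measurement
    {"id": "e2", "tag": "E", "paired_with": []},      # citation standing alone
    {"id": "e3", "tag": "E", "paired_with": ["e1"]},  # citation citing a citation
]
unpaired = unpaired_citations(claims)
```

Here `e2` and `e3` come back unpaired: one has nothing behind it, and the other leans only on another citation.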

(Class C — these patterns are surfaced by the way we configure the dashboards.)

What we do when the mix is wrong

When we see a failure shape, the response is small and mechanical. We pull the affected claims off the public dashboard, mark them U on the internal one, and put a date next to each on a "retire-or-drop" list. The same six receipts that monitor the engagement also monitor the dashboard itself. The honesty discipline applies recursively or it does not apply.
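The three steps are mechanical enough to sketch in a few lines. Everything named here is an assumption for illustration: the dictionary shape of a claim, the `respond_to_failure` function, and especially the `review_days` default, which stands in for a date that in practice would be agreed rather than computed.

```python
from datetime import date, timedelta


def respond_to_failure(public, internal, affected_ids, today, review_days=14):
    """Pull affected claims from the public view, demote them to U internally,
    and list each one with a decide-by date. Inputs are not modified."""
    public_after = [c for c in public if c["id"] not in affected_ids]
    internal_after = [
        {**c, "tag": "U"} if c["id"] in affected_ids else c
        for c in internal
    ]
    decide_by = today + timedelta(days=review_days)  # placeholder for an agreed date
    retire_or_drop = [{"id": cid, "decide_by": decide_by}
                      for cid in sorted(affected_ids)]
    return public_after, internal_after, retire_or_drop


# Made-up claims; "c2" is the one caught by a failure shape.
claims = [
    {"id": "c1", "tag": "A"},
    {"id": "c2", "tag": "E"},
    {"id": "c3", "tag": "C"},
]
public_after, internal_after, retire_or_drop = respond_to_failure(
    claims, claims, {"c2"}, today=date(2025, 1, 6)
)
```

The function returns new lists instead of editing in place, so the state of the dashboard before the response stays available for comparison.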

There is a longer treatment of what each tag actually means at /what-an-evidence-class-actually-means, and the broader six-receipt framing — of which the mix is the sixth — at /measurement-honesty-for-ai-projects. If you would like to apply this rhythm to your own work rather than just read about it, the place that happens is /workshop.

The mix is dull on the day you draw it. It becomes informative in week four, when you compare it to week one and see whether the line is moving.

See the six tags one by one: /what-an-evidence-class-actually-means
See the full six-receipt frame: /measurement-honesty-for-ai-projects
Bring this rhythm into your work: /workshop

Evidence-Class Mix Over Time: What A Healthy Engagement Looks Like
