SolutionWright Universal

Science. The Stratified Palimpsest

A world made hard on purpose.

This is the proving ground beneath the labs. A layered world the agent never sees directly. It couples through one opaque interface, earns every deeper sense by growing the organ that can read it, and is paid no reward at any point. A separate monitor re-derives the whole interface from the seed and proves, per frame, that nothing leaked. The strong claims here are the ones most able to fail in public, and you can run the check yourself.

29 opaque channels · five discoverability layers · no reward signal · leak-audited per frame · reproducible across machines

The world at a glance

What it is, in five lines.

What it is
A layered benchmark world plus a pure active-inference agent (Elixir, zero dependencies).
Interface
29 opaque numeric channels, permuted per run. The learner sees only a map from integer channel to finite number.
Objective
Expected free energy: an epistemic term plus a pragmatic term. No reward, no reinforcement learning, no backprop.
Evidence
A no-leak monitor re-derives the interface from the seed and checks every observation. Tamper and it fails.
Reproducibility
Same seed gives an identical trace. Integers exact; floats within 1e-6 across machines.

Earned, not given

Five layers. Each one invisible until the body grows the sense.

The world is written in strata. A body starts minimal and must grow its senses in order, each one funded by energy it has to forage. A layer it cannot sense is not hidden behind a wall; it is structurally invisible. There is no master view to peek at.

  1. L0

    Contact

    What lives here. Felt only where the body is: nutrient, temperature, solvent, toxin.

    To read it. Seed senses. Present from birth.

  2. L1

    Material

    What lives here. Per-cell composition and distal gradients.

    To read it. Needs the chemotactile sense, then the plume sense, grown in order.

  3. L2

    Hidden causal

    What lives here. Internal stress and collapse risk: cavity, strain, support.

    To read it. Needs proprioception, then the tomography sense.

  4. L3

    Spectral

    What lives here. Three field-instability bands.

    To read it. Needs the plume sense, then the spectral sense.

  5. L4

    Seam topology

    What lives here. Readiness to open a new region.

    To read it. Needs the spectral sense, then the seam-coherence sense. A hard, late capability.
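The growth order in the list above can be sketched as a small prerequisite table. This is an illustrative Python sketch, not the project's Elixir code: the sense identifiers, the flat energy cost, and the reading that the plume sense requires the chemotactile sense are all assumptions drawn from the wording of the list.

```python
# Hypothetical prerequisite table read off the layer list.
# Names and costs are illustrative, not the project's identifiers.
PREREQS = {
    "chemotactile": [],
    "plume": ["chemotactile"],
    "proprioception": [],
    "tomography": ["proprioception"],
    "spectral": ["plume"],
    "seam_coherence": ["spectral"],
}
COST = {name: 10.0 for name in PREREQS}  # assumed flat energy cost per sense


def can_grow(sense, grown, energy):
    """True if every prerequisite is grown and the foraged energy covers the cost."""
    if sense in grown:
        return False
    return all(p in grown for p in PREREQS[sense]) and energy >= COST[sense]


def grow(sense, grown, energy):
    """Return the updated (grown, energy) pair, or raise if the step is not allowed."""
    if not can_grow(sense, grown, energy):
        raise ValueError(f"cannot grow {sense!r} yet")
    return grown | {sense}, energy - COST[sense]


grown, energy = frozenset(), 25.0
grown, energy = grow("chemotactile", grown, energy)
grown, energy = grow("plume", grown, energy)
print(can_grow("spectral", grown, energy))       # False: prerequisite met, energy too low
print(can_grow("seam_coherence", grown, 100.0))  # False: spectral not grown yet
```

In this sketch a late sense such as seam coherence is reachable only through the full chain before it, which is one way to read "a hard, late capability".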

And the senses themselves come through one opaque interface: twenty-nine channels, permuted per run, optionally value-scrambled, so channel seven means a different thing in every world. The learner can never hard-code what a channel means. Actions are relative only; absolute coordinates are rejected at the door.
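A minimal sketch of what a seeded, permuted interface could look like, in Python for illustration only. The channel count comes from the text; the function names, the way the seed is split into two random streams, the affine form of the value scramble, and the `dx`/`dy` action shape are assumptions, not the project's API.

```python
import math
import random

N_CHANNELS = 29  # from the text; the rest of this sketch is assumed


def channel_map(seed):
    """Seeded permutation: signal slot i is emitted on channel perm[i]."""
    perm = list(range(N_CHANNELS))
    random.Random(f"{seed}:perm").shuffle(perm)
    return perm


def channelise(signals, seed, scramble=False):
    """Map 29 raw signal slots (None = sense not grown) to {channel: finite number}."""
    if len(signals) != N_CHANNELS:
        raise ValueError("expected one slot per channel")
    perm = channel_map(seed)
    rng = random.Random(f"{seed}:scramble")
    # hypothetical per-slot affine scramble, fixed for the whole run
    params = [(rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)) for _ in range(N_CHANNELS)]
    out = {}
    for slot, value in enumerate(signals):
        if value is None:
            continue
        if scramble:
            gain, offset = params[slot]
            value = gain * value + offset
        if not math.isfinite(value):
            raise ValueError("non-finite signal")
        out[perm[slot]] = value
    return out


def accept_action(action):
    """Accept small relative moves only; any other key (e.g. an absolute x/y) is rejected."""
    return (
        isinstance(action, dict)
        and set(action) <= {"dx", "dy"}
        and all(isinstance(v, int) and not isinstance(v, bool) and abs(v) <= 1
                for v in action.values())
    )


signals = [float(i) for i in range(N_CHANNELS)]
print(channelise(signals, seed=1) == channelise(signals, seed=1))  # True: same seed, same interface
print(accept_action({"dx": 1, "dy": 0}), accept_action({"x": 14, "y": 3}))  # True False
```

Because the map is a pure function of the seed, a separate process that knows only the seed can rebuild it, which is the property the monitor relies on.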

No reward, checked in code

The agent is never paid.

There is no score, no return, no fitness on any path the learner can read. The viability of the body is measured only by an evaluation harness, off to the side, and never sent back as a channel. The agent's only objective is expected free energy.

We checked this in code, not just in prose. Clone an action so that its world-transitions are identical, and its policy value matches the original's to within a part in a trillion. There is no per-action cost or bias anywhere in the selector through which a reward could be smuggled.
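The clone check can be illustrated on a toy model. The sketch below is Python for illustration, with a hypothetical two-state, two-outcome world and made-up action names; it is not the project's test. The point it shows is that when a policy value is computed only from an action's transitions and the resulting predicted outcomes, two actions with identical transitions cannot receive different values.

```python
import math


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)


def policy_value(action, qs, B, A, C):
    """One-step neg_EFE, computed only from what the action does to the belief."""
    n = len(qs)
    # B[action][s_next][s] = P(s_next | s, action)
    qs_next = [sum(B[action][j][i] * qs[i] for i in range(n)) for j in range(n)]
    # A[o][s] = P(o | s)
    qo = [sum(A[o][j] * qs_next[j] for j in range(n)) for o in range(len(A))]
    ambiguity = sum(qs_next[j] * entropy([row[j] for row in A]) for j in range(n))
    epistemic = entropy(qo) - ambiguity
    pragmatic = sum(q * c for q, c in zip(qo, C))
    return epistemic + pragmatic


stay = [[1.0, 0.0], [0.0, 1.0]]
flip = [[0.0, 1.0], [1.0, 0.0]]
B = {"stay": stay, "flip": flip, "flip_clone": [row[:] for row in flip]}
A = [[0.9, 0.2], [0.1, 0.8]]
C = [0.0, -2.0]   # log-preferences over outcomes
qs = [0.7, 0.3]

values = {name: policy_value(name, qs, B, A, C) for name in B}
print(abs(values["flip"] - values["flip_clone"]) <= 1e-12)  # True
```

A per-action bonus keyed on the action's name would break this equality, so the check doubles as a search for a hidden reward term.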

The objective, named in full

Expected free energy: information gain plus preference.

The agent perceives by keeping its beliefs close to the world, and acts by choosing policies that minimise expected free energy. That energy is two readable parts: how much a move would teach it, and how much a move moves it toward what it prefers. Both are computed from its own beliefs and the world's outcomes. Nothing else is in there.

Per policy, summed over factors and steps:

neg_EFE = epistemic + pragmatic

epistemic = H(qo) − E_q[ H(o | s) ]

pragmatic = qo · C

qo is the predicted outcome distribution; C is the vector of log-preferences over outcomes. The policy posterior is a softmax over neg_EFE (the negative of expected free energy, so lower free energy means higher probability) plus a learned habit prior.
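The three formulas above can be written out for a single hidden-state factor. This is a hedged Python sketch, not the engine: the matrix layouts (`A[o][s]`, `B[action][s_next][s]`), the toy numbers, and the choice to add the habit prior in log space are assumptions made for the example.

```python
import math


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def neg_efe(policy, qs, A, B, C):
    """Sum epistemic + pragmatic over the steps of one policy (single factor)."""
    h_o_given_s = [entropy([row[s] for row in A]) for s in range(len(qs))]
    total = 0.0
    for action in policy:
        qs = matvec(B[action], qs)   # roll the state belief forward one step
        qo = matvec(A, qs)           # predicted outcome distribution
        epistemic = entropy(qo) - sum(q * h for q, h in zip(qs, h_o_given_s))
        pragmatic = sum(q * c for q, c in zip(qo, C))
        total += epistemic + pragmatic
    return total


def policy_posterior(policies, qs, A, B, C, habit):
    """softmax(neg_EFE + log habit); the log-habit form is an assumption."""
    scores = [neg_efe(p, qs, A, B, C) + math.log(h) for p, h in zip(policies, habit)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


A = [[0.9, 0.2], [0.1, 0.8]]
B = {"stay": [[1.0, 0.0], [0.0, 1.0]], "flip": [[0.0, 1.0], [1.0, 0.0]]}
C = [0.0, -2.0]
policies = [("stay", "stay"), ("flip", "stay")]
post = policy_posterior(policies, [0.7, 0.3], A, B, C, habit=[0.5, 0.5])
print(post)
```

With a uniform habit prior the posterior here is driven entirely by the two terms of neg_EFE, and the policy that keeps the belief on the preferred outcome gets the larger share.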

Perception
Beliefs minimise variational free energy, an upper bound on surprise. The update is a softmax over a forward prior plus the log-likelihood of what was sensed.
Action
Expected free energy per policy: an epistemic (information-gain) term, H(qo) minus expected ambiguity, plus a pragmatic (preference) term, predicted outcomes against log-preferences.
Learning
Pure Dirichlet accumulation of counts. No value function, no temporal-difference update, no policy gradient.
Conservatism
Every optional feature is off by default. With them off, the agent matches the plain engine to better than a part in a trillion over the planning path.
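The perception and learning rows can be sketched in a few lines. This is illustrative Python under stated assumptions: a single factor, a strictly positive prior that has already been rolled forward, and a likelihood stored as Dirichlet counts `counts[o][s]`. The function names are hypothetical.

```python
import math


def update_belief(prior, A, obs):
    """Posterior over states: softmax(log prior + log P(obs | s))."""
    logits = [math.log(p) + math.log(A[obs][s]) for s, p in enumerate(prior)]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


def learn(counts, obs, qs):
    """Dirichlet accumulation: add the state belief to the row of the observed outcome."""
    return [
        [c + (q if o == obs else 0.0) for c, q in zip(row, qs)]
        for o, row in enumerate(counts)
    ]


def likelihood(counts):
    """Normalise counts column-wise to get the expected P(o | s)."""
    n_states = len(counts[0])
    col_sums = [sum(row[s] for row in counts) for s in range(n_states)]
    return [[row[s] / col_sums[s] for s in range(n_states)] for row in counts]


counts = [[2.0, 1.0], [1.0, 2.0]]
A = likelihood(counts)
qs = update_belief([0.5, 0.5], A, obs=0)   # about [0.667, 0.333]
counts = learn(counts, obs=0, qs=qs)
print(qs, counts)
```

Nothing in the update looks at a value or a return: the only thing that changes is a table of counts, weighted by the belief held when the outcome arrived.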

The whole math core is pure Elixir with zero dependencies, compiled to native machine code, with no foreign computation layer in the inference path. When novelty_gain is zero, the engine reduces to the standard active-inference formulation in the literature.

The Markov-blanket monitor

Proven, per frame, not to cheat.

The strongest thing here is the one most able to fail. A separate verifier rebuilds the channel map from the run's public seed alone, with no secrets, and runs four checks on every observation in the recorded log. Change a single value and the check fails and names the frame.

  1. Structural

    Every observation must be a map of integer channel to finite number, in range. Any atom, string, struct, tuple, or out-of-range key is a leak.

  2. Token scan

    A deep scan rejects forbidden world words: material classes, coordinate keys, sensor and organ names, layer and topology words. None may appear in what the agent received.

  3. Morphology provenance

    Every channel observed must trace to a sense whose organ the body actually had at that moment. No channel can exist that the recorded body could not have produced.

  4. Encode-equivalence

    The observation must be exactly the channelisation of the recorded signals, rebuilt from the seed. No hidden side-channel, proven byte for byte.

$ mix sp.verify runs/<recorded-run>.jsonl

VERIFIED: 46 frames, 0 blanket violations. # exit 0

After tampering with one observation:

VIOLATION: 1/46 frames leaked. # exit 1

A blind probe agent ran the same gauntlet: across two hundred audited observations it found zero leaks, and six hundred deliberately malformed actions were rejected. All twenty-nine channels came back structurally clean.
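The first three checks can be sketched over in-memory frames. This is a Python illustration, not the `mix sp.verify` implementation: the frame shape (`obs`, `organs`), the forbidden-word list, and the channel-to-sense table are hypothetical, and the encode-equivalence check is left out because it needs the real channelisation.

```python
import math

N_CHANNELS = 29
# Hypothetical vocabulary; the real monitor's word list is not given in the text.
FORBIDDEN = ("nutrient", "toxin", "coord", "plume", "spectral", "seam")


def structural_ok(obs):
    """Observation must be {int channel in range: finite number} and nothing else."""
    if not isinstance(obs, dict):
        return False
    for key, value in obs.items():
        if isinstance(key, bool) or not isinstance(key, int) or not 0 <= key < N_CHANNELS:
            return False
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if isinstance(value, float) and not math.isfinite(value):
            return False
    return True


def tokens(value):
    """Yield every string found anywhere inside a nested value, keys included."""
    if isinstance(value, str):
        yield value.lower()
    elif isinstance(value, dict):
        for k, v in value.items():
            yield from tokens(k)
            yield from tokens(v)
    elif isinstance(value, (list, tuple, set)):
        for item in value:
            yield from tokens(item)


def token_scan_ok(obs):
    return not any(word in tok for tok in tokens(obs) for word in FORBIDDEN)


def provenance_ok(obs, organs, channel_owner):
    """Every channel must belong to a sense whose organ the body had in this frame."""
    return all(channel_owner.get(ch) in organs for ch in obs)


def audit(frames, channel_owner):
    """Return the indices of frames that fail any check."""
    bad = []
    for i, frame in enumerate(frames):
        obs = frame["obs"]
        if not (structural_ok(obs) and token_scan_ok(obs)
                and provenance_ok(obs, frame["organs"], channel_owner)):
            bad.append(i)
    return bad


owner = {0: "contact", 1: "contact", 5: "plume"}
frames = [
    {"obs": {0: 0.3, 1: 1.0}, "organs": {"contact"}},      # clean
    {"obs": {0: 0.3, 5: 0.2}, "organs": {"contact"}},      # channel 5 without its organ
    {"obs": {0: float("nan")}, "organs": {"contact"}},     # non-finite value
    {"obs": {"toxin": 0.9}, "organs": {"contact"}},        # string key
]
print(audit(frames, owner))  # [1, 2, 3]
```

Returning frame indices rather than a single pass/fail flag mirrors the behaviour described above, where a failed check names the frame.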

The honesty fences

What this does, and does not, mean.

These are not a badge. They are the edges of what the benchmark can honestly say. We carry them so the warranted claims and the over-claims stay visibly separate.

  1. Operational behavioural and organisational measures are necessary-not-sufficient substrates with ZERO evidential weight for awareness, consciousness, or life on their own. Passing a gate demonstrates the named behaviour, never experience.

  2. The UNI preprint (DOI 10.5281/zenodo.19785799) is an unrefereed working preprint. Peer review is pending.

  3. The pure core is implementation-complete but the live colony adapter is a documented bridge. We state what is demonstrated vs specified vs aspirational.

  4. No reward is a property we checked in code, not a slogan. There is no score, return, or fitness on any learner-facing path, and cloned actions get identical policy values.

  5. Words like agent, body, sense, and drive are functional descriptions of math and behaviour. They are never a claim of feeling, wanting, or understanding.

How to prove it wrong

Find a leak, or break the math.

Show an observation that carries a world word or a hidden coordinate the agent should not have seen, and the monitor missed it. Or show a recorded run that the verifier passes but should not.

Find a per-action reward hiding in the action selector, or two cloned actions whose policy values differ. Either would mean the no-reward claim is false.

Show a seed whose trace does not reproduce, or floats that drift beyond 1e-6 across machines. Every run carries its seed and its per-frame log, so anyone can reproduce it or refute it.
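A reproduction attempt comes down to comparing two traces under the stated rule. The Python sketch below assumes each frame is a map of channel to number and treats the 1e-6 bound as an absolute tolerance; both are assumptions, as is the function naming.

```python
import math

FLOAT_TOL = 1e-6  # tolerance stated in the text, read here as absolute


def values_match(a, b):
    """Integers must be equal exactly; floats may differ by at most FLOAT_TOL."""
    if isinstance(a, bool) or isinstance(b, bool):
        return a is b
    if isinstance(a, int) and isinstance(b, int):
        return a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isfinite(a) and math.isfinite(b) and abs(a - b) <= FLOAT_TOL
    return a == b


def first_divergence(trace_a, trace_b):
    """Return the index of the first frame that differs, or None if the traces agree."""
    for i, (fa, fb) in enumerate(zip(trace_a, trace_b)):
        if fa.keys() != fb.keys() or not all(values_match(fa[k], fb[k]) for k in fa):
            return i
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))
    return None


run_1 = [{0: 3, 4: 0.25}, {0: 4, 4: 0.5}]
run_2 = [{0: 3, 4: 0.25 + 5e-7}, {0: 4, 4: 0.5}]   # float drift inside tolerance
run_3 = [{0: 3, 4: 0.25}, {0: 5, 4: 0.5}]          # integer differs in frame 1

print(first_divergence(run_1, run_2))  # None
print(first_divergence(run_1, run_3))  # 1
```

A non-None result from two runs of the same seed would be exactly the kind of counterexample this section asks for.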

The paper behind this is a preprint, with expert review pending. We present it that way on purpose.