
Hard to Vary

The first health AI built on good explanations.3

Most healthcare AI systems optimize for prediction accuracy. We optimize for explanation quality. The difference: predictions can be right for the wrong reasons and fail silently when conditions change. Explanations that are hard to vary — where every component is load-bearing — remain correctable and auditable even when wrong.1

The standard is Karl Popper's. The criterion is David Deutsch's refinement. The architecture below operationalizes both. See the mechanisms →

Figure 1.1

Good explanations — easy to vary vs hard to vary

Two-panel comparison: the left panel, labeled EASY TO VARY, shows the Persephone myth with swappable narrative elements; the right panel, labeled HARD TO VARY, shows the 23.5-degree Earth-axis explanation of the seasons with locked geometric relations.
Both explanations account for the seasons. Only one of them constrains. The whole architecture turns on which kind of explanation a clinical AI system is allowed to give.1

“The quest for good explanations is, I believe, the basic regulating principle not only of science, but of the Enlightenment generally.”

David Deutsch — The Beginning of Infinity

Engineering facts

What the architecture commits to.

Mechanisms

9

Distinct epistemic mechanisms


ArgMed debate, hard-to-vary scoring, falsification criteria, IDK protocol, safety routing, clinician feedback loop, composable domains, rules-as-data, accuracy ascertainment.

Verdicts

4

Discrete supervision outputs


APPROVE, ROUTE, HARD_STOP, REQUEST_MORE_INFO. Bounded outputs make every verdict testable; ambiguity is excluded by construction. (Sketched as a closed type below.)

HTV score

0.0–1.0

Hard-to-vary range


Four-dimensional rubric — interdependence, specificity, non-ad-hocness, falsifiability — scored per claim. Low-HTV explanations route to a clinician.2
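
A bounded verdict space can be written down as a closed type. A minimal sketch in Python (the member values are illustrative; only the four verdict names come from the architecture):

```python
from enum import Enum

class Verdict(Enum):
    """The complete supervision output space: four members, no fifth
    option, no free-text verdicts."""
    APPROVE = "approve"                      # conjecture survived; act on it
    ROUTE = "route"                          # escalate to a human clinician
    HARD_STOP = "hard_stop"                  # safety boundary hit; do not act
    REQUEST_MORE_INFO = "request_more_info"  # evidence insufficient to decide
```

A supervisor that can only emit these four values cannot produce an ambiguous answer; that is what makes every verdict testable.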

The cycle

Conjecture, refutation, error correction.

Popper's contribution to the philosophy of science was the insight that a theory's value lies not in how often it is confirmed but in how readily it could be falsified. We apply this standard to clinical AI as a five-stage operational loop.

01 · Observe

Clinical observation enters as structured evidence — typed, hashed, provenance-tagged. No free-text intermediation between source and reasoning.

02 · Conjecture

The reasoning agent (Deutsch) proposes specific, falsifiable explanations — not probability distributions over possibilities. Each conjecture must specify what would prove it wrong.

03 · Critique

Multiple agents attack the conjecture adversarially. The hard-to-vary score is computed across interdependence, specificity, non-ad-hocness, and falsifiability.

04 · Survive

Surviving conjectures pass to the supervision agent (Popper), structurally independent of the reasoner. Same input, same verdict — deterministic.

05 · Verdict

APPROVE, ROUTE, HARD_STOP, REQUEST_MORE_INFO. Each verdict carries its evidence hash, its falsification criteria, and the override path back into step 02. (A sketch of this record in code follows the steps.)
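
What that record carries, as a rough sketch. The field names, Python types, and hash choice below are assumptions for illustration; only the requirements themselves (evidence hash, falsification criteria, an override path into step 02) come from the cycle above:

```python
from dataclasses import dataclass
import hashlib

@dataclass(frozen=True)
class Conjecture:
    """A specific, falsifiable explanation (step 02). A conjecture that
    cannot name its own refutation never reaches critique."""
    explanation: str
    falsification_criteria: tuple  # observations that would prove it wrong

    def __post_init__(self):
        if not self.falsification_criteria:
            raise ValueError("unfalsifiable conjecture rejected at step 02")

@dataclass(frozen=True)
class SupervisionVerdict:
    """Step 05 output: the verdict plus everything needed to audit it
    and to carry a clinician override back into step 02."""
    verdict: str                   # APPROVE | ROUTE | HARD_STOP | REQUEST_MORE_INFO
    evidence_hash: str             # hash of the step 01 evidence bundle
    falsification_criteria: tuple  # inherited from the surviving conjecture
    override_target: str = "step_02"

def hash_evidence(evidence: bytes) -> str:
    """Provenance tag for step 01 structured evidence (SHA-256 assumed)."""
    return hashlib.sha256(evidence).hexdigest()
```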

Mechanisms

Nine mechanisms compose the architecture.

01 · ArgMed Debate

Multi-agent generator → verifier → reasoner pipeline. Generate hypotheses, attack each adversarially, keep only the survivors.
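
A control-flow sketch of that pipeline, with plain callables standing in for the agents (signatures and candidate count are assumptions):

```python
from typing import Callable

def argmed_debate(evidence: dict,
                  generate: Callable[[dict], str],
                  refute: Callable[[str, dict], bool],
                  n_candidates: int = 5) -> list[str]:
    """Generate hypotheses, attack each adversarially, keep the survivors.
    A hypothesis survives only if no refutation lands."""
    candidates = [generate(evidence) for _ in range(n_candidates)]
    return [c for c in candidates if not refute(c, evidence)]
```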

02 · HTV Scoring

Quantify how hard to vary each explanation is on a 0.0–1.0 scale across interdependence, specificity, non-ad-hocness, and falsifiability.
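
One way a four-dimension rubric can collapse into a single 0.0–1.0 score. The equal weighting and the sample values below are assumptions for illustration, not published parameters:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HTVScore:
    """Four rubric dimensions, each scored in [0.0, 1.0]."""
    interdependence: float   # do the parts constrain each other?
    specificity: float       # does the claim commit to particulars?
    non_ad_hocness: float    # no bolt-on patches to save the claim
    falsifiability: float    # does it say what would prove it wrong?

    def overall(self) -> float:
        """Equal-weight mean, an assumed aggregation shown for shape only."""
        dims = (self.interdependence, self.specificity,
                self.non_ad_hocness, self.falsifiability)
        return sum(dims) / len(dims)

# Illustrative values; a low score on any dimension drags the overall down.
score = HTVScore(0.9, 0.85, 0.9, 0.83)
assert abs(score.overall() - 0.87) < 1e-9
```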

03 · IDK Protocol

Twelve specific uncertainty triggers with structured responses. The system admits what it does not know, in a form a clinician can act on.
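
The twelve triggers are not enumerated on this page, so the trigger below is a hypothetical placeholder; the point is the shape of a structured, clinician-actionable refusal:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IDKResponse:
    """A structured 'I don't know': names the trigger, the missing
    evidence, and a concrete next step."""
    trigger: str           # which uncertainty condition fired
    missing: str           # what evidence would resolve it
    suggested_action: str  # what the clinician can do next

def idk_if_evidence_stale(evidence_age_days: int) -> IDKResponse | None:
    """Hypothetical trigger: labs too old to support a titration decision."""
    if evidence_age_days > 90:  # threshold is illustrative
        return IDKResponse(
            trigger="stale_evidence",
            missing="lab values newer than 90 days",
            suggested_action="order repeat labs before titration",
        )
    return None
```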

04 · Falsification Criteria

Every claim specifies what would prove it wrong. Claims without falsification criteria are rejected at the conjecture stage.

05 · Safety Routing

High-risk decisions and low-HTV verdicts route to a human clinician. The default is approval; the surface area for automation is bounded by epistemic confidence.
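
A minimal routing predicate under assumed thresholds (the 0.7 floor and the risk-tier labels are illustrative):

```python
def route_verdict(risk_tier: str, htv_score: float,
                  htv_floor: float = 0.7) -> str:
    """High-risk decisions and low-HTV explanations always reach a human;
    automated approval operates only above the floor."""
    if risk_tier == "high":    # e.g. any medication change
        return "ROUTE"
    if htv_score < htv_floor:  # epistemic confidence bounds automation
        return "ROUTE"
    return "APPROVE"
```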

06 · Clinician Feedback Loop

Overrides actively change future reasoning for that patient. Override tracking carries confidence decay over time.
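
Exponential decay with a half-life is one plausible implementation of confidence decay; the half-life below is an assumption, not a documented parameter:

```python
def override_weight(days_since_override: float,
                    half_life_days: float = 180.0) -> float:
    """Weight of a past clinician override on current reasoning for that
    patient: full weight today, half after one half-life, never zero."""
    return 0.5 ** (days_since_override / half_life_days)

assert abs(override_weight(180.0) - 0.5) < 1e-9  # half weight at one half-life
```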

07 · Composable Domains

Medication, nutrition, exercise, sleep, mental health — the same epistemic principles compose across all clinical domains.

08 · Rules as Data

Interaction rules are explicit, versioned, auditable data. The deterministic policy engine enforces safety boundaries; rule packs declare them.
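
What rules-as-data can look like concretely: an explicit, versioned pack the deterministic policy engine loads, rather than logic baked into code or model weights. Field names are illustrative; the warfarin-ibuprofen pairing is a standard high-severity interaction example:

```python
# An illustrative rule-pack entry: versioned, diff-able, auditable.
RULE_PACK = {
    "pack_version": "2024.06.0",
    "rules": [
        {
            "id": "ix-0042",
            "kind": "drug_drug_interaction",
            "subjects": ["warfarin", "ibuprofen"],
            "severity": "high",
            "verdict_on_match": "HARD_STOP",
        },
    ],
}

def matching_rules(pack: dict, active_drugs: set[str]) -> list[dict]:
    """Deterministic lookup: same input, same rules, same verdict."""
    return [r for r in pack["rules"]
            if set(r["subjects"]) <= active_drugs]
```

Because the pack is data, a rule change is a diff: reviewable, versioned, and revertible independently of any model.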

09 · Accuracy Ascertainment

We measure our own predictions against outcomes. Reproducibility is the precondition for the feedback loop in mechanism 06 to compound.
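
One standard way to score predictions against observed outcomes is the Brier score; an illustrative metric choice, not necessarily the one this system uses:

```python
def brier_score(predicted: list[float], observed: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1
    outcomes. Lower is better; 0.0 is perfect."""
    assert predicted and len(predicted) == len(observed)
    return sum((p - o) ** 2
               for p, o in zip(predicted, observed)) / len(predicted)
```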

The thesis

Persephone explains nothing. Earth's 23.5-degree axial tilt explains everything — and the difference is the architecture.

Both stories account for the seasons. The first one is easy to vary: substitute any gods, any emotions, any relations — the explanation still "works." The second one is hard to vary: change the tilt angle and the predictions break, change the orbital plane and the hemispheres flip, change the seasons and the geometry no longer accounts for them.

Most healthcare AI today is Persephone. It produces answers that sound plausible because the underlying causal story can be rewritten without affecting the recommendation. We chose the harder constraint: every architectural commitment in this section is what it takes to ship Earth's-tilt-grade explanations.

Figure 3.1 — Hard-to-vary scoring, four dimensions

Ring chart titled HARD-TO-VARY SCORING, segmented into four quadrants — interdependence, specificity, non-ad-hocness, falsifiability — with figures along each axis and a center score of 0.87.

Figure 4.1

Conjecture-refutation, operationalized

Pipeline diagram titled CONJECTURE-REFUTATION CYCLE: five sequential stages — observe, conjecture, critique, survive, verdict — with an arc looping verdict back to conjecture.
Every clinical observation enters as evidence for candidate conjectures; the rule pack is the body of refutations each conjecture must survive. The verdict that lands is the one that survived. The override arc carries new evidence back into step 02 — error correction is the architecture, not a feature.

Why this matters

Three audiences, the same epistemic commitment.

For patients

Your clinician stays in control — medication changes are always reviewed and approved. Explanations come with reasons. The system tells you when it does not know. Decisions are grounded in your specific situation, not generic advice.

For clinicians

The AI proposes start / stop / titrate / hold; you decide. Audit trails carry epistemological metadata for every decision. Low confidence triggers routing to you. Your overrides actively change future recommendations.

For AI safety

Error correction over error prevention. Fallibilism over certainty. Explanation over prediction. The architecture refuses to accept a verdict it cannot justify and refuses to silence a correction it cannot integrate.1

Read more

Adjacent sections.

Epistemology

How we know what we know — the Popperian foundation, the four supervision verdicts, and the structural independence between reasoning and supervision.

Read the approach →

Trust

Security architecture, deterministic evaluation, HIPAA posture, and open standards — the engineering decisions that make every claim above auditable.

See the trust architecture →

About

An infrastructure company, not a SaaS dashboard. Self-funded, two-org structure, clinical systems deployed at medical centers in Central Asia.

Read about Regain →

See the conjecture-refutation loop in your domain.

We will trace any clinical claim from the source observation to the surviving conjecture to the supervision verdict, and answer any epistemological question against the codebase, not the brochure.

Request a demo

Footnotes

  1. Deutsch, D. The Beginning of Infinity: Explanations That Transform the World. Allen Lane, 2011. The hard-to-vary criterion is introduced in Chapter 1, "The Reach of Explanations"; the seasons example is on pp. 1–3. Chapter 7, "Artificial Creativity," is the AI-safety reading we draw on directly.
  2. Popper, K. R. The Logic of Scientific Discovery. Hutchinson, 1959. The falsifiability criterion is developed in §I and §IV; the asymmetry between confirmation and refutation is the load-bearing argument of the book and is what the supervision agent in step 04 enforces.
  3. "Good explanation" used here in David Deutsch's sense — hard to vary while still accounting for the phenomenon. Operationalized in our supervision protocol as the 4-dimensional rubric scored 0.0–1.0 (interdependence, specificity, non-ad-hocness, falsifiability). To our knowledge, no other deployed clinical AI system encodes this criterion as runtime supervision; if you find one, please write to anton@regain.ai.