Why it is verifiable
- Falsifiable claims — a test could refute them
- Separation of powers — the proposer never certifies
- Ground truth — evidence resolves on disk
- Reproducible — public, hash-chained registry
Most AI output cannot be defended after the demo. DeepSCR fixes that at the protocol level — by refusing to let any single agent both propose and certify its own work.
The reliability problem in applied AI is not mainly a model problem. A capable model can still produce a deliverable nobody can verify: a recommendation with no falsifiable claim, an architecture with no failure analysis, a result whose "evidence" is the model's own fluency. The gap is governance — who proposes, who attacks, who verifies, who certifies — and whether those powers are separated.
DeepSCR is a governance protocol built on that separation. It decomposes any engagement into four adversarial steps:
- S (Hypothesis): state a claim that a test could falsify.
- C (Contradiction): attack it; surface at least three failure modes before shipping.
- V (Verification): tie every claim to ground truth, meaning a real file, line, or test that resolves on disk.
- R (Certification): compute an independent score, with the power to veto.
The proposer never certifies. That single rule is what makes the output auditable rather than merely persuasive.
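The two hardest rules above, that the proposer never certifies and that evidence must resolve on disk, can be sketched as a pre-certification gate. This is a minimal illustration under stated assumptions, not DeepSCR's actual implementation: the `Claim` and `Evidence` types, the `may_certify` helper, and the choice of a file path plus a 1-based line number as the evidence format are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Evidence:
    path: str  # file the claim cites, relative to the repository root
    line: int  # 1-based line number inside that file


@dataclass(frozen=True)
class Claim:
    proposer: str                   # id of the agent that stated the claim
    statement: str                  # the falsifiable claim itself
    failure_modes: tuple[str, ...]  # output of the Contradiction step
    evidence: tuple[Evidence, ...]  # output of the Verification step


def resolves_on_disk(ev: Evidence, root: Path) -> bool:
    """True only if the cited file exists under root and contains the cited line."""
    target = root / ev.path
    if not target.is_file():
        return False
    with target.open("r", encoding="utf-8", errors="replace") as fh:
        line_count = sum(1 for _ in fh)
    return 1 <= ev.line <= line_count


def may_certify(claim: Claim, certifier: str, root: Path) -> bool:
    """Hypothetical gate applied before any score is computed."""
    if certifier == claim.proposer:
        # Separation of powers: the proposer never certifies.
        raise PermissionError("proposer cannot certify its own claim")
    if len(claim.failure_modes) < 3:
        return False  # the Contradiction step asks for at least three failure modes
    if not claim.evidence:
        return False  # a claim with no evidence trail cannot be verified
    return all(resolves_on_disk(ev, root) for ev in claim.evidence)
```

Raising an error for self-certification, rather than returning `False`, is a design choice in this sketch: it treats a violation of the separation rule as a protocol fault, not as a low-quality submission.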
DeepSCR is deliberately domain-agnostic — it could govern any kind of work. The engineering depth comes from the method it governs: Forward Deployed Engineering (FDE), the field discipline for taking a business problem to owned, production software. FDE supplies the domain steps — reconnaissance of the real codebase, a six-question decomposition, candidate architectures behind a held-out gate, a production handoff — while DeepSCR supplies the checks. The composition is hierarchical, and the order matters: it is DeepSCR governing FDE specialists, not FDE applying DeepSCR.
Concretely, an engagement runs as eight agent roles. Four are the DeepSCR powers: a Lead that decomposes and claims, Researchers that falsify hypotheses in parallel against a held-out gate, a Builder that assembles the deliverable, and a Certifier that scores it independently and can veto the Lead's optimism. Four are FDE specialists the Builder consults (Scoping, Architecture, Agent-engineering, and Production), each grounded in a documented body of practice. The specialists do the domain work; the powers govern it.
The output is a single number with teeth: the FDE Assurance Score (0–100), weighted across four components that sum to 100: a present, falsifiable claim (25); genuine contradiction with three or more failure modes (25); an evidence trail that resolves on disk (30); and an absence of anti-patterns (20). Evidence carries the heaviest weight, and it is the one component that cannot be faked: synthetic structure earns nothing if the cited artifacts do not exist. If the score falls below 85, the Certifier vetoes and the loop re-routes the work to whichever agent owns the weakest component.
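The arithmetic behind the score can be sketched as follows. The weights and the 85-point veto threshold come from the text above; everything else is an assumption made for illustration: each component is graded as a fraction between 0 and 1, the "weakest component" is taken to be the one with the lowest fraction, and the function and key names are hypothetical.

```python
# Weights from the text: claim 25, contradiction 25, evidence 30, anti-patterns 20.
WEIGHTS = {"claim": 25, "contradiction": 25, "evidence": 30, "anti_patterns": 20}
VETO_THRESHOLD = 85  # scores strictly below this are vetoed


def assurance_score(grades: dict[str, float]) -> float:
    """Weighted sum of per-component grades, each assumed to lie in [0, 1]."""
    if set(grades) != set(WEIGHTS):
        raise ValueError(f"expected exactly these components: {sorted(WEIGHTS)}")
    for name, grade in grades.items():
        if not 0.0 <= grade <= 1.0:
            raise ValueError(f"grade for {name!r} must be between 0 and 1")
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


def weakest_component(grades: dict[str, float]) -> str:
    """Component with the lowest grade (assumed definition of 'weakest')."""
    return min(WEIGHTS, key=lambda name: grades[name])


def certify(grades: dict[str, float]) -> dict:
    """Return the score, the veto decision, and where to re-route on a veto."""
    score = assurance_score(grades)
    vetoed = score < VETO_THRESHOLD
    return {
        "score": score,
        "vetoed": vetoed,
        "reroute_to": weakest_component(grades) if vetoed else None,
    }
```

Under these assumptions, a deliverable that is perfect on three components but has no resolvable evidence scores 70 and is vetoed, with the work routed back through the evidence component. Because no other single component carries 30 points, a missing evidence trail costs more than any other single failure.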
What removes the human from the step-by-step loop without removing accountability is a self-prompting layer. After every result, five judgments are applied — evidence, telos, risk, coherence, value — and the verdicts deterministically generate the next prompt and route it to the agent best able to act on it. The operator stays informed throughout and is gated only on genuinely irreversible actions, such as a production-handoff decision.
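One way to picture the self-prompting layer is as a pure function from verdicts to the next instruction. The sketch below is illustrative only. The five judgment names come from the text, but the pass/fail verdict format, the priority order, the `ROUTES` table that maps a failed judgment to an agent, and the prompt wording are all hypothetical, since the source does not specify them.

```python
from dataclasses import dataclass

# The five judgments named in the text, in an assumed priority order.
JUDGMENTS = ("evidence", "telos", "risk", "coherence", "value")

# Hypothetical routing table: which agent acts on which failed judgment.
ROUTES = {
    "evidence": "Researcher",
    "telos": "Lead",
    "risk": "Certifier",
    "coherence": "Builder",
    "value": "Lead",
}


@dataclass(frozen=True)
class NextStep:
    agent: str            # who receives the next prompt
    prompt: str           # the generated instruction
    needs_operator: bool  # True only when the action is irreversible


def next_step(verdicts: dict[str, bool], irreversible: bool = False) -> NextStep:
    """Deterministically map five pass/fail verdicts to the next prompt."""
    missing = [j for j in JUDGMENTS if j not in verdicts]
    if missing:
        raise ValueError(f"missing verdicts: {missing}")
    failed = [j for j in JUDGMENTS if not verdicts[j]]
    if not failed:
        prompt = "All five judgments passed; score the deliverable."
        return NextStep("Certifier", prompt, irreversible)
    first = failed[0]  # highest-priority failure decides the route
    prompt = f"Judgment '{first}' failed; address it and resubmit."
    return NextStep(ROUTES[first], prompt, irreversible)
```

Because the function has no randomness and no hidden state, the same verdicts always yield the same next prompt. The `needs_operator` flag mirrors the idea that the human is gated only on irreversible actions, such as a production-handoff decision.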
This holds up as science, not marketing, because the protocol enforces four properties: claims are falsifiable; no agent self-certifies; evidence must resolve on ground truth; and every certification is written to a public, hash-chained registry that anyone can re-verify. Falsifiability, separation, ground truth, and reproducibility are the same properties that distinguish a result from an assertion anywhere else.
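The re-verification property of a hash-chained registry can be shown in a few lines. This is a generic sketch of the technique, not the DeepSCR registry format: the entry layout, the all-zero genesis value, the use of SHA-256 over canonical JSON, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the "previous hash" of the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the previous hash and the record."""
    payload = json.dumps(
        {"prev": prev_hash, "record": record},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Add a certification record, linking it to the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "record": record, "hash": entry_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Anyone holding a copy of the chain can run `verify` without trusting the party that wrote it. Changing a past record alters its hash, which no longer matches the `prev` value stored in the next entry.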
The full flow, worked on a real problem with an animated data-flow, lives on the Protocols page.