Most studios ship a thin app over a SaaS rental. We engineer a research-grade substrate underneath: formal verification, signed receipts, multi-agent supervision, recursive self-improvement. This page is the technical due-diligence room. If you’re evaluating XXI for a serious build, read on.
// When the system has to be more than software
Some projects need more than a clean UI and a database. When the work has to hold under audit, refuse to hallucinate, and get sharper week over week, the system needs an engineering layer underneath that most studios don’t reach for. Math, verification, memory, autonomy, learning — engineered into the project from day one.
— What sits at this tier —
Quantitative agents that observe, decide, and act on real business workflows without supervision. Platt-calibrated scoring, Kalman-tracked latent state, Thompson-sampled experimentation, KKT-optimised allocation.
For: revenue operations, capital allocation, dynamic pricing, fraud triage, portfolio decisions. The math is the moat.
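As one illustration of the experimentation piece, here is a minimal Thompson-sampling sketch for choosing between variants with yes/no outcomes. It assumes a Beta(1, 1) prior per arm. The function names and the simulated conversion rates are hypothetical, not taken from a production system.

```python
import random


def thompson_pick(arms, rng):
    """Pick the arm whose sampled conversion rate is highest.

    `arms` maps an arm name to [successes, failures].
    A Beta(1, 1) prior is assumed for every arm.
    """
    draws = {
        name: rng.betavariate(successes + 1, failures + 1)
        for name, (successes, failures) in arms.items()
    }
    return max(draws, key=draws.get)


def run_experiment(true_rates, rounds, seed=0):
    """Simulate `rounds` decisions against hypothetical true conversion rates."""
    rng = random.Random(seed)
    arms = {name: [0, 0] for name in true_rates}
    for _ in range(rounds):
        name = thompson_pick(arms, rng)
        converted = rng.random() < true_rates[name]
        # Index 0 counts successes, index 1 counts failures.
        arms[name][0 if converted else 1] += 1
    return arms
```

Traffic drifts toward the better arm as evidence accumulates, so exploration cost shrinks without a fixed test horizon.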
Neuro-symbolic agents. The language model proposes; a formal solver verifies. A Z3 / SMT or constraint-logic check wraps every consequential decision before it fires.
For: compliance evidence, contract review, authorisation logic, regulated-industry workflows. Hallucination-proof on the paths that matter.
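The propose-then-verify pattern can be sketched without a solver. In the sketch below, plain Python predicates stand in for the SMT constraints; a real deployment would encode them for a solver such as Z3. The `Refund` type, the constraint names, and the 50,000-cent threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Refund:
    """A hypothetical action proposed by the language model."""
    amount_cents: int
    order_total_cents: int
    approver_role: str


# Hypothetical policy: every predicate must hold before the action may fire.
CONSTRAINTS: list[tuple[str, Callable[[Refund], bool]]] = [
    ("amount_positive", lambda r: r.amount_cents > 0),
    ("not_above_order_total", lambda r: r.amount_cents <= r.order_total_cents),
    ("large_refund_needs_manager",
     lambda r: r.amount_cents <= 50_000 or r.approver_role == "manager"),
]


def verify(proposal: Refund) -> list[str]:
    """Return the names of violated constraints; an empty list means verified."""
    return [name for name, holds in CONSTRAINTS if not holds(proposal)]


def execute_if_verified(proposal: Refund, action: Callable[[Refund], object]):
    """Run `action` only when every constraint holds; otherwise refuse."""
    violations = verify(proposal)
    if violations:
        return ("refused", violations)
    return ("executed", action(proposal))
```

The model never calls `action` directly. Every proposal passes through the gate, and a refusal carries the names of the constraints it broke.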
Multi-agent supervisors with three-tier memory (working / episodic / semantic), hierarchical planning, self-critique loops, durable execution. Whole teams of specialised agents, coordinated.
For: operations that take a junior team to run today. Replace the team. Keep the institutional memory.
Agents that get better while you sleep. Eval suites as training signal, DSPy-compiled prompts, bandits over strategies, anytime-valid A/B testing. Every outcome feeds the next decision.
For: any system that runs for weeks. The system you receive on day one is a baseline. Day ninety, it has compounded.
Public eval dashboards, trace viewers, formal verification layers. Every decision the system makes is replayable, diffable, attributable. Proof the system holds, not promises.
For: SOC 2 prep, regulated industries, board-level AI risk reviews. Bring us the existing system, leave with the audit-ready version.
// Substrate · Polyglot
Polyglot compiles any reachable API — OpenAPI, GraphQL, or undocumented — into a sandboxed, audit-signed TypeScript or Python tool. Every tool ships with a 1 KB Ed25519 receipt your runtime, your auditor, and any third party can independently verify. No SDK. No daemon. No trust required.
— Three cryptographic guarantees —
A 32-byte master seed never leaves process RAM. HKDF-SHA256 derives per-blueprint and per-invocation scoped Ed25519 keypairs. Compromise of any scoped key leaks only that scope — HKDF is one-way; nothing reverses to the master.
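A minimal sketch of scoped derivation, using HKDF-SHA256 as specified in RFC 5869 and only the Python standard library. The `info` label format and the `scoped_seed` helper are assumptions made for illustration. Turning each 32-byte output into an Ed25519 keypair needs a signature library and is not shown.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract: an empty salt defaults to a string of zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def scoped_seed(master: bytes, blueprint_id: str, invocation_id: str = "") -> bytes:
    """Derive a 32-byte scoped seed; the label layout here is hypothetical."""
    info = b"blueprint:" + blueprint_id.encode()
    if invocation_id:
        info += b"|invocation:" + invocation_id.encode()
    return hkdf_sha256(master, salt=b"", info=info)
```

Each scope gets its own `info` label, so two scoped seeds share no recoverable relationship without the master.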
Every synthesized module passes 29 versioned ts-morph rules (TypeScript) or 33 ast-module rules (Python). Any critical or high-severity finding refuses the compile at the firewall, before sandbox execution. The policy version and content hash are embedded in every receipt so an auditor five years from now can replay the exact ruleset that accepted it.
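A rough sketch of what a versioned static-analysis gate can look like for Python, built on the standard `ast` module. The three-rule policy below is hypothetical and far smaller than the rulesets described above. It is meant only to show the shape: lint, refuse on findings, and stamp the verdict with the policy version and content hash.

```python
import ast
import hashlib
import json

# Hypothetical policy, for illustration only.
POLICY = {
    "version": "0.0.1-example",
    "banned_calls": ["eval", "exec", "compile", "__import__"],
    "banned_imports": ["subprocess", "ctypes"],
}


def policy_hash(policy: dict) -> str:
    """Hash a canonical JSON encoding so the exact ruleset can be pinned."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def lint(source: str, policy: dict = POLICY) -> list[str]:
    """Walk the syntax tree and report banned calls and imports."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in policy["banned_calls"]:
                findings.append(f"banned-call:{node.func.id}:line {node.lineno}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in policy["banned_imports"]:
                    findings.append(f"banned-import:{alias.name}:line {node.lineno}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in policy["banned_imports"]:
                findings.append(f"banned-import:{node.module}:line {node.lineno}")
    return findings


def gate(source: str, policy: dict = POLICY) -> dict:
    """Refuse on any finding and record which policy made the decision."""
    findings = lint(source, policy)
    return {
        "accepted": not findings,
        "findings": findings,
        "policy_version": policy["version"],
        "policy_sha256": policy_hash(policy),
    }
```

Because the hash covers the whole policy, any change to a rule changes the stamp, which is what lets a later reviewer confirm which ruleset accepted a module.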
18 deterministic injection detectors plus 5 schema-conformance walkers run on every candidate payload at the firewall, before the synthesized code makes any network call. They cover cloud-metadata SSRF hosts, JWT-shaped secrets, Luhn-validated credit cards, prototype-pollution keys, and 14 others. When a detector fires, the matched evidence is masked in the report, and every detector version is retained permanently.
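Three of the named detector families can be sketched in a few lines. This is an illustrative subset with hypothetical names and a simplified masking rule, not the shipped detector set. The host list is deliberately short.

```python
import re

# Illustrative values only; a real detector would carry a longer list.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}
POLLUTION_KEYS = {"__proto__", "constructor", "prototype"}
CARD_RE = re.compile(r"\b\d{13,19}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask(value: str) -> str:
    """Keep the first and last two characters, star out the rest."""
    if len(value) <= 4:
        return "*" * len(value)
    return value[:2] + "*" * (len(value) - 4) + value[-2:]


def scan(payload, path="$"):
    """Recursively scan a JSON-like payload; return (detector, path, masked) tuples."""
    findings = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key in POLLUTION_KEYS:
                findings.append(("prototype-pollution-key", f"{path}.{key}", mask(key)))
            findings.extend(scan(value, f"{path}.{key}"))
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            findings.extend(scan(item, f"{path}[{i}]"))
    elif isinstance(payload, str):
        for host in METADATA_HOSTS:
            if host in payload:
                findings.append(("cloud-metadata-ssrf", path, mask(host)))
        for candidate in CARD_RE.findall(payload):
            if luhn_ok(candidate):
                findings.append(("luhn-card-number", path, mask(candidate)))
    return findings
```

Findings carry only masked evidence, so the report itself never becomes a second copy of the secret it caught.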
These controls are mapped to GDPR · HIPAA · SOX · PCI-DSS · EU AI Act Article 12 · NIST AI RMF · SOC 2 · ISO 27001 A.8.28 in the formal STRIDE threat model document. Every receipt is the artifact your auditor asks for.
policy v1.1.0 · substrate state at this build · commit-pinned
// Currently engineering for ourselves
A small set of internal primitives we engineer against our own work first. They earn their way into a client project only after they’ve held under real load on ours. Quiet, deliberate, math-first.
Substrate
Content-addressed storage, Matryoshka-tiered vectors, behavioural transition graphs. The memory underneath every agent we ship. Open-sourced as Spine.
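Two of these ideas fit in a short sketch: a content-addressed store keyed by SHA-256, and Matryoshka-style truncation, which keeps a leading prefix of an embedding and renormalises it. The class and function names are hypothetical and do not reflect Spine's actual API. The truncation is only meaningful for embeddings trained so that their prefixes are usable on their own.

```python
import hashlib
import math


class ContentStore:
    """In-memory content-addressed store: the key is the SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so duplicates cost nothing.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read so silent corruption is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored blob does not match its address")
        return data


def matryoshka_prefix(vector, dims):
    """Keep the first `dims` components and rescale to unit length."""
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]
```

A short prefix can serve a cheap first-pass search, with the full vector reserved for re-ranking the shortlist.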
Verification
Z3 / SMT checks wrap any consequential decision. Hallucination-proof paths for the operations that must hold under audit.
Orchestration
Specialised agents that compose into systems with role differentiation, observable traces, and economic supervision of compute allocation.
Learning
Evals as training signal, DSPy-compiled prompts, anytime-valid testing. Every outcome becomes feedback for the next decision.
Public notes on these primitives are pinned for the engineers who want depth. We do not sell the substrate. We use it.
// If you’re evaluating XXI for a serious build
If your project needs any of the depth on this page — formal verification, signed audit trails, multi-agent supervision, recursive self-improvement — book a discovery call. We’ll walk you through working systems on this substrate, not slide decks.