Whitepaper

How Agents Learn

Azirella Ltd. · May 2026 · Trevor Miles, Founder & CEO

This is a deep architectural read on how Autonomy's agents learn. It is the companion to the marketing overview, intended for engineers, architects, and the buyers who need to know what the substrate is doing before they sign off on autonomous decisions in their supply chain.

The thesis: industrial AI does not need agents that reason like ChatGPT. It needs agents that decide like operators, learn like trained policies, and audit like aviation. Most of the substantive design choices in Autonomy follow from that framing. This whitepaper walks the design.

Contents

01. The economic frame: from prediction machines to decision machines
02. Three buckets: Decision Trace, Operating Knowledge, Learned Judgment
03. The cognitive cycle: OODA, ORPA, and where Autonomy puts learning
04. Two training tracks: execution agents and tier agents
05. Calibrated learning: RL, conformal, causal
06. The LLM in the learning loop: propose, never commit
07. The Learning Digest: what we learned this period
08. Compounding judgment and the economic frame revisited
09. References

Section 1

The economic frame: from prediction machines to decision machines

Ajay Agrawal, Joshua Gans, and Avi Goldfarb framed the economics of the last decade of AI cleanly: AI makes prediction cheap. Their AI Canvas decomposes any task into Prediction, Judgment, Action, Outcome (top row) and Input, Training, Feedback (bottom row). The economic argument is that cheap prediction increases the value of its complements: data, judgment, and the action that turns a prediction into a decision.

The framing has held into 2024. Their HBR piece "Generative AI Is Still Just a Prediction Machine" doubled down: even generative AI's writing, drawing, coding, and summarising are reframed prediction tasks. The judgment and action stay with humans. The 2022 book Power and Prediction extended the original argument to systemic transformation, but kept the same boundary: AI predicts; humans decide.

Autonomy starts where Agrawal stops. In industrial supply chains, the substrate now makes decisions cheap, not just predictions. A calibrated, traceable, inspectable, certifiable decision is no longer a high-cost artefact requiring scarce human judgment for every emission. It is something the substrate can produce at the cadence the operation demands, with the audit surface the regulator, the auditor, and the insurer require. Cheap decisions become valuable when the complements to decisions become valuable: the verifiable ledger (Decision Trace), the elicited heuristics that shape them (Operating Knowledge), and the compounding policy that learns from running them (Learned Judgment).

Cheap prediction made data the moat. Cheap decisions make the audit trail of those decisions, and the policy compiled from them, the moat. Neither can be bought; both compound only through operation.

The architectural consequence is what the rest of this whitepaper covers. Building a decision machine for industrial operations requires a different substrate than wrapping an LLM in a governance harness. It requires a trained policy that runs inside a calibrated uncertainty layer, a causal layer that estimates the counterfactuals you cannot run live, an elicitation substrate for the heuristics experienced operators carry, and a per-period rollup that makes the substrate's learning legible.

Section 2

Three buckets: Decision Trace, Operating Knowledge, Learned Judgment

Autonomy's learning loop has three distinct sources of information, named for the verb that produces each. Conflating them produces the kind of woolly framing where a single artefact, "experiential knowledge" or "operational memory" or "tacit knowledge", is asked to do work that needs three different mechanisms. The substrate keeps them separate.

Bucket	Verb	Term	What it is
A	run	Decision Trace	Per-decision audit row. Substrate emits one for every decision: Prompt, Decision, Expected outcome, Likelihood, with provenance and a hash-chain link to its predecessor.
B	elicit	Operating Knowledge	Elicited tacit assertions from human experts. Captured via LLM-driven structured interview, planner override, or direct rule entry. Typed records with provenance, confidence, lifecycle, and curator workflow.
A + B → C	learn	Learned Judgment	The substrate's compiled operating policy. Trained model weights, calibrated conformal intervals, causal estimands, active priors. What the substrate now knows from running and listening.

Why three, not one

Decision Trace and Operating Knowledge are evidence; they accumulate. Learned Judgment is the policy the substrate compiles from that evidence via calibrated mechanisms (RL retraining, conformal recalibration, causal posterior updates, curator-approved prior activation). The mechanism makes the difference observable. When a customer asks "where does the substrate's intelligence come from?" the honest answer has three components, each with its own provenance and its own commit boundary.

The terminology is deliberate. "Decision Trace" is industry-emerging vocabulary; aviation, pharma, and nuclear all use "trace" as the noun for an immutable per-event record. "Operating Knowledge" shifts the adjective from describing the person (planner, operator, expert) to describing the knowledge itself, which generalises across SCP, TMS, and DP planes without down-ranking the source or over-claiming status. "Learned Judgment" is mechanism-coded: it names how the policy got compiled (learned from evidence) rather than rhetorically claiming the substrate has earned it.

The hash-chained ledger

Decision Trace rows carry a per-tenant hash-chain link: each row's hash is computed over the previous row's hash plus the current row's canonical content. The chain is verifiable end-to-end: a manual recompute from the first row to the last reproduces every stored hash, or fails on a single tampered row. This is the property that turns the audit substrate into something a certification body can certify and a parametric insurer can underwrite. Customers do not have to trust Autonomy's word; they have a ledger their auditor can verify with public cryptography.
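
A minimal sketch of the chaining rule described above. It assumes SHA-256 and sorted-key JSON as the canonical form; this whitepaper states neither. The names `row_hash`, `append_row`, `first_broken_row`, and `GENESIS` are illustrative, not Autonomy's API.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first row's predecessor hash


def row_hash(prev_hash: str, content: dict) -> str:
    """Hash the previous row's hash plus this row's canonical content."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_row(chain: list, content: dict) -> None:
    """Append a row whose hash links to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"content": content, "hash": row_hash(prev, content)})


def first_broken_row(chain: list):
    """Recompute every hash from the first row. Return the index of the
    first mismatch, or None if the whole chain verifies."""
    prev = GENESIS
    for i, row in enumerate(chain):
        if row_hash(prev, row["content"]) != row["hash"]:
            return i
        prev = row["hash"]
    return None


ledger = []
append_row(ledger, {"prompt": "reorder SKU-1", "decision": "order 300",
                    "expected": "service level 0.97", "likelihood": 0.9})
append_row(ledger, {"prompt": "reorder SKU-2", "decision": "order 120",
                    "expected": "service level 0.95", "likelihood": 0.8})
intact = first_broken_row(ledger)                 # chain verifies
ledger[0]["content"]["decision"] = "order 3000"   # tamper with the first row
tampered = first_broken_row(ledger)               # index of the tampered row
```

Because each hash covers its predecessor's hash, an auditor needs only the rows themselves and a standard hash function to repeat the check.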

Section 3

The cognitive cycle: OODA, ORPA, and where Autonomy puts learning

Industrial operators run a cognitive cycle on every consequential decision. John Boyd named it OODA: Observe, Orient, Decide, Act. The cycle is the canonical decomposition of operator cognition in aviation, defence, control rooms, and industrial operations. Boyd's central insight was that Orient is the most important step, because that is where the observer's internal model gets updated by the observation. OODA is not just situational assessment; it is the loop where learning and decision-making happen together.

ORPA: Boyd's cycle, reformulated for agents

XMPro's MAGS architecture reformulates OODA for industrial AI agents as ORPA: Observe, Reflect, Plan, Act. The reformulation splits Boyd's Orient step into a dedicated Reflect slot because in an agent architecture the sense-making computation deserves its own structural location. XMPro's Reflect step draws on Park et al.'s 2023 Stanford paper, Generative Agents: Interactive Simulacra of Human Behavior, where the agent's memory stream stores observations and reflections, and reflections write back into the stream for next-cycle retrieval.

The two cycles are almost the same, but the architectural intent differs subtly. ORPA isolates Reflect so that an LLM can be slotted in as the reflection engine, producing both the next decision and an in-loop learning update. The memory stream is what makes the agent feel like it has continuity: next cycle reads back what last cycle wrote. The agent learns by reflecting.

Autonomy: the cycle stays, the learning leaves

Autonomy's agents follow a structurally similar cycle. Each agent observes canonical state, evaluates that state against its trained policy and the conformal calibration layer, plans an action within its envelope, and acts. The cycle is tight and fast: an NN forward pass takes milliseconds, the policy and the calibration are pre-trained and loaded, and the action emits a Decision Trace row and writes canonical state. Step for step, the decision path lines up with ORPA.

The architectural distinction is where the learning lives. OODA and ORPA both bake learning into the inner loop: Boyd's Orient updates the model; ORPA's Reflect writes to the memory stream. Each cycle does decision-making and learning in the same breath. Autonomy deliberately lifts learning out of the cycle.

Step	OODA (Boyd)	ORPA (XMPro / MAGS)	Autonomy
Observe	Sensors, environment	DataStream inputs	Canonical state read; observation hooks
Orient / Reflect	Update internal model	LLM reflection writes into memory stream	Trained policy forward pass + conformal interval lookup
Plan / Decide	Decide	Plan via LLM advice within parametric guardrails	Action selected from trained policy
Act	Act	Configured Action Agents	Decision Trace row emitted; canonical state written
Learn	Embedded inside Orient	Embedded inside Reflect (LLM memory-stream loop)	Out of the loop. Parametric RL retraining + conformal recalibration + causal posterior updates + LLM-augmented hypothesis pipeline.

Why out-of-loop learning

In-loop learning has two costs. First, the learning step competes for the decision's compute and time budget. An LLM reflecting on the memory stream during a control loop is doing language-model inference where engineered policy evaluation would do. Second, the learning gets baked into whatever the model's reasoning happened to be in that single moment, with no cross-event aggregation, no calibration against realised outcomes, and no audit boundary between situations where the model was right and situations where it merely sounded right.

Out-of-loop learning gets the cross-event time and aggregation it needs to be calibrated. The RL trainer sees thousands of trajectories before updating policy weights. The conformal recalibration layer sees enough realised outcomes to compute coverage drift before adjusting intervals. The causal layer accumulates paired counterfactuals before updating estimands. None of these run during the decision; all of them run on their own cadence, supervised by the realised outcome rather than by the model's confidence in itself.
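
The separation can be sketched in a few lines. This is a toy illustration, not Autonomy's trainer: the `Policy` class, the order-up-to rule, and the mean-demand "retrain" step are all hypothetical stand-ins. The point it shows is structural: `decide` only reads the loaded policy, and a new policy version appears only when a separate batch job runs.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Policy:
    """A toy 'trained policy': order up to a target level (hypothetical)."""
    version: int
    target_level: float


def decide(policy: Policy, on_hand: float) -> dict:
    """Inner loop: evaluate the loaded policy. Nothing here updates it."""
    quantity = max(0.0, policy.target_level - on_hand)
    return {"policy_version": policy.version, "on_hand": on_hand,
            "order_quantity": quantity}


def retrain(policy: Policy, realised_demand: list) -> Policy:
    """Outer loop: aggregate many realised outcomes, then emit a new
    policy version. Runs on its own cadence, never inside decide()."""
    new_target = sum(realised_demand) / len(realised_demand)
    return replace(policy, version=policy.version + 1, target_level=new_target)


v1 = Policy(version=1, target_level=100.0)
traces = [decide(v1, on_hand) for on_hand in (40.0, 90.0, 120.0)]
v2 = retrain(v1, realised_demand=[110.0, 130.0, 120.0])
```

Making the policy immutable means every trace row can name the exact policy version that produced it.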

AI·IO·ML sits around the cycle, not inside it

Autonomy's operating model, AI·IO·ML, is one model in three couplets: the agent acts (Automate, Inform), the human engages (Inspect, Override), and the system improves (Measure, Learn). It is not a replacement for OODA or ORPA. The agent still runs its own cognitive cycle on every decision. AI·IO·ML is the contract that governs the interaction between the substrate and the operator: when the substrate brings a human in (Inform), when a human can pull on the substrate to ask why (Inspect), and when a human supersedes (Override), with every outcome measured and fed back as training signal. The agent's cognitive cycle is its private business; AI·IO·ML is what the operator sees and acts on, and what the system learns from.

The same AI·IO·ML contract also governs the learning loop. Parametric learning events (RL retraining, conformal recalibration, causal estimand updates) flow into the same Decision Stream surface that operational decisions use, with the same row template and the same Inspect contract. Most parametric updates Automate; high-magnitude shifts Inform; LLM-proposed CANDIDATEs always Inspect until the confidence head is calibrated. One governance model covers both decision events and learning events.

Section 4

Two training tracks: execution agents and tier agents

Autonomy uses two distinct families of agents, and they are trained differently because the work they do is different. Conflating them undersells what each track is actually doing.

Execution agents: pre-trained on a generic corpus

The narrow per-decision execution agents (Inventory Buffer, PO Creation, ATP, Forecast Baseline, Demand Sensing, Order Tracking, Rebalancing, and others) are pre-trained once on a representative synthetic corpus that the platform generates itself. No customer data is involved. When a customer goes live, the right checkpoint is drawn from the registry and warm-started for that site.

The generic corpus gives the model its decision shape; the local data teaches it the customer's context. From go-live, each execution agent walks a three-phase curriculum at its own site: behavioural cloning against the deterministic ERP-logic teacher (Phase 1, always available), supervised learning on planner overrides (Phase 2, activates around 500 expert decisions), and conservative offline RL on observed BSC outcomes (Phase 3, activates around 1000 outcome records).
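
The activation gates can be written down directly. In this sketch the two thresholds come from the text ("around 500", "around 1000"). Treating them as hard cut-offs, and as independent of each other, is an assumption. The function and phase names are illustrative.

```python
PHASE_2_MIN_EXPERT_DECISIONS = 500   # "around 500 expert decisions"
PHASE_3_MIN_OUTCOME_RECORDS = 1000   # "around 1000 outcome records"


def active_phases(expert_decisions: int, outcome_records: int) -> list:
    """Return the curriculum phases available to an execution agent at a site."""
    phases = ["behavioural_cloning"]              # Phase 1, always available
    if expert_decisions >= PHASE_2_MIN_EXPERT_DECISIONS:
        phases.append("supervised_on_overrides")  # Phase 2
    if outcome_records >= PHASE_3_MIN_OUTCOME_RECORDS:
        phases.append("conservative_offline_rl")  # Phase 3
    return phases


new_site = active_phases(expert_decisions=120, outcome_records=0)
mature_site = active_phases(expert_decisions=650, outcome_records=1400)
```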

Tier agents: trained on your DAG

The graph-based tier agents (Strategic L4 Policy Optimisation, Tactical L3 Domain-Model Reconciliation models, and Operational L2 Node Coordinator) cannot be warm-started generically. Their value is exactly that they learn the topology, lead-times, capacities, and substitution behaviour of your network. They are trained per-customer before go-live, using a discrete event simulator that runs scenarios against the customer's actual supply chain DAG, with deterministic engines modelled on ERP logic acting as the teacher.

The teacher provides the decisions; the agent provides the policy. The agent never sees the teacher's rule; it sees the rule's output across many scenarios and learns the underlying decision shape. This is the AlphaZero pattern: an agent that internalised the teacher's rule could at best reproduce the teacher. An agent that learns by watching outcomes discovers interactions the rule cannot encode. The teacher sets the floor; Phase-3 outcome optimisation lifts the agent above it.

Why this matters for the learning loop

The two tracks share one substrate but differ on cadence. Execution agents retrain daily; L3 tier agents retrain daily on the previous day's transactions; the Operational L2 Node Coordinator updates hourly; the L4 Policy Optimisation model retrains weekly on rolled-up consensus. None of these retrains run inside the decision cycle. The cadence is set per-agent-type and per-tenant; a distributor with high-velocity demand can pull execution-agent retraining to 6-hourly, a pharma manufacturer with stable regulated planning can push S&OP further out. The cadence knob is part of the configuration surface that the substrate exposes to operators, not a platform constant.
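
As a sketch of the cadence knob, the defaults below restate the cadences in the paragraph above, and a per-tenant override takes precedence. The dictionary keys and the `cadence_for` helper are hypothetical names; the real configuration surface is not described in this whitepaper.

```python
from datetime import timedelta

# Default retraining cadence per agent type, as stated in the text.
DEFAULT_CADENCE = {
    "execution_agent": timedelta(days=1),
    "tier_l3": timedelta(days=1),
    "tier_l2_node_coordinator": timedelta(hours=1),
    "tier_l4_policy_optimisation": timedelta(weeks=1),
}


def cadence_for(tenant_overrides: dict, agent_type: str) -> timedelta:
    """A per-tenant override wins; otherwise fall back to the default."""
    return tenant_overrides.get(agent_type, DEFAULT_CADENCE[agent_type])


# A high-velocity distributor pulls execution-agent retraining to 6-hourly.
distributor = {"execution_agent": timedelta(hours=6)}
execution_cadence = cadence_for(distributor, "execution_agent")
coordinator_cadence = cadence_for(distributor, "tier_l2_node_coordinator")
```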

Section 5

Calibrated learning: RL, conformal, causal

Out-of-loop learning is only useful if the mechanism is calibrated. An uncalibrated learner produces a policy that drifts in directions the substrate cannot verify against realised outcomes. Three calibrated mechanisms run on independent cadences and feed Learned Judgment.

RL on the digital twin

The reinforcement-learning trainer plays trajectories on the digital twin against the BSC reward function. Conservative offline RL keeps the trained policy close to the behavioural distribution of the warm-started corpus and the deterministic teacher, with explicit penalties for actions outside the visited region. The training is supervised by the realised BSC outcome, not by the model's own confidence. Regression guard gates every candidate checkpoint: if the new model regresses on BSC utility against held-out trajectories, it is discarded; only checkpoints that match or improve get promoted.
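
The regression guard reduces to a comparison on held-out trajectories. This sketch assumes the comparison is on mean BSC utility over the same held-out set. The text does not specify the statistic, and `promote` is an illustrative name.

```python
from statistics import mean


def promote(candidate_utils: list, incumbent_utils: list) -> bool:
    """Promote only if the candidate matches or improves mean BSC utility
    on the same held-out trajectories; otherwise it is discarded."""
    return mean(candidate_utils) >= mean(incumbent_utils)


incumbent = [0.80, 0.90, 0.85]
better = promote([0.82, 0.91, 0.88], incumbent)     # improves: promoted
regressed = promote([0.70, 0.90, 0.85], incumbent)  # regresses: discarded
```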

Conformal calibration

Every prediction the substrate emits carries a calibrated P10 / P50 / P90 interval. The conformal layer holds out a calibration set of realised outcomes and computes the empirical coverage. When the observed coverage drifts more than a configured tolerance from the promised coverage (the predictor promised 90 percent intervals; realised coverage measured 78 percent), the calibration set is updated and the intervals are recomputed. The upstream policy is also flagged for retraining if coverage drift persists. The substrate does not Monte-Carlo plans by re-running the twin under noise; uncertainty is quantified by conformal intervals at inference time. This is an architectural invariant: the twin's PLAN_PRODUCTION mode raises an error if any stochasticity knob is left on during plan generation.
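
A hedged sketch of the two checks described above, using standard split-conformal prediction on absolute residuals. This whitepaper does not say which conformal variant the substrate uses, so the technique is swapped in for illustration. The function names and the 0.05 tolerance are assumptions.

```python
import math


def conformal_half_width(residuals: list, alpha: float) -> float:
    """Split-conformal interval half-width for 1 - alpha coverage:
    the ceil((n + 1)(1 - alpha))-th smallest absolute residual."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def empirical_coverage(intervals: list, outcomes: list) -> float:
    """Fraction of realised outcomes that fell inside their interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes))
    return hits / len(outcomes)


def needs_recalibration(observed: float, promised: float, tolerance: float) -> bool:
    """True when realised coverage drifts past the configured tolerance."""
    return abs(observed - promised) > tolerance


# Calibration residuals with absolute values 1..20.
residuals = [(-1) ** i * float(i) for i in range(1, 21)]
half_width = conformal_half_width(residuals, alpha=0.1)

coverage = empirical_coverage([(0, 10)] * 4, [5, 11, 3, 10])
drifted = needs_recalibration(observed=0.78, promised=0.90, tolerance=0.05)
```

With the numbers from the text, a promised 90 percent against a realised 78 percent is a 12-point gap, which exceeds the assumed 5-point tolerance and triggers recalibration.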

Causal posterior updates

Override effectiveness is the cleanest causal signal the substrate produces. When a planner overrides an agent's recommendation, the substrate captures the override and the realised outcome. A propensity-score matching service constructs counterfactual pairs, and the Bayesian posterior over override effectiveness is updated daily. The posterior is what tells the substrate whether the planner's override pattern consistently improves outcomes (the override-effectiveness estimand is positive and tightening), whether it is neutral (the estimand is centred on zero and the credible interval is wide), or whether it consistently degrades outcomes (the estimand is negative). Per-planner posteriors feed governance dashboards; per-decision-class posteriors feed envelope adjustments.
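
The posterior update can be illustrated with a Normal-Normal conjugate model. This is an assumption for the sketch; the whitepaper does not name the likelihood or prior. The inputs `diffs` stand for outcome differences (override minus matched counterfactual) that the propensity-matching step would produce; the matching itself is not shown.

```python
import math


def update_posterior(prior_mean, prior_var, diffs, noise_var):
    """Normal-Normal conjugate update for the mean outcome difference,
    assuming a known observation-noise variance."""
    n = len(diffs)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(diffs) / noise_var)
    return post_mean, post_var


def classify(post_mean, post_var, z=1.96):
    """Label the estimand by where its 95% credible interval sits."""
    half = z * math.sqrt(post_var)
    if post_mean - half > 0:
        return "improves"
    if post_mean + half < 0:
        return "degrades"
    return "neutral"  # interval still straddles zero


few = update_posterior(0.0, 1.0, [0.5, -0.4], noise_var=1.0)
many_good = update_posterior(0.0, 1.0, [0.5] * 40, noise_var=1.0)
many_bad = update_posterior(0.0, 1.0, [-0.5] * 40, noise_var=1.0)
labels = [classify(*few), classify(*many_good), classify(*many_bad)]
```

With two conflicting observations the interval is wide and centred near zero; with forty consistent ones it tightens away from zero, which is the "positive and tightening" case in the text.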

A first structural counterfactual is live in narrow scope: inventory-buffer overrides in the SCP plane are replayed forward through the digital twin with the agent's recommended buffer installed at the swap point, and the resulting service-level / inventory trajectory is what the override is scored against. Most overrides do not need the twin: analytical substitution and propensity-matched control pairs estimate the counterfactual directly from observed history. The structural twin-replay is reserved for the heavily-confounded decision classes where observational identification is weak, and even there a structural time-series counterfactual does the work where a full twin re-run would be too costly. Completing this observability-tiered coverage across decision classes is on the near-term roadmap and is one of Autonomy's four pillars.

Behavioural drift detector

A fourth mechanism watches the output distribution of each agent for distributional drift independent of accuracy. It computes a z-score for continuous outputs and a total-variation distance for categorical outputs, evaluated every two hours. Severity levels (ADVISORY, RECOMMENDED, CRITICAL) drive the AI·IO·ML mode assignment: ADVISORY Automates as a log entry, RECOMMENDED Informs the operator, CRITICAL Inspects and triggers the guardrail escalation path. The drift detector does not retrain the policy; it surfaces evidence that the calibrated mechanisms above should be re-run.
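
Both drift statistics are standard and short to write down. In the sketch below the severity cut-offs are hypothetical, since the whitepaper does not give them; the severity-to-mode mapping follows the paragraph above.

```python
def z_score(current_mean: float, baseline_mean: float, baseline_std: float) -> float:
    """Shift of a continuous output, in baseline standard deviations."""
    return (current_mean - baseline_mean) / baseline_std


def total_variation(p: dict, q: dict) -> float:
    """Total-variation distance between two categorical distributions:
    half the sum of absolute probability differences."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def severity(score: float, advisory: float, recommended: float, critical: float):
    """Map a drift magnitude to a severity level, or None below ADVISORY.
    The thresholds are caller-supplied assumptions."""
    magnitude = abs(score)
    if magnitude >= critical:
        return "CRITICAL"
    if magnitude >= recommended:
        return "RECOMMENDED"
    if magnitude >= advisory:
        return "ADVISORY"
    return None


# Severity to AI·IO·ML mode, as described in the text.
MODE = {"ADVISORY": "AUTOMATE", "RECOMMENDED": "INFORM", "CRITICAL": "INSPECT"}

z = z_score(current_mean=130.0, baseline_mean=100.0, baseline_std=10.0)
tv = total_variation({"hold": 0.5, "order": 0.5},
                     {"hold": 0.25, "order": 0.5, "expedite": 0.25})
z_mode = MODE[severity(z, advisory=2.0, recommended=3.0, critical=4.0)]
tv_mode = MODE[severity(tv, advisory=0.05, recommended=0.15, critical=0.25)]
```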

The four mechanisms compose

The four mechanisms answer different questions. RL on the twin updates the policy itself. Conformal recalibration keeps the uncertainty layer honest. Causal posterior updates score the human overrides against the substrate's own recommendations. Behavioural drift detection surfaces the question of whether those mechanisms need to be re-run. Together they give the substrate a calibrated learning loop where every update is supervised by a realised outcome and every artefact (policy weights, calibration set, estimands, drift events) is versioned, inspectable, and auditable.

Section 6

The LLM in the learning loop: propose, never commit

Autonomy's substrate-level rule on the LLM is short. The LLM is never on the decision axis: NN agents make decisions; the LLM produces text over decisions agents already made. An LLM output never becomes an actionable scalar, never an order quantity, never a lane assignment, never a calibrated threshold, never a policy update.

On the learning loop, the rule extends. The LLM may propose learning updates; it may not commit them. The LLM can contribute to learning along three axes, and one of them is forbidden.

Axis	Role	LLM permitted?	Downstream consumer
Narration	Synthesise what changed; characterise drift; narrate counterfactuals; write the Learning Digest	Yes	Humans reading the digest; executive briefing surface; Decision Stream INFORM rows
Hypothesis	Generate CANDIDATE updates: a CANDIDATE Operating-Knowledge prior, a CANDIDATE threshold adjustment, a CANDIDATE constraint suggestion	Yes	Curator with two-person rule (curator distinct from submitter), or parametric validator. Never auto-commit.
Commit	Write a policy weight, recalibrate a conformal interval, set a causal estimand, activate a prior	No	n/a

The natural enforcement

The rule does not need a separate enforcement mechanism. AI·IO·ML's mode-assignment formula reads field 4 of the Inspect contract, the calibrated likelihood, and routes zero-likelihood events to Inspect. LLM-proposed CANDIDATEs carry the LLM's self-report confidence rather than a calibrated probability; that source is tagged on the row and treated as uncalibrated. The formula routes every LLM CANDIDATE to a human curator until the day the LLM's confidence head is itself calibrated against curator accept-rate. At that point trivial CANDIDATEs can begin to Automate; until then, every LLM proposal pauses for human inspection.
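
One way to read the routing described above is sketched here. The actual mode-assignment formula lives in aiio-model.md and is not reproduced in this whitepaper, so everything below is an assumption: the `StreamRow` fields, the set of calibrated sources, the 0.9 Automate threshold, and the choice to treat an uncalibrated source as contributing zero calibrated likelihood.

```python
from dataclasses import dataclass

# Assumption: only conformal probabilities count as calibrated here.
CALIBRATED_SOURCES = {"conformal"}


@dataclass
class StreamRow:
    prompt: str
    decision: str
    expected: str
    likelihood: float        # field 4 of the Inspect contract
    likelihood_source: str   # tag, e.g. "conformal" or "llm_self_report"


def calibrated_likelihood(row: StreamRow) -> float:
    """An uncalibrated source contributes no calibrated likelihood."""
    if row.likelihood_source in CALIBRATED_SOURCES:
        return row.likelihood
    return 0.0


def assign_mode(row: StreamRow, automate_at: float = 0.9) -> str:
    """Hypothetical routing: zero calibrated likelihood always Inspects."""
    p = calibrated_likelihood(row)
    if p == 0.0:
        return "INSPECT"
    if p >= automate_at:
        return "AUTOMATE"
    return "INFORM"


llm_row = StreamRow("override pattern on lane X", "CANDIDATE prior",
                    "fewer expedites", 0.99, "llm_self_report")
recal_row = StreamRow("coverage drift", "recalibrated interval",
                      "coverage back to target", 0.95, "conformal")
llm_mode = assign_mode(llm_row)
recal_mode = assign_mode(recal_row)
```

However confident the LLM's self-report, the source tag is what decides the route.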

Why this matters architecturally

Park et al.'s 2023 Smallville architecture, the pattern XMPro's MAGS adopts, lets the LLM reflect and write back into the agent's own memory stream. The reflection influences next-cycle behaviour automatically. That is a tight feedback loop the LLM owns end-to-end. It is also unauditable in any way that satisfies an industrial control room: an auditor asking "why did the agent change its behaviour this week" gets a memory-stream string the LLM wrote to itself.

Autonomy's pattern is different. The LLM writes a typed CANDIDATE row with provenance, confidence, and the evidence it drew on. A human curator inspects the CANDIDATE, applies the two-person rule (curator distinct from submitter), and either activates it (the prior becomes ACTIVE in the Operating Knowledge substrate, with full audit) or rejects it (the rejection is recorded as evidence for next-period proposals). The LLM is genuinely useful at hypothesis generation; it is structurally prevented from auto-committing those hypotheses to substrate state.
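
The curator step can be sketched as a small state transition that refuses to run when curator and submitter are the same identity. The `Candidate` fields follow the description above (provenance, confidence, evidence); the class, function, and status strings other than ACTIVE are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    statement: str
    submitter: str            # provenance: who or what proposed it
    confidence: float         # uncalibrated self-report for LLM proposals
    evidence: list
    status: str = "CANDIDATE"
    audit: list = field(default_factory=list)


def review(candidate: Candidate, curator: str, accept: bool) -> Candidate:
    """Apply the two-person rule, then activate or reject with an audit entry."""
    if curator == candidate.submitter:
        raise PermissionError("two-person rule: curator must differ from submitter")
    candidate.status = "ACTIVE" if accept else "REJECTED"
    candidate.audit.append((curator, candidate.status))
    return candidate


proposal = Candidate("Supplier S ships late in week 52", "llm-synthesiser",
                     0.8, ["override cluster, Dec 2025"])
try:
    review(proposal, curator="llm-synthesiser", accept=True)
    self_approved = True
except PermissionError:
    self_approved = False

approved = review(proposal, curator="planner-a", accept=True)
```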

In MAGS, the LLM reflects and the memory stream changes. In Autonomy, the LLM proposes and a curator decides. Both architectures use the LLM in the learning loop; only one of them puts the LLM on the commit boundary.

Section 7

The Learning Digest: what we learned this period

Learned Judgment compounds invisibly without a periodic surface. The conformal layer recalibrates intervals on every realised outcome. The causal layer updates estimands daily. The drift detector flags events every two hours. The RL trainer updates policy weights on cadence. Each mechanism is calibrated and audited, but none of them by itself answers the question an operator, a CFO, or an auditor eventually asks: what did the substrate learn this week, this month, this quarter?

The Learning Digest is the period-readable form of Learned Judgment. Each digest aggregates the period's learning events, characterises their magnitude and direction, and produces a narrative paragraph that a non-engineer can read. The structured rollup is the audit substrate; the narrative is the customer-facing legibility.

What a digest contains

Structured rollup

· Counts by kind (drift, recal, estimand update, prior activation, etc.)
· Counts by producer (drift detector, conformal, causal, EK curator, LLM synthesiser)
· Counts by AI·IO·ML status (Automated, Informed, Inspected)
· Magnitude aggregates (max conformal shift, max causal estimand delta, max drift z)
· CANDIDATE flow (open, committed this period, rejected this period)
· Top events with links into the Decision Trace ledger
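
The count and magnitude items in the list above amount to a group-by over the period's learning events. A minimal sketch, assuming each event is a dictionary with `kind`, `producer`, `status`, and `magnitude` keys; that event schema is hypothetical.

```python
from collections import Counter


def rollup(events: list) -> dict:
    """Aggregate one period's learning events into a structured rollup."""
    max_by_kind = {}
    for e in events:
        kind = e["kind"]
        max_by_kind[kind] = max(max_by_kind.get(kind, 0.0), abs(e["magnitude"]))
    return {
        "by_kind": dict(Counter(e["kind"] for e in events)),
        "by_producer": dict(Counter(e["producer"] for e in events)),
        "by_status": dict(Counter(e["status"] for e in events)),
        "max_magnitude": max_by_kind,
    }


events = [
    {"kind": "drift", "producer": "drift_detector", "status": "Automated", "magnitude": 2.1},
    {"kind": "drift", "producer": "drift_detector", "status": "Informed", "magnitude": -3.4},
    {"kind": "recal", "producer": "conformal", "status": "Automated", "magnitude": 0.04},
    {"kind": "prior_activation", "producer": "llm_synthesiser", "status": "Inspected", "magnitude": 0.0},
]
digest = rollup(events)
```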

Narrative

A short paragraph synthesised by the narration LLM, read-only over the structured rollup. The narrative names the most consequential shifts, the cross-event patterns the calibrated mechanisms surfaced, and the open CANDIDATEs awaiting curator review. It does not propose new CANDIDATEs, does not commit any update, and does not infer beyond what the structured rollup supports.

Cadence

Three digests run on cadence per tenant: a daily digest at 04:00 (after the daily causal matching at 02:40 and the NN orchestration at 03:00 have completed), a weekly digest on Monday 05:00 (after the S&OP cycle), and a monthly digest on the first of the month at 05:30. Each digest is idempotent: re-running for the same period upserts the narrative on the existing row, preserves the original structured rollup, and keeps the hash-chain position. The chain itself is per tenant, per config, per period kind; verifiable end-to-end the same way the Decision Trace ledger is.
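
The idempotency rule can be sketched as an upsert keyed by tenant, config, period kind, and period start. The in-memory `store`, the key layout, and the `upsert_digest` name are assumptions for illustration; hashing of the row is omitted.

```python
def upsert_digest(store: dict, key: tuple, rollup: dict, narrative: str) -> dict:
    """key = (tenant, config, period_kind, period_start).
    First run creates the row; re-runs replace only the narrative."""
    row = store.get(key)
    if row is None:
        # Position within this tenant/config/period-kind chain.
        position = sum(1 for k in store if k[:3] == key[:3])
        row = {"rollup": rollup, "narrative": narrative,
               "chain_position": position}
        store[key] = row
    else:
        row["narrative"] = narrative  # rollup and chain position are preserved
    return row


store = {}
key = ("tenant-a", "config-1", "daily", "2026-05-01")
first = dict(upsert_digest(store, key, {"drift": 2}, "Two drift events."))
second = upsert_digest(store, key, {"drift": 99}, "Two drift events, both advisory.")
next_day = upsert_digest(store, ("tenant-a", "config-1", "daily", "2026-05-02"),
                         {"drift": 0}, "Quiet day.")
```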

The Inspect contract for learning events

Every learning event that lands on the Decision Stream carries the four Inspect-contract fields, adapted from their decision-event meaning. Prompt: what evidence triggered the event (the drift signal, the override pattern, the planner interview transcript). Decision: what the substrate updated (which policy weight, calibration interval, estimand, prior). Expected: the behaviour change the substrate predicts as a result. Likelihood: the calibrated probability the predicted change holds, with the source tagged (conformal probability for parametric updates, LLM self-report confidence for CANDIDATEs, drift z-score normalised to [0, 1] for drift events). The same Inspect surface that explains why an agent ordered three thousand cases of product X explains why the substrate recalibrated the lead-time interval on lane LAX → ATL.

Section 8

Compounding judgment and the economic frame revisited

Three things compound when a customer runs on Autonomy. The first is the Decision Trace ledger: every decision the substrate emits is hash-chained into a verifiable record. The chain cannot be replicated by a competitor; it can only be generated through operation. Three years of decision traces on your network is something you cannot buy. It is the property that turns Autonomy from a software vendor into something a parametric insurer can underwrite.

The second is the Operating Knowledge layer: the elicited tacit assertions from your planners, captured before retirements, structured with provenance, curated under the two-person rule, and surfaced to the substrate as ACTIVE priors. The knowledge that took your senior planner thirty years to internalise is no longer locked in their head; it is in the substrate, attributed to them, and applicable across every site, every cycle, every successor planner who follows them.

The third is Learned Judgment: the compiled operating policy itself. The trained model weights that have seen every season your business goes through. The calibrated conformal intervals that know your lead-time and demand distributions. The causal estimands that have scored every override your planners have made. None of these can be transplanted to another customer; all of them improve every period your operation runs. The Learning Digest makes each period's improvement legible.

Back to the economic frame

Agrawal, Gans, and Goldfarb's economic argument said cheap prediction increases the value of the complements to prediction: data, judgment, action. The Autonomy extension says cheap decisions increase the value of the complements to decisions: the verifiable ledger, the elicited heuristics, the compounding policy. A competitor on the same platform starts where you started. They do not have your decision traces. They do not have your operating knowledge. They do not have your learned judgment. The platform is the same; the trained-on-your-business specificity is not.

That is the moat. It is not the model; the model is replaceable on a six-month cycle. It is not the substrate; the substrate is licensable to every plane operator who wants it. The moat is the three-bucket compound of your operation's decisions, your planners' heuristics, and the substrate's compiled judgment. The Learning Digest is what makes the moat visible to the people responsible for valuing it.

You cannot buy three years of decision traces. You cannot buy thirty years of operating knowledge. You cannot buy a learned policy trained on every season your network has run through. Each of these compounds only through operation, and only inside a substrate built to make the compounding legible.

Section 9

References

Agrawal, A.; Gans, J.; Goldfarb, A. (2018). Prediction Machines: The Simple Economics of Artificial Intelligence. Harvard Business Review Press.
Agrawal, A.; Gans, J.; Goldfarb, A. (2022). Power and Prediction: The Disruptive Economics of Artificial Intelligence. Harvard Business Review Press.
Agrawal, A.; Gans, J.; Goldfarb, A. (2024). "Generative AI Is Still Just a Prediction Machine." Harvard Business Review, November 18, 2024.
Alicke, K. (2026). "The Planner Was the System." Substack.
Boyd, J. R. (1976). "Destruction and Creation." Unpublished essay. The canonical reference for the OODA cycle.
Brynjolfsson, E.; McAfee, A. (2014). The Second Machine Age. W. W. Norton. The "bounty" framing.
Goodness, E. (2026). Gartner. The "bonded autonomy" term as a refinement of bounded autonomy.
Park, J. S.; O'Brien, J. C.; Cai, C. J.; Morris, M. R.; Liang, P.; Bernstein, M. S. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." Stanford. The Smallville paper that introduces the memory-stream-and-reflection pattern XMPro's MAGS adopts.
Silver, D.; Hubert, T.; Schrittwieser, J.; et al. (2018). "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play." Science 362.6419.
van Schalkwyk, P. (2026, May 19). "Why Industrial AI Agents Don't Need to 'Reason' Like ChatGPT." LinkedIn / The Digital Engineer. ORPA cycle, Cognitive Decision Loop.
van Schalkwyk, P. (2026, February 28). "The Industrial AI Moonshot Nobody Is Talking About." LinkedIn. Bonded autonomy, Decision Trace, certification flywheel.

Internal architectural rules referenced

· aiio-model.md: the per-decision sequence (Automate, Inform, Inspect, Override), the mode-assignment formula, the extension to learning events.
· llm-usage-discipline.md: LLM never on the decision axis; LLM may propose CANDIDATEs, never commit them; workload routing.
· data-lineage.md: the ProvenanceMixin contract for every data-bearing entity at a producer boundary.
· four-pillars.md: AI agents, conformal prediction, digital twin, causal AI. The 90/10 ratio of decision substrate to LLM narration.
· plane-module-invariant.md: one plane per functional domain; no transient code copies.
· guardrail-escalation.md: the deontic-vocabulary contract (O / P / F, with contrary-to-duty escalation targets) for substrate guardrails.

Read the rest of the architecture

The whitepaper sets the frame. The architectural rules behind it are public; the substrate behind those is something a sophisticated buyer needs to walk in person.