Execution layer
AI Agents
Eleven specialized agents operate as a coordinated hive of biologically-inspired roles that communicate through Google's open Agent2Agent (A2A) protocol. Each handles one narrow decision type at machine speed, running under AIIO and surfacing to the Decision Stream only when human judgment is likely to help.
"The future of AI in the enterprise isn't a single model, it's an orchestrated system of specialized agents, each handling one task exceptionally well."
"Multi-agent systems are the next frontier. A single AI can optimize one function; a coordinated swarm of agents can optimize an entire supply chain simultaneously."
The Agent Hive
11 specialists communicating via the A2A protocol, Google's open standard for agent interoperability. Each agent handles one decision type; the hive coordinates emergent intelligence over the shared world model.
11 specialized execution agents · Autonomy
<10ms decision latency per agent · Real-time execution
25+ cross-authority negotiation scenarios · Cross-Authority AAP
22 directed causal edges between agents · Operational Coordinator
Four-Tier Agent Hierarchy
Each tier operates at its natural time horizon and produces outputs that constrain the tier below. Information flows down as policy and directives, and back up as signals and outcomes.
Strategic
Network analysis examines supply chain topology (bottlenecks, concentration risk, fragility) and produces policy parameters that shape all downstream behavior. A site identified as a critical chokepoint gets a 1.4x safety stock multiplier.
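As an illustration of how a network-analysis finding becomes a policy parameter: the 1.4x chokepoint multiplier is from the text, but the function shape is an assumption, not the product's API.

```python
def safety_stock(base_stock, is_chokepoint, chokepoint_multiplier=1.4):
    """Apply the strategic-tier policy parameter to a site's buffer.

    Sites flagged as critical chokepoints carry extra safety stock;
    all other sites keep their baseline buffer.
    """
    return base_stock * (chokepoint_multiplier if is_chokepoint else 1.0)

print(safety_stock(500, True))   # -> 700.0
print(safety_stock(500, False))  # -> 500.0
```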
Tactical, Network Coordination
Combines network-structure signals with real-time transactional data to produce daily site directives: demand forecasts, exception probabilities, and priority allocations.
Cross-Authority Arbitration
Handles trade-offs across functional boundaries at machine speed. 25+ negotiation scenarios.
Operational, Site Coordination (per site)
Learns causal relationships between the 11 agents (22 directed edges) and predicts cascade effects before they happen. A production spike will generate quality load 2-4 hours later. Operational pre-adjusts urgency so downstream agents are ready.
Execution (Agent Hive, per site)
Specialized AI models handle one decision type each within defined authority boundaries. Coordinated via a biologically-inspired signal system.
A note on the deterministic planning engine. MPS/MRP, BOM explosion, safety stock calculation, and net requirements are not a tier of the decision hierarchy. They are used to generate the training data the four tiers learn from. At runtime, decisions are made by the learned agents, not by the deterministic engine.
"The key challenge isn't building individual AI agents, it's orchestrating them. The companies that win will be the ones that master multi-agent coordination at scale."
Decision-First: The OODA Operating Rhythm
Autonomy is not a planning tool that generates static plans. It is a decision engine that continuously observes, orients, decides, and acts across every level of the hierarchy. Each tier runs its own OODA loop at its natural cadence, and each loop's output becomes the context and guardrails for the tier below.
The OODA Loop
John Boyd's Observe-Orient-Decide-Act framework, originally developed for air combat, maps precisely to how autonomous supply chain agents operate. The side that cycles through OODA faster wins, and agents cycle in milliseconds.
Observe: ingest real-time state (inventory levels, incoming orders, supplier signals, demand patterns, quality results, capacity utilization).
Orient: apply context from higher tiers (policy parameters, priority allocations, authority boundaries, likelihood thresholds). This is where guardrails shape judgment.
Decide: select an action within authority boundaries. The deterministic engine provides a baseline; the learned agent adjusts. Likelihood checks gate autonomy.
Act: execute the decision. Record the action, the reasoning, the likelihood score, and the counterfactual. Feed outcomes back up the hierarchy.
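One pass through the four steps can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the production implementation; the class names, fields, and stubbed learned model are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str
    likelihood: float      # calibrated confidence in the chosen action
    counterfactual: str    # what the deterministic baseline would have done
    reasoning: str

@dataclass
class ExecutionAgent:
    authority: set                     # actions this agent may take on its own
    threshold: float = 0.6             # likelihood floor for autonomous action
    log: list = field(default_factory=list)

    def observe(self, world):
        # Observe: ingest real-time state (inventory, orders, signals...).
        return {"inventory": world["inventory"], "open_orders": world["open_orders"]}

    def orient(self, state, policy):
        # Orient: apply higher-tier context (policy parameters, priorities).
        return {**state, "priority": policy.get("priority", 1.0)}

    def decide(self, ctx):
        # Decide: deterministic baseline plus a learned adjustment (stubbed).
        baseline = "release" if ctx["inventory"] > 0 else "defer"
        return Decision(action=baseline, likelihood=0.9,
                        counterfactual=baseline, reasoning="stock available")

    def act(self, decision):
        # Act: record action, reasoning, likelihood, and counterfactual.
        self.log.append(decision)
        if decision.likelihood < self.threshold or decision.action not in self.authority:
            return "inform"            # surface to the Decision Stream
        return decision.action

agent = ExecutionAgent(authority={"release", "defer"})
world = {"inventory": 42, "open_orders": 3}
action = agent.act(agent.decide(agent.orient(agent.observe(world), {"priority": 2.0})))
print(action)  # -> release
```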
Nested OODA Across the Hierarchy
Each tier runs its own OODA loop. Higher tiers cycle slower but produce context that constrains the faster loops below. Lower tiers produce feedback that informs the slower loops above.
Observe: Network performance metrics, market shifts. Orient: Bottleneck analysis, risk scoring. Decide: Policy parameters (safety stock multipliers, priority weights). Act: Push parameters to all downstream tiers.
↓ Outputs: guardrails, thresholds, risk tolerances
Observe: Yesterday's demand, supplier status, inventory positions. Orient: Within strategic policy envelope. Decide: Priority allocations per product per site. Act: Push directives to execution agents.
↓ Outputs: priority allocations, demand forecasts, exception probabilities
Observe: Cross-agent signal patterns, cascade indicators. Orient: Learned causal graph between 11 agents. Decide: Urgency adjustments to pre-empt cascades. Act: Modulate agent urgency vectors.
↓ Outputs: urgency adjustments, pre-emptive signals
Observe: Event trigger (order, shipment, quality hold). Orient: Within allocations, urgency, and authority from above. Decide: Narrow execution action (release, defer, rebalance). Act: Execute and record decision + outcome for learning.
↑ Outputs: decisions, outcomes, override signals, fed back up to all tiers
The Continuous Learning Loop: Every Intervention Teaches
Both human overrides and agent decisions flow back up the hierarchy as learning signals. When a planner overrides a PO quantity, causal AI records the counterfactual (what the agent would have done), tracks the actual outcome, and determines whether the override genuinely caused a better result. This is the AIIO feedback engine in motion.
Execution agents retrain on decision-outcome pairs. Overrides that consistently improve outcomes increase training weight.
Override patterns reveal where guardrails are too tight or too loose. Strategic adjusts policy parameters based on aggregated feedback.
As measured decision quality improves, autonomy expands. From copilot (human on the loop) to autonomous (human out of the loop).
faster decision cycles vs. traditional planning
McKinsey
of routine decisions fully autonomous within 12 months
Gartner, 2025
reduction in exception handling time
Deloitte AI Institute
"The key to victory is operating at a faster tempo than the adversary. The OODA loop applied to supply chain means sensing and responding faster than disruption propagates."
"Agentic AI is the next level. Right now, we have recommendations that enhance decision-making. With agentic AI, we could play that back into the systems. Agents that can place an order or transfer stock will further enhance supply chains."
"We're building teams of agents that work with one another to complete repetitive jobs. That will come to supply chain, too."
How a Decision Flows
Every decision follows the same path: deterministic baseline, learned adjustment, likelihood check, and outcome recording. No black boxes.
Low → Inform
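A minimal sketch of that decision path (deterministic baseline, learned adjustment, likelihood check, outcome recording), with hypothetical stand-ins for the learned model, the conformal likelihood scorer, and the decision log:

```python
def decide(net_requirement, adjust, score, record):
    """Deterministic baseline -> learned adjustment -> likelihood check -> record.

    `adjust`, `score`, and `record` are illustrative stand-ins, not the
    production interfaces.
    """
    baseline = max(net_requirement, 0)          # deterministic engine output
    proposed = baseline + adjust(baseline)      # learned agent adjustment
    likelihood = score(proposed)                # calibrated likelihood score
    route = "autonomous" if likelihood >= 0.6 else "inform"
    record(dict(baseline=baseline, proposed=proposed,
                likelihood=likelihood, route=route))
    return proposed, route

log = []
qty, route = decide(120,
                    adjust=lambda b: round(0.1 * b),  # e.g. +10% learned uplift
                    score=lambda p: 0.85,
                    record=log.append)
print(qty, route)  # -> 132 autonomous
```

The recorded dict is the "no black boxes" part: every decision leaves behind its baseline, its adjustment, its likelihood, and its routing outcome.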
Operational, Predictive Cross-Agent Coordination (per site)
Between daily network-level inference and sub-millisecond reactive signals lies a temporal gap. Many cross-agent interactions within a site are causal and predictable on an hourly timescale, but neither the reactive signal system nor the daily batch inference captures them.
Example: a manufacturer's production schedule spikes, with 40% more manufacturing orders released than usual. This will, with high probability, generate increased quality inspection load 2-4 hours later, which may create maintenance pressure if equipment runs harder, which in turn drives purchasing activity if quality rejects increase. The signal system captures each link after it happens. Operational learns the chain as a whole and pre-adjusts urgency before the cascade unfolds.
The coordinator learns the causal relationships between all 11 agents — which agent's actions affect which other agents, and how those relationships shift depending on the current situation. Adjustments are deliberately small and additive: the coordinator shifts emphasis, never overrides decisions.
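A minimal sketch of additive urgency pre-adjustment over a learned causal graph. The edge weights and agent names here are illustrative placeholders, not the learned values:

```python
# Hypothetical learned causal edges: (source agent, target agent) -> effect weight.
# The real system learns these from coordinated simulation traces.
CAUSAL_EDGES = {
    ("mo_execution", "quality_disposition"): 0.30,  # production load -> inspection load
    ("quality_disposition", "maintenance"): 0.15,
    ("quality_disposition", "po_creation"): 0.10,
}

def preadjust_urgency(urgency, activity_spike):
    """Additively nudge downstream urgency before a cascade unfolds.

    urgency: dict agent -> current urgency in [0, 1]
    activity_spike: dict agent -> observed deviation from normal load
    """
    adjusted = dict(urgency)
    for (src, dst), weight in CAUSAL_EDGES.items():
        spike = activity_spike.get(src, 0.0)
        # Small, additive shift only: the coordinator never overrides decisions.
        adjusted[dst] = min(1.0, adjusted[dst] + weight * spike)
    return adjusted

urgency = {"mo_execution": 0.5, "quality_disposition": 0.4,
           "maintenance": 0.3, "po_creation": 0.3}
# A 40% spike in manufacturing orders released:
out = preadjust_urgency(urgency, {"mo_execution": 0.4})
print(round(out["quality_disposition"], 2))  # -> 0.52
```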
11 Execution Agents
Each agent handles one narrow decision type within its authority boundary. Organized into five functional roles following a six-phase decision cycle: SENSE, ASSESS, ACQUIRE, PROTECT, BUILD, REFLECT.
ATP Executor
Scout · Per order, <10ms
Allocated Available-to-Promise with priority consumption sequence.
Order Tracking
Scout · Per order, continuous
Exception detection with recommended actions for at-risk orders.
PO Creation
Forager · Per product-site
PO timing and quantity based on net requirements and supplier lead times.
Inventory Rebalancing
Forager · Cross-site, daily
Transfer recommendations to balance inventory across the network.
Subcontracting
Forager · Per make-vs-buy
Internal vs external manufacturing routing with split options.
Inventory Buffer
Nurse · Per product-site
Buffer parameter adjustment and reoptimization based on demand patterns.
Forecast Adjustment
Nurse · Per signal
Signal-driven forecast adjustments from email, voice, or market intelligence.
Quality Disposition
Guard · Per quality order
Accept, reject, rework, scrap, or use-as-is decisions.
Maintenance Scheduling
Guard · Per asset/work order
Preventive maintenance scheduling, deferral, and outsourcing.
MO Execution
Builder · Per production order
Manufacturing order release, sequencing, split, expedite, or defer.
TO Execution
Builder · Per transfer order
Transfer order release, consolidation, expedite, or defer.
From Zero to Autonomous in 3-5 Weeks
A six-phase digital twin pipeline takes agents from zero experience to production autonomy; no months of "let it learn" in production.
Individual Agent Learning
1-2 days
Each agent trains independently on curriculum-generated data. Supervised from the deterministic engine baseline.
Coordinated Simulation
2-3 days
All 11 agents run simultaneously with the signal system active. They learn coordination: how signals flow between roles.
Cross-Agent Model
~1 day
Operational trains on coordinated traces, learning the causal relationships and predicting cascade effects.
Stress Testing
3-5 days
Adversarial scenarios: demand spikes, supplier failures, capacity shocks. Agents that panic or freeze are retrained.
Copilot Calibration
2-4 weeks
Agents run in copilot mode. Every override is captured and scored. The system absorbs your team's specific judgment patterns.
Autonomous Operation
Continuous
Continuous learning loop: outcome collection, policy improvement, uncertainty calibration, and retraining. Agents never sleep, never take holidays, and don't break for lunch; they learn and improve 24/7.
The digital twin (simulation) feeds phases 1, 2, 4, and 6, enabling counterfactual evaluation at every stage.
Strategic Context via the Context Engine
Agents don't operate on transactional data alone. The Context Engine ingests organizational knowledge (annual reports, strategy documents, market analysis, operating models, executive directives) and delivers relevant context to agents at decision time, and then writes that context back into the shared world model so every other agent can read it.
This means a PO agent knows the board just approved a new product launch, an inventory agent knows the CEO emphasized supply resilience last quarter, and a quality agent knows brand quality was flagged as a strategic differentiator. Every agent decision is grounded in current organizational intent, not just operational state.
Likelihood & Inform
Every agent decision carries a calibrated likelihood score. Likelihood is calibrated continuously from historical decision-outcome pairs, powered by conformal prediction, a distribution-free framework that provides mathematically guaranteed coverage.
Three checks govern routing: agent likelihood below threshold (default 0.6), risk bound above threshold, or prediction interval width exceeding 50% of the value range. When any check triggers, the decision is surfaced to the Decision Stream, first to an exception handler with decision memory context, then to human inspection with ranked options and trade-off analysis. Under AIIO, the action has already been taken; Inform is there to let you intervene if you know more.
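The three checks can be sketched as a routing function. The 0.6 likelihood floor and the 50% width fraction come from the text; the risk ceiling default is an assumption for illustration:

```python
def route(likelihood, risk_bound, interval_width, value_range,
          likelihood_floor=0.6, risk_ceiling=0.8, width_fraction=0.5):
    """Route a decision: autonomous unless any of three checks trips.

    risk_ceiling is an illustrative default, not a documented threshold.
    """
    if likelihood < likelihood_floor:
        return "inform"                    # agent is unsure of its own call
    if risk_bound > risk_ceiling:
        return "inform"                    # downside risk bound too high
    if interval_width > width_fraction * value_range:
        return "inform"                    # prediction interval too wide
    return "autonomous"

print(route(0.92, 0.3, 10, 100))  # -> autonomous
print(route(0.55, 0.3, 10, 100))  # -> inform (likelihood below 0.6)
print(route(0.92, 0.3, 60, 100))  # -> inform (interval wider than 50% of range)
```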
Override Effectiveness Tracking
When a human overrides an agent decision, causal AI tracks what would have happened (the counterfactual) and what actually happened (the outcome). The system determines which overrides genuinely caused better outcomes, and adjusts agent training weights accordingly.
Override effectiveness is measured at two scopes: decision-local (did the override improve this specific decision?) and site-wide (did the override improve the site's aggregate balanced scorecard?). The composite score weights site-wide impact more heavily (60/40) to prevent locally optimal but systemically harmful overrides from inflating training weight.
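A sketch of the composite score under the 60/40 weighting described above. The input scale and function shape are assumptions:

```python
def override_score(decision_local, site_wide, w_site=0.6):
    """Composite override effectiveness with site-wide impact weighted 60/40.

    Both inputs are assumed to be improvement scores in [-1, 1] relative
    to the agent's counterfactual. Weighting the site-wide scope more
    heavily keeps a locally good but systemically harmful override from
    inflating its training weight.
    """
    return w_site * site_wide + (1 - w_site) * decision_local

# Override helped this one decision but hurt the site scorecard overall:
print(round(override_score(decision_local=0.8, site_wide=-0.5), 2))  # -> 0.02
```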
Performance
| Metric | Value |
| --- | --- |
| Execution agent inference | <10ms per decision |
| Full cluster decision cycle | ~20ms (11 agents) |
| Network coordination daily inference | ~15 seconds |
| Exception escalation | ~200ms |
| Retraining (when triggered) | ~5 min per agent |
| Digital twin simulation | Days in minutes |
See the agents in action
Watch agents handle exceptions in real time with full explainability.