Azirella

Execution layer

AI Agents

Eleven specialized agents operate as a coordinated hive of biologically-inspired roles that communicate through Google's open Agent2Agent (A2A) protocol. Each handles one narrow decision type at machine speed, running under AIIO and surfacing to the Decision Stream only when human judgment is likely to help.

"The future of AI in the enterprise isn't a single model, it's an orchestrated system of specialized agents, each handling one task exceptionally well."

Paul Daugherty, Chief Technology & Innovation Officer, Accenture

"Multi-agent systems are the next frontier. A single AI can optimize one function; a coordinated swarm of agents can optimize an entire supply chain simultaneously."

Shervin Khodabandeh, Managing Director, BCG (BCG Henderson Institute, 2024)

The Agent Hive

11 specialists communicating via the A2A protocol, Google's open standard for agent interoperability. Each agent handles one decision type; the hive coordinates emergent intelligence over the shared world model.
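To make the hand-offs concrete, here is a minimal sketch of one agent passing a task to another. It is loosely modeled on A2A's task/message concepts; the field names, agent names, and payload are simplified assumptions, not the actual A2A schema.

```python
import json
import uuid

def make_task_message(sender: str, receiver: str,
                      decision_type: str, payload: dict) -> str:
    """Build a minimal inter-agent task message.

    Illustrative only: field names are simplified assumptions,
    not the real A2A schema.
    """
    return json.dumps({
        "task_id": str(uuid.uuid4()),
        "sender": sender,          # e.g. a Forager agent
        "receiver": receiver,      # e.g. a Nurse agent
        "decision_type": decision_type,
        "payload": payload,
    })

# A PO-creation Forager asks an inventory-buffer Nurse to check a buffer:
msg = make_task_message(
    "po-creation-forager", "inventory-buffer-nurse",
    "buffer-check", {"sku": "A-100", "site": "plant-1", "qty": 250},
)
print(json.loads(msg)["decision_type"])  # → buffer-check
```

Because every message carries a decision type, each receiving agent can stay narrow: it only ever handles the one decision type it was trained for.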

[Diagram: the agent hive. Eleven agents (ATP Executor and Order Tracking Scouts; PO Creation, Inventory Rebalancing, and Subcontracting Foragers; Inventory Buffer and Forecast Adjustment Nurses; Quality Disposition and Maintenance Scheduling Guards; MO Execution and TO Execution Builders) connected by signal paths under the Operational Coordinator.]
11

specialized execution agents

Autonomy

<10ms

decision latency per agent

Real-time execution

25+

cross-authority negotiation scenarios

Cross-Authority AAP

22

directed causal edges between agents

Operational Coordinator

Four-Tier Agent Hierarchy

Each tier operates at its natural time horizon and produces outputs that constrain the tier below. Information flows down as policy and directives, and back up as signals and outcomes.

Strategic

Network analysis examines supply chain topology (bottlenecks, concentration risk, fragility) and produces policy parameters that shape all downstream behavior. For example, a site identified as a critical chokepoint gets a 1.4x safety stock multiplier.

Weekly
policy parameters

Tactical, Network Coordination

Combines network-structure signals with real-time transactional data to produce daily site directives: demand forecasts, exception probabilities, and priority allocations.

Daily
site directives

Cross-Authority Arbitration

Handles trade-offs across functional boundaries at machine speed. 25+ negotiation scenarios.

Ad Hoc

Operational, Site Coordination (per site)

Learns causal relationships between the 11 agents (22 directed edges) and predicts cascade effects before they happen. A production spike will generate quality load 2-4 hours later. Operational pre-adjusts urgency so downstream agents are ready.

Hourly
↓ urgency adjustments · ↑ signals & outcomes

Execution (Agent Hive, per site)

Specialized AI models handle one decision type each within defined authority boundaries. Coordinated via a biologically-inspired signal system.

<10ms
Scout
ATP · Order Tracking
Forager
PO · Rebalancing · Subcontracting
Nurse
Buffer · Forecast
Guard
Quality · Maintenance
Builder
MO · TO Execution

A note on the deterministic planning engine. MPS/MRP, BOM explosion, safety stock calculation, and net requirements are not a tier of the decision hierarchy. They are used to generate the training data the four tiers learn from. At runtime, decisions are made by the learned agents, not by the deterministic engine.
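The downward flow of policy can be sketched in a few lines: a weekly Strategic parameter (such as the 1.4x chokepoint multiplier mentioned above) applied by execution-level buffer logic. The names and the dataclass shape are illustrative assumptions, not the product API.

```python
from dataclasses import dataclass

@dataclass
class SitePolicy:
    # Set weekly by the Strategic tier; read-only for lower tiers.
    safety_stock_multiplier: float

def buffer_target(base_safety_stock: float, policy: SitePolicy) -> float:
    # Execution agents apply, but never set, the strategic multiplier.
    return base_safety_stock * policy.safety_stock_multiplier

chokepoint = SitePolicy(safety_stock_multiplier=1.4)  # critical site
normal = SitePolicy(safety_stock_multiplier=1.0)

print(buffer_target(100.0, chokepoint))  # → 140.0
print(buffer_target(100.0, normal))      # → 100.0
```

The design point is the asymmetry: the parameter is written at one tier and only read below it, which is what makes the cascade auditable.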

"The key challenge isn't building individual AI agents, it's orchestrating them. The companies that win will be the ones that master multi-agent coordination at scale."

Nitin Mittal, US AI Leader, Deloitte (Deloitte AI Institute, "The Multi-Agent Enterprise," 2024)

Decision-First: The OODA Operating Rhythm

Autonomy is not a planning tool that generates static plans. It is a decision engine that continuously observes, orients, decides, and acts across every level of the hierarchy. Each tier runs its own OODA loop at its natural cadence, and each loop's output becomes the context and guardrails for the tier below.

The OODA Loop

John Boyd's Observe-Orient-Decide-Act framework, originally developed for air combat, maps precisely to how autonomous supply chain agents operate. The side that cycles through OODA faster wins, and agents cycle in milliseconds.

OBSERVE

Ingest real-time state: inventory levels, incoming orders, supplier signals, demand patterns, quality results, capacity utilization.

ORIENT

Apply context from higher tiers: policy parameters, priority allocations, authority boundaries, likelihood thresholds. This is where guardrails shape judgment.

DECIDE

Select action within authority boundaries. The deterministic engine provides a baseline; the learned agent adjusts. Likelihood checks gate autonomy.

ACT

Execute the decision. Record the action, the reasoning, the likelihood score, and the counterfactual. Feed outcomes back up the hierarchy.

Continuous loop: every decision triggers the next observation
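The four phases above can be sketched as a single cycle. The state and context fields, the logging shape, and the toy decision rule are illustrative assumptions, not the product API.

```python
# One execution-agent OODA cycle, in miniature.
def ooda_cycle(state: dict, context: dict, decide, log: list) -> dict:
    # Observe: ingest real-time state.
    observed = {"inventory": state["inventory"],
                "open_orders": state["open_orders"]}
    # Orient: apply context (guardrails) from higher tiers.
    oriented = {**observed, "threshold": context["likelihood_threshold"]}
    # Decide: select an action within authority boundaries.
    action, likelihood = decide(oriented)
    # Act: execute and record action + likelihood for learning.
    log.append({"action": action, "likelihood": likelihood})
    return {"action": action, "likelihood": likelihood}

def toy_decide(view):
    # Release when stock covers open orders; fixed likelihood for the sketch.
    return ("release" if view["inventory"] >= view["open_orders"]
            else "defer"), 0.9

log = []
out = ooda_cycle({"inventory": 120, "open_orders": 80},
                 {"likelihood_threshold": 0.6}, toy_decide, log)
print(out["action"])  # → release
```

Note that the log entry is written on every cycle, not just on exceptions; that is what feeds the outcome signals back up the hierarchy.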

Nested OODA Across the Hierarchy

Each tier runs its own OODA loop. Higher tiers cycle slower but produce context that constrains the faster loops below. Lower tiers produce feedback that informs the slower loops above.

Strategic OODA Weekly

Observe: Network performance metrics, market shifts. Orient: Bottleneck analysis, risk scoring. Decide: Policy parameters (safety stock multipliers, priority weights). Act: Push parameters to all downstream tiers.

↓ Outputs: guardrails, thresholds, risk tolerances

Tactical, Network OODA Daily

Observe: Yesterday's demand, supplier status, inventory positions. Orient: Within strategic policy envelope. Decide: Priority allocations per product per site. Act: Push directives to execution agents.

↓ Outputs: priority allocations, demand forecasts, exception probabilities

Operational, Site Coordination OODA Hourly

Observe: Cross-agent signal patterns, cascade indicators. Orient: Learned causal graph between 11 agents. Decide: Urgency adjustments to pre-empt cascades. Act: Modulate agent urgency vectors.

↓ Outputs: urgency adjustments, pre-emptive signals

Execution OODA <10ms

Observe: Event trigger (order, shipment, quality hold). Orient: Within allocations, urgency, and authority from above. Decide: Narrow execution action (release, defer, rebalance). Act: Execute and record decision + outcome for learning.

↑ Outputs: decisions, outcomes, override signals, fed back up to all tiers

The Continuous Learning Loop: Every Intervention Teaches

Both human overrides and agent decisions flow back up the hierarchy as learning signals. When a planner overrides a PO quantity, causal AI records the counterfactual (what the agent would have done), tracks the actual outcome, and determines whether the override genuinely caused a better result. This is the AIIO feedback engine in motion.

Agent Learning

Execution agents retrain on decision-outcome pairs. Overrides that consistently improve outcomes increase training weight.

Policy Calibration

Override patterns reveal where guardrails are too tight or too loose. Strategic adjusts policy parameters based on aggregated feedback.

Trust Progression

As measured decision quality improves, autonomy expands. From copilot (human on the loop) to autonomous (human out of the loop).

3-5x

faster decision cycles vs. traditional planning

McKinsey

80%

of routine decisions fully autonomous within 12 months

Gartner, 2025

40%

reduction in exception handling time

Deloitte AI Institute

"The key to victory is operating at a faster tempo than the adversary. The OODA loop applied to supply chain means sensing and responding faster than disruption propagates."

Colonel John Boyd, USAF (adapted for supply chain context)

"Agentic AI is the next level. Right now, we have recommendations that enhance decision-making. With agentic AI, we could play that back into the systems. Agents that can place an order or transfer stock will further enhance supply chains."

Knut Alicke, Partner, McKinsey & Company (McKinsey, "Beyond automation: How gen AI is reshaping supply chains," April 2025)

"We're building teams of agents that work with one another to complete repetitive jobs. That will come to supply chain, too."

Asaf Somekh, Cofounder, Iguazio (McKinsey, "Beyond automation: How gen AI is reshaping supply chains," April 2025)

How a Decision Flows

Every decision follows the same path: deterministic baseline, learned adjustment, likelihood check, and outcome recording. No black boxes.

EVENT: Supplier delay, demand spike, quality hold
ENGINE: Deterministic baseline, 100% auditable
AGENT: Learned adjustment, <10ms, bounded ±20%
LIKELIHOOD: High → Auto-execute; Low → Inform
LEARN: Record & improve, continuous learning
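The path above can be sketched end to end. The ±20% bound and 0.6 likelihood threshold appear on this page; the function shape and names are illustrative assumptions.

```python
def decide(baseline_qty: float, agent_adjustment: float, likelihood: float,
           threshold: float = 0.6, bound: float = 0.20):
    # AGENT: clamp the learned adjustment to ±20% of the engine baseline.
    adj = max(-bound, min(bound, agent_adjustment))
    qty = baseline_qty * (1 + adj)
    # LIKELIHOOD: high → auto-execute; low → act but Inform (AIIO).
    route = "auto-execute" if likelihood >= threshold else "inform"
    # LEARN: record baseline, final value, likelihood, and routing.
    record = {"baseline": baseline_qty, "final": qty,
              "likelihood": likelihood, "route": route}
    return qty, route, record

# A learned +35% adjustment is clamped to the +20% bound:
qty, route, rec = decide(baseline_qty=500, agent_adjustment=0.35,
                         likelihood=0.82)
print(qty, route)  # → 600.0 auto-execute
```

The record is the "no black boxes" part: every decision stores the deterministic baseline alongside the learned final value, so the adjustment is always inspectable.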

Operational, Predictive Cross-Agent Coordination (per site)

Between daily network-level inference and sub-millisecond reactive signals lies a temporal gap. Many cross-agent interactions within a site are causal and predictable on an hourly timescale, but neither the reactive signal system nor the daily batch inference captures them.

Example: A manufacturer's production schedule spikes: 40% more manufacturing orders released than usual. This will, with high probability, generate increased quality inspection load 2-4 hours later, which may create maintenance pressure if equipment runs harder, which in turn drives purchasing activity if quality rejects increase. The signal system captures each link after it happens. Operational learns the chain as a whole and pre-adjusts urgency before the cascade unfolds.

The coordinator learns the causal relationships between all 11 agents — which agent's actions affect which other agents, and how those relationships shift depending on the current situation. Adjustments are deliberately small and additive: the coordinator shifts emphasis, never overrides decisions.
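The pre-adjustment can be sketched as a walk over a small learned edge set. The edge weights, the cap, and the agent names are illustrative assumptions; only the "small, additive, never overriding" behavior comes from this page.

```python
# (upstream agent, downstream agent): effect per unit of upstream surge.
CAUSAL_EDGES = {
    ("mo_execution", "quality_disposition"): 0.5,
    ("quality_disposition", "po_creation"): 0.2,
}

def pre_adjust(urgency: dict, surges: dict, cap: float = 0.3) -> dict:
    """Shift downstream urgency ahead of a predicted cascade."""
    adjusted = dict(urgency)
    for (src, dst), weight in CAUSAL_EDGES.items():
        if src in surges:
            # Small, additive, capped: shift emphasis, never override.
            adjusted[dst] = adjusted.get(dst, 0.0) + min(
                cap, weight * surges[src])
    return adjusted

urgency = {"quality_disposition": 0.4, "po_creation": 0.3}
out = pre_adjust(urgency, {"mo_execution": 0.4})  # 40% MO spike
print(round(out["quality_disposition"], 2))  # → 0.6
```

Capping the additive shift is what keeps the coordinator advisory: no single edge can swing a downstream agent's urgency past a bounded nudge.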

11 Execution Agents

Each agent handles one narrow decision type within its authority boundary. Organized into five functional roles following a six-phase decision cycle: SENSE, ASSESS, ACQUIRE, PROTECT, BUILD, REFLECT.

ATP Executor

Scout

Per order, <10ms

Allocated Available-to-Promise with priority consumption sequence.

Order Tracking

Scout

Per order, continuous

Exception detection with recommended actions for at-risk orders.

PO Creation

Forager

Per product-site

PO timing and quantity based on net requirements and supplier lead times.

Inventory Rebalancing

Forager

Cross-site, daily

Transfer recommendations to balance inventory across the network.

Subcontracting

Forager

Per make-vs-buy

Internal vs external manufacturing routing with split options.

Inventory Buffer

Nurse

Per product-site

Buffer parameter adjustment and reoptimization based on demand patterns.

Forecast Adjustment

Nurse

Per signal

Signal-driven forecast adjustments from email, voice, or market intelligence.

Quality Disposition

Guard

Per quality order

Accept, reject, rework, scrap, or use-as-is decisions.

Maintenance Scheduling

Guard

Per asset/work order

Preventive maintenance scheduling, deferral, and outsourcing.

MO Execution

Builder

Per production order

Manufacturing order release, sequencing, split, expedite, or defer.

TO Execution

Builder

Per transfer order

Transfer order release, consolidation, expedite, or defer.

From Zero to Autonomous in 3-5 Weeks

A six-phase digital twin pipeline takes agents from zero experience to production autonomy; no months of "let it learn" in production.

1

Individual Agent Learning

1-2 days

Each agent trains independently on curriculum-generated data. Supervised from the deterministic engine baseline.

2

Coordinated Simulation

2-3 days

All 11 agents run simultaneously with the signal system active. They learn coordination: how signals flow between roles.

3

Cross-Agent Model

~1 day

Operational trains on coordinated traces, learning the causal relationships and predicting cascade effects.

4

Stress Testing

3-5 days

Adversarial scenarios: demand spikes, supplier failures, capacity shocks. Agents that panic or freeze are retrained.

5

Copilot Calibration

2-4 weeks

Agents run in copilot mode. Every override is captured and scored. The system absorbs your team's specific judgment patterns.

6

Autonomous Operation

Continuous

Continuous learning loop: outcome collection, policy improvement, uncertainty calibration, and retraining. Agents never sleep, never take holidays, and don't break for lunch; they learn and improve 24/7.

The digital twin (simulation) feeds phases 1, 2, 4, and 6, enabling counterfactual evaluation at every stage.

Strategic Context via the Context Engine

Agents don't operate on transactional data alone. The Context Engine ingests organizational knowledge (annual reports, strategy documents, market analysis, operating models, executive directives) and delivers relevant context to agents at decision time, and then writes that context back into the shared world model so every other agent can read it.

This means a PO agent knows the board just approved a new product launch, an inventory agent knows the CEO emphasized supply resilience last quarter, and a quality agent knows brand quality was flagged as a strategic differentiator. Every agent decision is grounded in current organizational intent, not just operational state.

Likelihood & Inform

Every agent decision carries a calibrated likelihood score. Likelihood is calibrated continuously from historical decision-outcome pairs, powered by conformal prediction, a distribution-free framework that provides mathematically guaranteed coverage.

Three checks govern routing: agent likelihood below threshold (default 0.6), risk bound above threshold, or prediction interval width exceeding 50% of the value range. When any check triggers, the decision is surfaced to the Decision Stream: first to an exception handler with decision memory context, then to human inspection with ranked options and trade-off analysis. Under AIIO, the action has already been taken; Inform is there to let you intervene if you know more.
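A minimal sketch of the three routing checks, assuming a generic risk limit. The 0.6 likelihood default and the 50% interval-width rule come from this page; the `risk_limit` value and function shape are hypothetical.

```python
def route(likelihood: float, risk_bound: float, interval: tuple,
          value_range: tuple, likelihood_min: float = 0.6,
          risk_limit: float = 0.1, width_frac: float = 0.5) -> str:
    """Route a decision: auto-execute silently, or surface to Inform."""
    lo, hi = interval
    rng_lo, rng_hi = value_range
    # Check 3: conformal prediction interval too wide relative to range.
    too_wide = (hi - lo) > width_frac * (rng_hi - rng_lo)
    if likelihood < likelihood_min or risk_bound > risk_limit or too_wide:
        return "inform"      # surface to the Decision Stream
    return "auto-execute"    # proceeds silently under AIIO

print(route(0.85, 0.05, (95, 110), (0, 200)))  # → auto-execute
print(route(0.55, 0.05, (95, 110), (0, 200)))  # → inform (low likelihood)
```

Any single failing check is enough to surface the decision; the checks are a disjunction, not a weighted score.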

Override Effectiveness Tracking

When a human overrides an agent decision, causal AI tracks what would have happened (the counterfactual) and what actually happened (the outcome). The system determines which overrides genuinely caused better outcomes, and adjusts agent training weights accordingly.

Override effectiveness is measured at two scopes: decision-local (did the override improve this specific decision?) and site-wide (did the override improve the site's aggregate balanced scorecard?). The composite score weights site-wide impact more heavily (60/40) to prevent locally-optimal but systemically-harmful overrides from inflating training weight.
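The 60/40 composite can be written directly; the gain scale (normalized outcome deltas) is an assumption, while the weighting comes from this page.

```python
def override_effectiveness(local_gain: float, site_gain: float) -> float:
    """Composite override score.

    Both gains are assumed to be on the same normalized scale.
    Site-wide impact is weighted 60/40 over the local decision.
    """
    return 0.6 * site_gain + 0.4 * local_gain

# A locally good override that slightly hurt the site scores near zero,
# so it earns little extra training weight:
print(round(override_effectiveness(local_gain=0.5, site_gain=-0.3), 2))
```

This is the mechanism that stops a planner's locally-optimal habit, one that quietly taxes the rest of the site, from being learned as a best practice.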

Performance

Execution agent inference: <10ms per decision
Full cluster decision cycle: ~20ms (11 agents)
Network coordination daily inference: ~15 seconds
Exception escalation: ~200ms
Retraining (when triggered): ~5 min per agent
Digital twin simulation: days in minutes

See the agents in action

Watch agents handle exceptions in real-time with full explainability.