
AI Agents

A four-tier decision hierarchy coordinates strategic policy, network-wide allocation, cross-agent learning, and millisecond execution — with 11 specialized agents at every site, each explainable, overrideable, and continuously improving from outcomes. Agents never sleep, never go on holiday, and don't need lunch breaks — they handle the repetitive and mundane 24/7 so your team focuses on the decisions that truly matter.

Four-Tier Agent Hierarchy

Each layer operates at its natural time horizon and produces outputs that constrain the layer below. Information flows down as policy and directives, and back up as signals and outcomes.

Layer 4 — Strategic Consensus
Weekly · policy parameters

Network analysis examines supply chain topology — bottlenecks, concentration risk, fragility — and produces policy parameters that shape all downstream behavior. A site identified as a critical chokepoint gets, for example, a 1.4x safety stock multiplier.

Layer 3 — Cross-Authority
Ad hoc

Handles trade-offs across functional boundaries at machine speed, covering 25+ negotiation scenarios.

Layer 2 — Network Coordination
Daily · site directives

Combines structural embeddings with real-time transactional data to produce daily site directives: demand forecasts, exception probabilities, and priority allocations.

Layer 1.5 — Site Cross-Agent Coordination
Hourly · urgency adjustments

Learns causal relationships between the 11 agents (22 directed edges) and predicts cascade effects before they happen. A production spike will generate quality load 2-4 hours later — Layer 1.5 pre-adjusts urgency so downstream agents are ready.

Layer 1 — 11 Execution Agents

Specialized AI models handle one decision type each within defined authority boundaries. Coordinated via a biologically-inspired signal system.

<10ms

Scout: ATP · Order Tracking
Forager: PO · Rebalancing · Subcontracting
Nurse: Buffer · Forecast
Guard: Quality · Maintenance
Builder: MO · TO Execution

Decision-First: The OODA Operating Rhythm

Autonomy is not a planning tool that generates plans. It is a decision engine that continuously observes, orients, decides, and acts across every level of the hierarchy. Each layer runs its own OODA loop at its natural cadence — and each loop's output becomes the context and guardrails for the layer below.

The OODA Loop

John Boyd's Observe-Orient-Decide-Act framework, originally developed for air combat, maps precisely to how autonomous supply chain agents operate. The side that cycles through OODA faster wins — and agents cycle in milliseconds.

OBSERVE

Ingest real-time state: inventory levels, incoming orders, supplier signals, demand patterns, quality results, capacity utilization.

ORIENT

Apply context from higher layers: policy parameters, priority allocations, authority boundaries, confidence thresholds. This is where guardrails shape judgment.

DECIDE

Select action within authority boundaries. The deterministic engine provides a baseline; the learned agent adjusts. Confidence checks gate autonomy.

ACT

Execute the decision. Record the action, the reasoning, the confidence level, and the counterfactual. Feed outcomes back up the hierarchy.

Continuous loop — every decision triggers the next observation
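
Read as code, one cycle of the loop looks roughly like the sketch below. Everything here is illustrative: the Guardrails fields and the agent methods are assumed names for this sketch, not Azirella's actual API.

```python
from dataclasses import dataclass

@dataclass
class Guardrails:
    """Context pushed down from higher layers (names are illustrative)."""
    policy_params: dict       # Layer 4: e.g. safety stock multipliers
    allocations: dict         # Layer 2: priority allocations
    urgency: float            # Layer 1.5: urgency adjustment
    confidence_floor: float   # below this, escalate instead of acting

def ooda_step(agent, event, guardrails: Guardrails):
    observation = agent.observe(event)              # OBSERVE: real-time state
    framed = agent.orient(observation, guardrails)  # ORIENT: context from above
    action, confidence = agent.decide(framed)       # DECIDE: within authority
    if confidence < guardrails.confidence_floor:
        return agent.escalate(framed)               # low confidence: hand off
    outcome = agent.act(action)                     # ACT: execute
    # Record action, reasoning, confidence, and outcome; feed back up.
    agent.record(action, confidence, outcome)
    return outcome
```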

Nested OODA Across the Hierarchy

Each layer runs its own OODA loop. Higher layers cycle slower but produce context that constrains the faster loops below. Lower layers produce feedback that informs the slower loops above.

Layer 4 — Strategic OODA · Weekly

Observe: Network performance metrics, market shifts. Orient: Bottleneck analysis, risk scoring. Decide: Policy parameters (safety stock multipliers, priority weights). Act: Push parameters to all downstream layers.

↓ Outputs: guardrails, thresholds, risk tolerances

Layer 2 — Operational OODA · Daily

Observe: Yesterday's demand, supplier status, inventory positions. Orient: Within strategic policy envelope. Decide: Priority allocations per product per site. Act: Push directives to execution agents.

↓ Outputs: priority allocations, demand forecasts, exception probabilities

Layer 1.5 — Coordination OODA · Hourly

Observe: Cross-agent signal patterns, cascade indicators. Orient: Learned causal graph between 11 agents. Decide: Urgency adjustments to pre-empt cascades. Act: Modulate agent urgency vectors.

↓ Outputs: urgency adjustments, pre-emptive signals

Layer 1 — Execution OODA · <10ms

Observe: Event trigger (order, shipment, quality hold). Orient: Within allocations, urgency, and authority from above. Decide: Narrow execution action (release, defer, rebalance). Act: Execute and record decision + outcome for learning.

↑ Outputs: decisions, outcomes, override signals — fed back up to all layers

The Reinforcement Learning Loop: Every Intervention Teaches

Both human overrides and agent decisions flow back up the hierarchy as learning signals. When a planner overrides a PO quantity, the system records the counterfactual (what the agent would have done), tracks the actual outcome, and statistically measures whether the override improved results. This is reinforcement learning in practice — agents continuously improve their policies based on real-world outcomes:

Agent Learning

Execution agents retrain on decision-outcome pairs. Overrides that consistently improve outcomes increase training weight.

Policy Calibration

Override patterns reveal where guardrails are too tight or too loose. Strategic layer adjusts policy parameters based on aggregated feedback.

Trust Progression

As measured decision quality improves, autonomy expands. From copilot (human on the loop) to autonomous (human out of the loop).
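
As a rough sketch of how an override becomes a learning signal (the field names and the weighting rule are assumptions for illustration, not the product's schema):

```python
from dataclasses import dataclass

@dataclass
class OverrideRecord:
    agent_action: dict     # counterfactual: what the agent would have done
    human_action: dict     # what the planner actually did
    outcome_delta: float   # measured outcome vs. counterfactual (>0 = override helped)

def training_weight(records: list[OverrideRecord], base: float = 1.0) -> float:
    """Overrides that consistently improve outcomes get more training weight."""
    if not records:
        return base
    win_rate = sum(r.outcome_delta > 0 for r in records) / len(records)
    # Illustrative scaling: overrides that usually beat the counterfactual
    # pull the training weight up; overrides that rarely do pull it down.
    return base * (0.5 + win_rate)
```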

How a Decision Flows

Every decision follows the same path: deterministic baseline, learned adjustment, confidence check, and outcome recording. No black boxes.

EVENT: supplier delay, demand spike, or quality hold
→ ENGINE: deterministic baseline · 100% auditable
→ AGENT: learned adjustment · <10ms · bounded ±20%
→ CONFIDENCE: high → auto-execute, low → escalate
→ LEARN: record & improve · continuous learning
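
A minimal sketch of that path in code. The ±20% bound comes from this page, and the 0.6 default threshold is stated under Confidence & Escalation below; all function names are hypothetical.

```python
def decide(event, engine, agent, threshold: float = 0.6):
    """Deterministic baseline, bounded learned adjustment, confidence gate."""
    baseline = engine.baseline(event)                # 100% auditable starting point
    adjustment, confidence = agent.adjust(event, baseline)
    # The learned adjustment is bounded to +/-20% of the baseline.
    limit = 0.2 * abs(baseline)
    decision = baseline + max(-limit, min(limit, adjustment))
    route = "auto-execute" if confidence >= threshold else "escalate"
    # Record the decision, confidence, and counterfactual (the raw baseline)
    # so the learning loop can score the adjustment later.
    log = {"baseline": baseline, "decision": decision,
           "confidence": confidence, "route": route}
    return decision, route, log
```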

Layer 1.5: Predictive Cross-Agent Coordination

Between daily network-level inference and sub-millisecond reactive signals lies a temporal gap. Many cross-agent interactions within a site are causal and predictable on an hourly timescale, but neither the reactive signal system nor the daily batch inference captures them.

Example: A manufacturer's production schedule spikes — 40% more manufacturing orders released than usual. This will, with high probability, generate increased quality inspection load 2-4 hours later, which may create maintenance pressure if equipment runs harder, which in turn drives purchasing activity if quality rejects increase. The signal system captures each link after it happens. Layer 1.5 learns the chain as a whole and pre-adjusts urgency before the cascade unfolds.

The model treats the 11 agents as nodes in a directed graph connected by 22 causal edges. Dynamic graph attention makes the relevance of each neighbor depend on the current situation — the importance of the ATP-to-MO relationship depends on whether ATP is currently in shortage or surplus mode. Adjustments are deliberately small and additive: the coordinator shifts emphasis, never overrides decisions.
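
A minimal sketch of the idea, assuming per-agent feature vectors and a situation-dependent attention function (the edge indices, attention signature, and clip bound are all illustrative):

```python
import numpy as np

N_AGENTS = 11
# Two of the 22 directed cause->effect edges; indices are illustrative,
# e.g. MO release -> quality load -> maintenance pressure.
EDGES = [(0, 1), (1, 2)]

def urgency_adjustments(state: np.ndarray, attention) -> np.ndarray:
    """state: (11, d) per-agent features; attention(src, dst, state) -> weight
    that depends on the current situation (e.g. ATP in shortage vs. surplus)."""
    adjust = np.zeros(N_AGENTS)
    for src, dst in EDGES:
        # Propagate a small urgency nudge along each causal edge.
        adjust[dst] += attention(src, dst, state) * state[src].mean()
    # Deliberately small and additive: shift emphasis, never override.
    return np.clip(adjust, -0.1, 0.1)
```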

11 Execution Agents

Each agent handles one narrow decision type within its authority boundary. The agents are organized into five functional roles and follow a six-phase decision cycle: SENSE, ASSESS, ACQUIRE, PROTECT, BUILD, REFLECT.

ATP Executor (Scout) · per order, <10ms
Allocated Available-to-Promise with priority consumption sequence.

Order Tracking (Scout) · per order, continuous
Exception detection with recommended actions for at-risk orders.

PO Creation (Forager) · per product-location
PO timing and quantity based on net requirements and supplier lead times.

Inventory Rebalancing (Forager) · cross-location, daily
Transfer recommendations to balance inventory across the network.

Subcontracting (Forager) · per make-vs-buy decision
Internal vs. external manufacturing routing with split options.

Inventory Buffer (Nurse) · per product-location
Buffer parameter adjustment and reoptimization based on demand patterns.

Forecast Adjustment (Nurse) · per signal
Signal-driven forecast adjustments from email, voice, or market intelligence.

Quality Disposition (Guard) · per quality order
Accept, reject, rework, scrap, or use-as-is decisions.

Maintenance Scheduling (Guard) · per asset/work order
Preventive maintenance scheduling, deferral, and outsourcing.

MO Execution (Builder) · per production order
Manufacturing order release, sequencing, split, expedite, or defer.

TO Execution (Builder) · per transfer order
Transfer order release, consolidation, expedite, or defer.

From Zero to Autonomous in 3-5 Weeks

A six-phase digital twin pipeline takes agents from zero experience to production autonomy — no months of "let it learn" in production.

1. Individual Agent Learning · 1-2 days
Each agent trains independently on curriculum-generated data. Supervised from the deterministic engine baseline.

2. Coordinated Simulation · 2-3 days
All 11 agents run simultaneously with the signal system active. They learn coordination — how signals flow between roles.

3. Cross-Agent Model · ~1 day
Layer 1.5 trains on coordinated traces, learning the causal relationships and predicting cascade effects.

4. Stress Testing · 3-5 days
Adversarial scenarios: demand spikes, supplier failures, capacity shocks. Agents that panic or freeze are retrained.

5. Copilot Calibration · 2-4 weeks
Agents run in copilot mode. Every override is captured and scored. The system absorbs your team's specific judgment patterns.

6. Autonomous Operation · Continuous
The reinforcement learning loop runs 24/7: outcome collection, policy improvement, uncertainty calibration, and retraining.

The digital twin (simulation) feeds phases 1, 2, 4, and 6 — enabling counterfactual evaluation at every stage.
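
For reference, the pipeline condensed into data (the durations and twin involvement are from this page; the structure itself is just a sketch):

```python
# (phase, duration, fed by the digital twin?)
PIPELINE = [
    ("individual_agent_learning", "1-2 days",   True),
    ("coordinated_simulation",    "2-3 days",   True),
    ("cross_agent_model",         "~1 day",     False),
    ("stress_testing",            "3-5 days",   True),
    ("copilot_calibration",       "2-4 weeks",  False),
    ("autonomous_operation",      "continuous", True),
]
```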

Confidence & Escalation

Every agent decision carries a calibrated confidence score. Confidence is calibrated continuously from historical decision-outcome pairs, so a score of 0.8 means that roughly 80% of past decisions made at that level turned out well: a measured property of the system, not a heuristic.

Three checks govern routing: agent confidence below threshold (default 0.6), risk bound above threshold, or prediction interval width exceeding 50% of the value range. When any check triggers, the decision escalates to higher reasoning — first to an exception handler with decision memory context, then to human review with ranked options and trade-off analysis.
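
As stated, the three checks reduce to a small predicate. The 0.6 floor and the 50% width ratio come from this page; the risk threshold is shown as a parameter since its default isn't given here.

```python
def should_escalate(confidence: float, risk_bound: float,
                    interval_width: float, value_range: float,
                    risk_threshold: float, confidence_floor: float = 0.6) -> bool:
    """True if any of the three routing checks fires."""
    return (confidence < confidence_floor            # check 1: low confidence
            or risk_bound > risk_threshold           # check 2: risk bound too high
            or interval_width > 0.5 * value_range)   # check 3: interval too wide
```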

Override Effectiveness Tracking

When a human overrides an agent decision, the system tracks what would have happened (the counterfactual) and what actually happened (the outcome). The system statistically learns which overrides actually improve outcomes — and adjusts agent training weights accordingly.

Override effectiveness is measured at two scopes: decision-local (did the override improve this specific decision?) and site-wide (did the override improve the site's aggregate balanced scorecard?). The composite score weights site-wide impact more heavily (60/40) to prevent locally-optimal but systemically-harmful overrides from inflating training weight.
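
In code, the composite is a one-liner with the stated 60/40 weighting (the function and argument names are illustrative):

```python
def override_effectiveness(local_improvement: float,
                           site_improvement: float) -> float:
    """Both inputs measure improvement vs. the recorded counterfactual,
    where 0 means 'no better than the agent would have done'."""
    # Site-wide scorecard impact is weighted more heavily (60/40) so a
    # locally good but systemically harmful override scores low.
    return 0.6 * site_improvement + 0.4 * local_improvement
```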

Performance

Execution agent inference: <10ms per decision
Full cluster decision cycle (11 agents): ~20ms
Network coordination daily inference: ~15 seconds
Exception escalation: ~200ms
Retraining (when triggered): ~5 min per agent
Digital twin simulation: days simulated in minutes
