Azirella

Sequential Decision Analytics for Supply Chain

Every supply chain decision is a sequential decision under uncertainty. Powell's unified framework gives these decisions a common language and a way to pick the right policy class for each one.

The Unified Framework

Sequential Decision Analytics provides a unified language for all sequential decision problems under uncertainty. Rather than treating reinforcement learning, stochastic programming, dynamic programming, and optimal control as separate fields, this framework unifies them through five core elements.

"Every sequential decision problem consists of five elements: state variables, decision variables, exogenous information, the transition function, and the objective function. Once you see this structure, you realize that reinforcement learning, stochastic programming, and optimal control are all solving the same problem with different tools."

Warren B. Powell, Professor Emeritus of Operations Research, Princeton University (Sequential Decision Analytics and Modeling, 2022)

Five Core Elements

[Figure: the five core elements. State Sₜ, decision xₜ, exogenous information Wₜ₊₁, transition function Sᴹ, and objective (min C / max F), linked by Sₜ₊₁ = Sᴹ(Sₜ, xₜ, Wₜ₊₁).]

1. State (Sₜ)

Everything you need to make a decision at time t. In supply chain:

  • Physical state (Rₜ): inventory levels, backlog, pipeline orders, available capacity
  • Information state (Iₜ): forecasts, lead time estimates, supplier status, market signals
  • Belief state (Bₜ): calibrated likelihood intervals, agent likelihood scores, uncertainty calibration

In Autonomy, that state is the one shared world model. Every agent reads from it and writes back into it.
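
The three-part state above can be sketched as plain data structures. This is a minimal illustration, not Autonomy's actual world model: every class and field name here (PhysicalState, on_hand, coverage, and so on) is a hypothetical choice made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalState:
    on_hand: int               # units on the shelf
    backlog: int               # units owed to customers
    pipeline: tuple[int, ...]  # orders in transit; index 0 arrives next


@dataclass(frozen=True)
class InformationState:
    demand_forecast: float     # point forecast for the next period
    lead_time_estimate: int    # periods until a new order arrives


@dataclass(frozen=True)
class BeliefState:
    demand_low: float          # lower bound of the demand interval
    demand_high: float         # upper bound of the demand interval
    coverage: float            # nominal probability that demand falls inside


@dataclass(frozen=True)
class State:
    physical: PhysicalState
    information: InformationState
    belief: BeliefState

    def inventory_position(self) -> int:
        """On-hand plus in-transit stock, minus what is already owed."""
        p = self.physical
        return p.on_hand + sum(p.pipeline) - p.backlog


s0 = State(
    PhysicalState(on_hand=40, backlog=5, pipeline=(10, 20)),
    InformationState(demand_forecast=25.0, lead_time_estimate=2),
    BeliefState(demand_low=15.0, demand_high=35.0, coverage=0.9),
)
print(s0.inventory_position())  # 40 + (10 + 20) - 5 = 65
```

Keeping the state immutable (frozen=True) is one design option: each decision step then produces a new state rather than editing the old one in place.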

2. Decision (xₜ)

The action taken based on the current state. Examples: how much to order, whether to expedite, where to allocate scarce supply, whether to defer maintenance. Decisions can be binary, discrete, continuous, or vectors.

3. Exogenous Information (Wₜ₊₁)

New information that arrives between decisions, such as demand realizations, supplier delays, quality test results, and market price changes. The "styles of uncertainty" (fine-grained variability, shifts, bursts, spikes, spatial events, rare events) determine how to model that information and how to respond to it.
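
As a rough illustration of two of these styles, the sketch below samples demand as fine-grained variability around a base level plus an occasional spike. The function name sample_demand and all parameter values are assumptions chosen for the example, not calibrated figures.

```python
import random


def sample_demand(rng: random.Random, base: float = 20.0, noise_sd: float = 4.0,
                  spike_prob: float = 0.05, spike_size: float = 60.0) -> int:
    """Draw one period of demand: Gaussian noise plus a rare additive spike."""
    demand = rng.gauss(base, noise_sd)   # fine-grained variability
    if rng.random() < spike_prob:        # rare event
        demand += spike_size
    return max(0, round(demand))         # demand cannot be negative


rng = random.Random(42)                  # fixed seed for repeatable paths
path = [sample_demand(rng) for _ in range(12)]
print(path)
```

Separating the sampler from the rest of the model makes it easy to swap in a different style of uncertainty without touching the decision logic.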

4. Transition Function (Sᴹ)

How the state evolves: Sₜ₊₁ = Sᴹ(Sₜ, xₜ, Wₜ₊₁). In supply chain: inventory = previous inventory + receipts - shipments; backlog = previous backlog + unfilled demand - backlog filled; pipeline = previous pipeline + new orders - receipts.
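
A minimal sketch of such a transition function for a single item follows. It assumes a lead time of at least one period (so the pipeline is never empty), that backlogged units are served before anything is held, and a specific order of events within a period; the names State and transition are hypothetical.

```python
from typing import NamedTuple


class State(NamedTuple):
    on_hand: int
    backlog: int
    pipeline: tuple  # orders in transit; index 0 arrives next period


def transition(state: State, order_qty: int, demand: int) -> State:
    """One step of S_{t+1} = S^M(S_t, x_t, W_{t+1}) for a single item."""
    # Receipts: the oldest pipeline order arrives, the new order joins the back.
    arriving, *rest = state.pipeline
    pipeline = tuple(rest) + (order_qty,)
    available = state.on_hand + arriving
    # Units to serve this period include previously backlogged demand.
    required = state.backlog + demand
    shipped = min(available, required)
    return State(on_hand=available - shipped,
                 backlog=required - shipped,
                 pipeline=pipeline)


s = State(on_hand=10, backlog=0, pipeline=(5, 0))
s = transition(s, order_qty=20, demand=25)
print(s)  # State(on_hand=0, backlog=10, pipeline=(0, 20))
```

Here 15 units are available (10 on hand plus 5 arriving) against demand of 25, so 10 units move to backlog and the new order of 20 enters the pipeline.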

5. Objective Function

What you're optimizing: minimize total cost, maximize service level, maximize expected profit subject to risk constraints. The framework emphasizes that the objective must capture the full economic impact, not proxy metrics.
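
One way to make an objective concrete is to score a policy by its average total cost over simulated demand paths. The sketch below assumes a single item with zero lead time, normally distributed demand, and made-up cost rates; order_up_to, simulate_cost, and expected_cost are hypothetical helper names.

```python
import random

HOLDING_COST = 1.0  # per unit held per period (hypothetical)
BACKLOG_COST = 9.0  # per unit backlogged per period (hypothetical)
ORDER_COST = 2.0    # per unit ordered (hypothetical)


def order_up_to(level):
    """Return a policy that raises net inventory to `level` each period."""
    def policy(inventory):
        return max(0, level - inventory)
    return policy


def simulate_cost(policy, horizon, rng):
    """Total cost of one sample path; negative inventory means backlog."""
    inventory, total = 0, 0.0
    for _ in range(horizon):
        x = policy(inventory)
        demand = max(0, round(rng.gauss(20, 5)))
        inventory = inventory + x - demand
        total += (ORDER_COST * x
                  + HOLDING_COST * max(inventory, 0)
                  + BACKLOG_COST * max(-inventory, 0))
    return total


def expected_cost(policy, horizon=52, n_paths=500, seed=0):
    """Monte Carlo estimate of expected total cost under `policy`."""
    rng = random.Random(seed)  # same seed gives every policy the same demand
    return sum(simulate_cost(policy, horizon, rng) for _ in range(n_paths)) / n_paths


for level in (15, 25, 35):
    print(level, round(expected_cost(order_up_to(level)), 1))
```

Because every policy is evaluated against the same seeded demand paths, differences in the printed costs reflect the policies rather than sampling noise.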

"The belief state is the most underappreciated element in sequential decision problems. In supply chain, your beliefs about supplier reliability, demand patterns, and capacity constraints are as important as the physical state of inventory on shelves."

Dimitri Bertsekas, Professor of Electrical Engineering and Computer Science, MIT (Reinforcement Learning and Optimal Control, 2019)

Four Policy Classes

Every decision-making approach falls into one of four classes:

[Figure: the four policy classes ordered from low compute and simple to high compute and sophisticated: direct rules, parameterized optimization, learned values, lookahead.]
Policy Class | How It Works | Autonomy Mapping
Direct rules | Direct state → action rules (e.g., order up to S) | Deterministic engine base-stock rules
Parameterized optimization | Optimize parameterized cost function | Strategic analysis computes policy parameters
Learned value functions | Learn state values from outcomes | Execution agents (learned value functions)
Lookahead | Simulate future scenarios before deciding | Monte Carlo simulation for scenario evaluation
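
To make the contrast between the cheapest and the most compute-heavy class concrete, the sketch below places a direct rule next to a one-step Monte Carlo lookahead for the same ordering decision. It is an illustration under assumed costs and demand, not Autonomy's engine; direct_rule, lookahead, and the candidate order grid are hypothetical.

```python
import random

HOLDING_COST, BACKLOG_COST = 1.0, 9.0  # hypothetical per-unit costs


def direct_rule(inventory: int, order_up_to: int = 30) -> int:
    """Direct rule: map the state straight to an action."""
    return max(0, order_up_to - inventory)


def one_step_cost(inventory: int, order_qty: int, demand: int) -> float:
    """Holding or backlog cost after ordering and then seeing demand."""
    net = inventory + order_qty - demand
    return HOLDING_COST * max(net, 0) + BACKLOG_COST * max(-net, 0)


def lookahead(inventory: int, scenarios: list[int],
              candidates=range(0, 61, 5)) -> int:
    """Lookahead: score each candidate order against sampled scenarios."""
    def avg_cost(q):
        return sum(one_step_cost(inventory, q, d) for d in scenarios) / len(scenarios)
    return min(candidates, key=avg_cost)


rng = random.Random(7)
scenarios = [max(0, round(rng.gauss(20, 5))) for _ in range(200)]

print(direct_rule(inventory=10))                     # 30 - 10 = 20
print(lookahead(inventory=10, scenarios=scenarios))  # depends on sampled demand
```

The direct rule costs one subtraction per decision, while the lookahead evaluates every candidate against every scenario, which is the compute trade-off the four classes span.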

Why This Matters

The framework prevents a common mistake: using a sophisticated approach where a simple one suffices, or a simple approach where sophistication is required. Base-stock rules work well for stable, high-volume items. Value-trained agents excel at high-variability, exception-heavy decisions. Parameterized optimization provides the policy parameters that guide both.
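
A simple screening step can route items toward a policy class. The sketch below uses the coefficient of variation of historical demand as the variability signal; both that choice and the 0.5 threshold are assumptions made for illustration, and suggest_policy_class is a hypothetical helper rather than part of the framework.

```python
from statistics import mean, pstdev


def coefficient_of_variation(demand_history):
    """Standard deviation relative to the mean; infinite if mean is zero."""
    avg = mean(demand_history)
    return pstdev(demand_history) / avg if avg > 0 else float("inf")


def suggest_policy_class(demand_history, cv_threshold=0.5):
    """Route stable items to direct rules and erratic ones to heavier classes."""
    cv = coefficient_of_variation(demand_history)
    if cv <= cv_threshold:
        return "direct rule"
    return "learned values or lookahead"


stable = [100, 98, 103, 101, 99, 102]
erratic = [0, 40, 0, 0, 120, 5]
print(suggest_policy_class(stable))   # direct rule
print(suggest_policy_class(erratic))  # learned values or lookahead
```

In practice the routing signal would likely combine more than one measure, such as volume, lead time, and exception frequency.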

By mapping each decision type to its appropriate policy class, Autonomy uses the right tool for each job, not one monolithic model for everything. The Decision Stream then carries all four classes of decision to the user through the same AIIO surface.

"The most common mistake in practice is to default to one policy class for everything. Simple (s, S) policies work brilliantly for 80% of SKUs. The other 20% need something more sophisticated. The framework tells you which is which."

Warren B. Powell, Professor Emeritus of Operations Research, Princeton University (Reinforcement Learning and Stochastic Optimization, 2022)

4

Universal policy classes that cover every sequential decision approach

Powell, Sequential Decision Analytics, 2022

15+

Previously siloed fields unified under a single modeling framework

INFORMS Review, 2021

80%

Of supply chain decisions can be handled by simple direct-rule policies

McKinsey Operations Practice, 2023

3-5x

Improvement in exception handling when matching policy class to decision complexity

Forrester Research, 2022
