The Unified Framework
Sequential Decision Analytics provides a unified language for all sequential decision problems under uncertainty. Rather than treating reinforcement learning, stochastic programming, dynamic programming, and optimal control as separate fields, this framework describes them all with the same five core elements.
Five Core Elements
1. State ($S_t$)
Everything you need to make a decision at time $t$. In a supply chain, this includes:
- Physical state ($R_t$): Inventory levels, backlog, pipeline orders, available capacity
- Information state ($I_t$): Forecasts, lead time estimates, supplier status, market signals
- Belief state ($B_t$): Calibrated confidence intervals, agent confidence scores, uncertainty calibration
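As a minimal sketch, the three state components above could be grouped into a single record. All field names here (`on_hand`, `demand_forecast`, `forecast_std`, and so on) are hypothetical choices for illustration, not Autonomy's actual data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PhysicalState:
    """R_t: what physically exists right now (hypothetical fields)."""
    on_hand: int
    backlog: int
    pipeline: Tuple[int, ...]  # orders in transit; index 0 arrives next


@dataclass(frozen=True)
class InformationState:
    """I_t: what we currently know about the environment (hypothetical fields)."""
    demand_forecast: float
    lead_time_estimate: int


@dataclass(frozen=True)
class BeliefState:
    """B_t: how uncertain we are about that information (hypothetical field)."""
    forecast_std: float


@dataclass(frozen=True)
class State:
    """S_t = (R_t, I_t, B_t)."""
    physical: PhysicalState
    information: InformationState
    belief: BeliefState


s = State(
    physical=PhysicalState(on_hand=40, backlog=0, pipeline=(10, 0)),
    information=InformationState(demand_forecast=25.0, lead_time_estimate=2),
    belief=BeliefState(forecast_std=6.0),
)

# A derived quantity many ordering rules use: on hand, minus backlog, plus in transit.
inventory_position = s.physical.on_hand - s.physical.backlog + sum(s.physical.pipeline)
```

Keeping the state immutable (`frozen=True`) is a design choice in this sketch: each period produces a new state rather than modifying the old one.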
2. Decision ($x_t$)
The action taken based on the current state. Examples: how much to order, whether to expedite, where to allocate scarce supply, whether to defer maintenance. Decisions can be binary, discrete, continuous, or vectors.
3. Exogenous Information ($W_{t+1}$)
New information that arrives between decisions, such as demand realizations, supplier delays, quality test results, and market price changes. The "styles of uncertainty" (fine-grained variability, shifts, bursts, spikes, spatial events, rare events) determine how to model the uncertainty and how to respond to it.
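To make two of these styles concrete, here is a small sketch of a demand sampler that combines fine-grained variability with occasional spikes. The function name `sample_demand` and every parameter value are assumptions chosen for illustration; a real model would be fitted to historical data.

```python
import random


def sample_demand(rng, base=25.0, noise_std=5.0, spike_prob=0.05, spike_size=60.0):
    """Draw one period's demand (all parameters are illustrative assumptions)."""
    demand = rng.gauss(base, noise_std)  # fine-grained variability around a base level
    if rng.random() < spike_prob:        # occasional spike on top of normal demand
        demand += spike_size
    return max(0, round(demand))         # demand is a non-negative integer


rng = random.Random(42)  # seeded so the sample path is reproducible
demand_path = [sample_demand(rng) for _ in range(12)]
```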
4. Transition Function ($S^M$)
How the state evolves: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$. In a supply chain: inventory = previous inventory + receipts - shipments; backlog = previous backlog + unfilled demand; pipeline = orders in transit.
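The supply chain transition can be sketched as a pure function. This is a simplified single-item model under stated assumptions: the `State` fields are hypothetical, the oldest pipeline order arrives at the start of each period, and shipments serve existing backlog as well as new demand, so the backlog update nets out anything filled this period.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    on_hand: int
    backlog: int
    pipeline: Tuple[int, ...]  # pipeline[0] arrives at the start of the next period


def transition(state: State, order_qty: int, demand: int) -> State:
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}) for a single item (illustrative sketch)."""
    receipts = state.pipeline[0]
    available = state.on_hand + receipts
    total_due = state.backlog + demand       # old backlog plus new demand
    shipments = min(available, total_due)    # ship as much as stock allows
    return State(
        on_hand=available - shipments,       # previous inventory + receipts - shipments
        backlog=total_due - shipments,       # previous backlog + unfilled demand
        pipeline=state.pipeline[1:] + (order_qty,),  # new order joins the pipeline
    )


s0 = State(on_hand=40, backlog=0, pipeline=(10, 0))
s1 = transition(s0, order_qty=20, demand=60)
# 50 units are available against 60 due, so 10 units are backlogged.
```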
5. Objective Function
What you're optimizing, for example minimizing total cost, maximizing service level, or maximizing expected profit subject to risk constraints. The framework emphasizes that the objective must capture the full economic impact, not just proxy metrics.
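One way to read "full economic impact" is as a per-period cost that is summed over the planning horizon. The sketch below assumes three cost components (holding, backlog penalty, purchasing) with made-up unit costs; the components and the numbers are illustrative assumptions, not values from the framework.

```python
def period_cost(on_hand, backlog, order_qty, holding=1.0, penalty=5.0, unit_cost=2.0):
    """Cost of one period: holding stock, owing customers, and buying units."""
    return holding * on_hand + penalty * backlog + unit_cost * order_qty


def total_cost(trajectory):
    """Sum period costs over a trajectory of (on_hand, backlog, order_qty) tuples."""
    return sum(period_cost(*step) for step in trajectory)


# Two periods: one with leftover stock, one with a backlog and a large order.
trajectory = [(10, 0, 5), (0, 3, 20)]
cost = total_cost(trajectory)
```

Comparing policies then means running each one through the same demand scenarios and comparing the average `total_cost`, rather than comparing a single proxy such as fill rate.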
Four Policy Classes
Every decision-making approach falls into one of four classes:
| Policy Class | How It Works | Autonomy Mapping |
|---|---|---|
| Direct rules | Direct state → action rules (e.g., order up to S) | Deterministic engine base-stock rules |
| Parameterized optimization | Optimize a parameterized cost function | Strategic analysis computes policy parameters |
| Learned value functions | Learn state values from outcomes (e.g., Q-learning) | Execution agents (learned value functions) |
| Lookahead | Optimize over a simulated or forecast future horizon (e.g., model predictive control) | Monte Carlo simulation for scenario evaluation |
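To illustrate how two of these classes differ in code, here is a hedged sketch of a direct rule (order up to S) next to a one-step lookahead that scores candidate orders against sampled demand. Both functions, their names, and the cost parameters are assumptions for illustration; the lookahead ignores lead time and ordering cost to stay short.

```python
from statistics import mean


def base_stock_policy(inventory_position, S):
    """Direct rule: order enough to bring the inventory position up to S."""
    return max(0, S - inventory_position)


def lookahead_policy(inventory_position, demand_samples, candidates,
                     holding=1.0, penalty=5.0):
    """One-step lookahead: pick the order with the lowest average sampled cost.

    Simplifying assumptions: the order arrives immediately, and only holding
    and shortage costs are counted.
    """
    def expected_cost(order_qty):
        stock = inventory_position + order_qty
        return mean(
            holding * max(0, stock - d) + penalty * max(0, d - stock)
            for d in demand_samples
        )

    return min(candidates, key=expected_cost)


rule_order = base_stock_policy(inventory_position=30, S=50)
plan_order = lookahead_policy(
    inventory_position=0,
    demand_samples=[10, 10, 10],
    candidates=range(0, 21, 5),
)
```

The direct rule is a single arithmetic step and needs no demand model at decision time, while the lookahead needs demand samples and evaluates every candidate, which is the trade-off between simplicity and sophistication that the table summarizes.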
Why This Matters
The framework prevents a common mistake: using a sophisticated approach where a simple one suffices, or a simple approach where sophistication is required. Base-stock rules work well for stable, high-volume items. Value-trained agents excel at high-variability, exception-heavy decisions. Parameterized optimization provides the policy parameters that guide both.
By mapping each decision type to its appropriate policy class, Autonomy uses the right tool for each job — not one monolithic model for everything.