Executive summary. In enterprise-scale logistics, the line between profitability and loss is thin. We replaced legacy linear programming and univariate forecasting with a two-stage AI engine: a Temporal Fusion Transformer (TFT) to forecast volatile demand, and a Proximal Policy Optimization (PPO) agent to route inventory proactively. This piece walks through the architecture of the deployment, the challenges we hit, and how we solved them.

22% reduction in expedited costs
<10% increase in inventory availability

1. The Business Problem: A Constrained Network

The supply-chain network is made up of thousands of interacting nodes: hubs, regional DCs and fulfilment centres. The core challenge: a localised spike in demand drains inventory at one node, causing bottlenecks that cascade back to the central hub. We have to dictate repositioning actions preemptively rather than react after the fact.

Network Dynamics Simulation

In the simulation, the large white node is the central hub and the grey nodes are regional centres. Particles represent inventory flow. Demand spikes (white flashes) temporarily drain nodes, forcing the hub to react.

  • Capacity Constraint: nodes have max volumes.
  • Logistics Limits: edges have throughput caps.
  • Cascading Bottlenecks: local stockouts cause global strain.
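The first two constraints above can be made concrete with a small sketch. Everything in it is illustrative rather than taken from the deployment: the `Network` class, the node names and the numbers are assumptions chosen only to show how a node's max volume and an edge's throughput cap jointly clip a shipment.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Toy network (hypothetical): nodes with max volumes, edges with throughput caps."""
    capacity: dict   # node -> max units the node can hold
    stock: dict      # node -> units currently held
    edge_cap: dict   # (src, dst) -> max units that can move per planning cycle

    def max_shipment(self, src: str, dst: str) -> int:
        """Largest src -> dst shipment that respects stock, edge cap and headroom."""
        if (src, dst) not in self.edge_cap:
            return 0  # no lane between these nodes
        headroom = self.capacity[dst] - self.stock[dst]
        return max(0, min(self.stock[src], self.edge_cap[(src, dst)], headroom))

    def ship(self, src: str, dst: str, units: int) -> int:
        """Move up to `units`, clipped to the feasible amount; return units moved."""
        moved = min(units, self.max_shipment(src, dst))
        self.stock[src] -= moved
        self.stock[dst] += moved
        return moved


net = Network(
    capacity={"hub": 1000, "dc_east": 200},
    stock={"hub": 600, "dc_east": 150},
    edge_cap={("hub", "dc_east"): 80},
)
# Request 120 units: the edge allows 80, but dc_east only has room for 50.
moved = net.ship("hub", "dc_east", 120)
```

Here the binding limit is the destination's remaining headroom, not the edge cap. That is the kind of interaction that turns a local stockout into strain elsewhere in the network.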

2. Data Engineering: Establishing the “State”

An RL agent is only as good as what it observes. Legacy batch pipelines caused stale-state issues. Below is the real-time streaming architecture designed to provide a unified offline/online state, along with a critical challenge we hit with out-of-order data.

High-Throughput Telemetry

Amazon Kinesis Data Streams ingests high-throughput telemetry from every network node — inventory scans, truck telematics, and queue depths at loading docks.

Changes in network topology (e.g. a trucking route closed by weather) are captured via CDC (Change Data Capture) using AWS DMS, flowing from operational databases into Kafka (Amazon MSK).
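The text mentions out-of-order data without describing the fix that was used, so the following is only one common way to guard a per-node state against late arrivals: compare event time, not arrival order. The `Event` and `LatestState` names and their fields are hypothetical and not part of Kinesis, MSK or DMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node_id: str
    event_time: float  # when the reading was taken at the node (epoch seconds)
    inventory: int


class LatestState:
    """Keeps the newest reading per node, judged by event time (illustrative only)."""

    def __init__(self) -> None:
        self._state = {}

    def apply(self, event: Event) -> bool:
        """Return True if the event updated the state, False if it was stale."""
        current = self._state.get(event.node_id)
        if current is not None and event.event_time <= current.event_time:
            return False  # late arrival: an equal or newer reading is already stored
        self._state[event.node_id] = event
        return True

    def inventory(self, node_id: str) -> int:
        return self._state[node_id].inventory


state = LatestState()
first = state.apply(Event("dc_east", event_time=100.0, inventory=40))
late = state.apply(Event("dc_east", event_time=90.0, inventory=75))  # arrives second
```

With this guard, the delayed scan taken at t=90 cannot overwrite the newer reading from t=100, so the agent never observes inventory that has already been drained.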

3. Core Data Science: Two-Stage Modeling

Forcing a single model to both predict future demand and optimise routing was computationally intractable, so we decoupled the problem into two stages. The first stage handles forecasting uncertainty; the second has to stop the RL agent from “hacking” our logistics costs.

Temporal Fusion Transformers

Before making a plan we need the trajectory. TFTs handle heterogeneous inputs — static node types, known future promotions, and unknown real-time inventory.

Challenge: Overfitting to Viral Spikes

Point forecasts overfit to social-media trends and extrapolated viral spikes into permanent exponential growth. Solution: quantile regression. We output the 10th, 50th and 90th percentiles; if the spread between the 10th and 90th is wide (high uncertainty), the RL agent learns to act conservatively.
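Quantile regression is usually trained with the pinball (quantile) loss, which penalises under- and over-prediction asymmetrically. The sketch below is a minimal pure-Python version for the three quantiles named above. The function names and the `uncertainty_spread` helper are assumptions for illustration, not the production training code.

```python
QUANTILES = (0.1, 0.5, 0.9)  # the 10th, 50th and 90th percentiles from the text


def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile q: under-prediction costs q, over-prediction 1 - q."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def quantile_loss(y_true, preds_by_quantile):
    """Sum of pinball losses over all target quantiles."""
    return sum(pinball_loss(y_true, preds_by_quantile[q], q) for q in QUANTILES)


def uncertainty_spread(p10, p90):
    """Width of the P10-P90 band per time step; a wide band means high uncertainty."""
    return [hi - lo for lo, hi in zip(p10, p90)]


actual = [100.0, 120.0]
preds = {0.1: [90.0, 100.0], 0.5: [100.0, 118.0], 0.9: [115.0, 160.0]}
loss = quantile_loss(actual, preds)
spread = uncertainty_spread(preds[0.1], preds[0.9])
```

For the 90th percentile, predicting too low costs nine times as much per unit as predicting too high, which is what pushes that output towards the upper edge of plausible demand. The spread is the kind of signal a downstream agent could take as an input feature.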

Demand Forecast — Quantile Bands

4. AWS Architecture: Micro-Batch Serving

Because our SLA is to generate a network-wide repositioning plan every 15 minutes, we opted for an asynchronous micro-batch architecture rather than strict real-time APIs. This flow orchestrates the interaction between the Feature Store, the TFT forecaster, and the PPO agent.
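A micro-batch cycle of this kind can be sketched as a plain loop. The three stages are passed in as callables because the real Feature Store, TFT and PPO interfaces are not shown here; `fetch_features`, `forecast`, `decide` and `publish` are hypothetical stand-ins, and only the 15-minute interval comes from the text.

```python
import time

CYCLE_SECONDS = 15 * 60  # the 15-minute planning SLA


def run_cycle(fetch_features, forecast, decide):
    """One micro-batch: read the state, forecast demand quantiles, choose moves."""
    features = fetch_features()          # e.g. latest per-node features
    quantiles = forecast(features)       # e.g. node -> (p10, p50, p90)
    return decide(features, quantiles)   # e.g. list of (src, dst, units)


def serve(fetch_features, forecast, decide, publish, cycles,
          sleep=time.sleep, clock=time.monotonic):
    """Run `cycles` micro-batches, sleeping out the remainder of each interval."""
    for _ in range(cycles):
        started = clock()
        publish(run_cycle(fetch_features, forecast, decide))
        elapsed = clock() - started
        # If a cycle overruns the interval, start the next one immediately.
        sleep(max(0.0, CYCLE_SECONDS - elapsed))


published = []
serve(
    fetch_features=lambda: {"dc_east": 1.0},
    forecast=lambda f: {"dc_east": (1.0, 2.0, 3.0)},
    decide=lambda f, q: [("hub", "dc_east", 5)],
    publish=published.append,
    cycles=1,
    sleep=lambda seconds: None,  # skip the real wait in this demo
)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting 15 minutes. In a managed deployment the schedule would more likely be driven by an external scheduler than by an in-process loop.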


Key Takeaways

  1. Decouple prediction. Separating the forecaster from the optimiser let us debug overfitting and bad policy decisions independently.
  2. Reward shaping is critical. Enterprise RL agents will exploit loopholes in the reward logic; in our experience, dense, multi-objective reward functions are the only path to safe production.
  3. Invest in feature stores. Complex rolling-window features require bridging Spark pipelines and inference endpoints to prevent state-consistency issues.
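To illustrate the second takeaway, here is a minimal sketch of a multi-objective reward. The cost terms and weights are assumptions chosen for the example; the actual reward function used in the deployment is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical per-unit penalty weights; real values would need tuning."""
    shipping: float = 1.0   # cost of moving inventory
    stockout: float = 5.0   # unmet demand
    overflow: float = 3.0   # units pushed beyond a node's max volume


def shaped_reward(shipping_cost, unmet_demand, overflow_units, w=RewardWeights()):
    """Negative weighted sum of costs, returned every step (dense, not end-of-episode)."""
    return -(w.shipping * shipping_cost
             + w.stockout * unmet_demand
             + w.overflow * overflow_units)


# Loophole check: with a shipping-only reward, "ship nothing" would score best.
do_nothing = shaped_reward(shipping_cost=0.0, unmet_demand=10.0, overflow_units=0.0)
ship_stock = shaped_reward(shipping_cost=20.0, unmet_demand=0.0, overflow_units=0.0)
```

With the stockout term included, leaving demand unmet scores worse than paying to ship, so the agent can no longer cut logistics costs by simply not moving inventory.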