Jim Dai, Cornell University and CUHK-Shenzhen
Stochastic processing networks (SPNs) provide high-fidelity mathematical models for the operations of many service systems, such as data centers. It has been a challenge to find a scalable algorithm for approximately solving the optimal control of large-scale SPNs, particularly when they are heavily loaded. We demonstrate that a class of deep reinforcement learning algorithms known as Proximal Policy Optimization (PPO) can generate control policies for SPNs that consistently outperform the state-of-the-art control policies known in the literature. PPO is an approximate policy iteration algorithm that can naturally be implemented in a purely data-driven fashion. In this talk, I will present both the motivating theory and the critical components of the algorithm that make PPO a success in our setting.
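To make the policy-iteration view concrete, the following is a minimal Python sketch of PPO's clipped surrogate objective (Schulman et al., 2017), the quantity maximized at each approximate policy-improvement step. The function and variable names here are illustrative and are not taken from the paper discussed in the talk.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective of PPO (illustrative sketch).

    logp_new, logp_old: log-probabilities of the sampled actions under the
        current policy and the data-collecting (old) policy, respectively.
    advantages: advantage estimates for the sampled state-action pairs.
    eps: clipping parameter limiting how far the new policy may move.
    """
    ratio = np.exp(logp_new - logp_old)            # importance-sampling ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Take the pessimistic minimum of the clipped and unclipped terms,
    # discouraging updates that move far from the data-collecting policy.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

Maximizing this objective over policy parameters, with trajectories simulated from the controlled network, is what makes the method a data-driven approximate policy iteration.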
This talk is based on the paper "Queueing Network Controls via Deep Reinforcement Learning," written jointly with Mark Gluzman at Cornell University.