ABSTRACT
We view randomization through the lens of statistical machine learning: as a powerful resource for offline optimization. Cloud systems make randomized decisions all the time (e.g., in load balancing), yet this randomness is rarely used for optimization after the fact. By casting system decisions in the framework of reinforcement learning, we show how to collect data from existing systems, without modifying them, in order to evaluate new policies, without deploying them. Our methodology, called harvesting randomness, has the potential to accurately estimate a policy's performance without the risk or cost of deploying it on live traffic. We quantify this optimization power and apply it to a real machine-health scenario in Azure Compute. We also apply it to two prototyped scenarios, load balancing (Nginx) and caching (Redis), with much less success, and use them to identify the systems and machine learning challenges to achieving our goal.
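To make the offline-evaluation step concrete, the sketch below shows the standard inverse propensity scoring (IPS) estimator applied to hypothetical logged load-balancing decisions; this is a minimal illustration of the general technique, not the paper's implementation, and the record format, names, and reward values are assumptions made for the example.

```python
# Minimal sketch: offline (off-policy) evaluation via inverse propensity
# scoring (IPS). Assumes each logged record captures the context, the action
# the running system randomly chose, the probability with which it chose it,
# and the observed reward. Record format and names are illustrative only.

def ips_estimate(logs, new_policy):
    """Estimate the average reward the new policy would have obtained."""
    total = 0.0
    for context, action, prob, reward in logs:
        # Count the reward only when the new policy agrees with the logged
        # action, reweighted by the inverse of the logging probability.
        if new_policy(context) == action:
            total += reward / prob
    return total / len(logs)

# Hypothetical logs: load-balancing decisions over two backends (0 and 1),
# each chosen uniformly at random (prob = 0.5) by the existing system.
logs = [
    ({"queue_len": 3}, 0, 0.5, 1.0),   # backend 0 served quickly
    ({"queue_len": 3}, 1, 0.5, 0.2),   # backend 1 was slow
    ({"queue_len": 9}, 1, 0.5, 0.9),
]

# Candidate policy to evaluate without deploying it.
prefer_backend_1_when_busy = lambda ctx: 1 if ctx["queue_len"] > 5 else 0
print(ips_estimate(logs, prefer_backend_1_when_busy))
```

Because the existing system already randomizes its choices, the logging probabilities needed by this estimator are available for free, which is the sense in which the randomness is "harvested" rather than injected.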
Our long-term agenda is to harvest the randomness in distributed systems to develop non-invasive and efficient techniques for optimizing them. We view randomness, like CPU cycles and bandwidth, as a valuable resource that the cloud currently wastes, and we seek to remedy this.