Abstract
Heterogeneous memory management combined with server virtualization in datacenters is expected to increase software and OS management complexity. State-of-the-art solutions rely exclusively on the hypervisor (VMM) for expensive page hotness tracking and migrations, limiting the benefits from heterogeneity. To address this, we design HeteroOS, a novel application-transparent OS-level solution for managing memory heterogeneity in virtualized systems. The HeteroOS design first makes the guest OSes heterogeneity-aware and then extracts rich OS-level information about applications' memory usage to place data in the 'right' memory, avoiding page migrations. When such proactive placements are not possible, HeteroOS combines the guest OSes' information about applications with the VMM's hardware control to track hotness and migrate only performance-critical pages. Finally, HeteroOS also provides efficient heterogeneous memory sharing across multiple guest VMs. Evaluation of HeteroOS with memory-, storage-, and network-intensive datacenter applications shows up to 2x performance improvement compared to the state-of-the-art VMM-exclusive approach.
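The two-step idea in the abstract (place data proactively using OS-level hints, and fall back to hotness-driven migration only when that is not possible) can be illustrated with a toy model. The sketch below is not the HeteroOS implementation: the `TieredMemory` class, the `kind` hint, the `preferred_kinds` set, and the access counter standing in for hardware hotness tracking are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

FAST, SLOW = "fast", "slow"


@dataclass
class Page:
    pid: int
    kind: str            # hypothetical OS-level hint, e.g. "heap" or "io_cache"
    tier: str = SLOW
    accesses: int = 0    # stand-in for hardware hotness samples


@dataclass
class TieredMemory:
    fast_capacity: int                                # pages that fit in fast memory
    preferred_kinds: frozenset = frozenset({"heap"})  # assumed "performance-critical" kinds
    pages: dict = field(default_factory=dict)
    migrations: int = 0

    def fast_used(self) -> int:
        return sum(1 for p in self.pages.values() if p.tier == FAST)

    def allocate(self, pid: int, kind: str) -> Page:
        """Proactive placement: decide the tier at allocation time from the hint."""
        page = Page(pid, kind)
        if kind in self.preferred_kinds and self.fast_used() < self.fast_capacity:
            page.tier = FAST
        self.pages[pid] = page
        return page

    def touch(self, pid: int, n: int = 1) -> None:
        self.pages[pid].accesses += n

    def rebalance(self, hot_threshold: int) -> None:
        """Fallback: migrate only slow-tier pages at or above the hotness threshold."""
        hot = sorted(
            (p for p in self.pages.values()
             if p.tier == SLOW and p.accesses >= hot_threshold),
            key=lambda p: -p.accesses,
        )
        for page in hot:
            if self.fast_used() >= self.fast_capacity:
                fast_pages = [p for p in self.pages.values() if p.tier == FAST]
                if not fast_pages:
                    break  # no fast memory at all, nothing to evict
                victim = min(fast_pages, key=lambda p: p.accesses)
                if victim.accesses >= page.accesses:
                    continue  # the coldest fast page is still hotter, skip
                victim.tier = SLOW
                self.migrations += 1
            page.tier = FAST
            self.migrations += 1


mem = TieredMemory(fast_capacity=2)
mem.allocate(1, "heap")      # placed in fast memory up front
mem.allocate(2, "io_cache")  # not a preferred kind, goes to slow memory
mem.allocate(3, "heap")      # placed in fast memory up front
mem.allocate(4, "heap")      # fast memory is full, falls back to slow memory
for pid, n in [(1, 1), (2, 1), (3, 5), (4, 10)]:
    mem.touch(pid, n)
mem.rebalance(hot_threshold=5)
```

In this toy run only page 4 is ever migrated into fast memory, because pages 1 and 3 were placed there at allocation time; the migration counter makes the cost of the fallback path visible.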
HeteroOS: OS Design for Heterogeneous Memory Management in Datacenter
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture