Abstract
The recent advent of manycore systems increases the need for a larger yet faster memory hierarchy. Emerging next-generation memories such as on-chip DRAM and nonvolatile memory (NVRAM) are promising candidates to replace DRAM-only main memory. Combined with the manycore trend, they offer an opportunity to rethink conventional resource management for the memory hierarchy of a single cloud node. To mitigate the energy and memory problems, we propose MN-MATE, an elastic resource management architecture for a single cloud node with manycores, on-chip DRAM, and large off-chip DRAM and NVRAM. In MN-MATE, the hypervisor places consolidated VMs and balances memory among them. Based on monitored information about its allocated memory, a guest OS co-schedules tasks that access different types of memory with complementary access intensity. Polymorphic management of the DRAM hierarchy improves the average memory access speed inside each guest OS. A guest OS reduces energy consumption at a small performance cost by using an NVRAM-aware data placement policy and a hybrid page cache. A new lightweight kernel is developed to reduce guest OS overhead for scientific applications. Experimental results show that our techniques on the MN-MATE platform improve system performance and reduce energy consumption.
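The co-scheduling idea in the abstract, running tasks with complementary memory access intensity together, can be illustrated with a minimal sketch. This is not the MN-MATE scheduler: the function name `pair_complementary`, the per-task intensity score (assumed here to be a number such as memory accesses per instruction), and the greedy highest-with-lowest pairing are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Task = Tuple[str, float]  # (task name, hypothetical memory access intensity)


def pair_complementary(tasks: List[Task]) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Greedily pair the most memory-intensive task with the least intensive one.

    Returns the list of (heavy, light) pairs and, when the number of tasks
    is odd, the name of the task left without a partner (otherwise None).
    """
    # Sort ascending by intensity so the two ends of the list are complementary.
    ordered = sorted(tasks, key=lambda t: t[1])
    pairs: List[Tuple[str, str]] = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        pairs.append((ordered[hi][0], ordered[lo][0]))
        lo += 1
        hi -= 1
    # With an odd count, the median-intensity task remains unpaired.
    leftover = ordered[lo][0] if lo == hi else None
    return pairs, leftover


if __name__ == "__main__":
    demo = [("solver", 0.9), ("logger", 0.1), ("indexer", 0.5), ("encoder", 0.3)]
    print(pair_complementary(demo))
```

Pairing from both ends of the sorted list keeps the combined intensity of each pair roughly even, which is one simple way to avoid placing two memory-heavy tasks side by side; a real scheduler would also need to account for which memory type each task touches.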
Index Terms
- MN-MATE: Elastic Resource Management of Manycores and a Hybrid Memory Hierarchy for a Cloud Node