MN-MATE: Elastic Resource Management of Manycores and a Hybrid Memory Hierarchy for a Cloud Node

Authors:
Kyu Ho Park, Woomin Hwang, Hyunchul Seok, Chulmin Kim, Dong-jae Shin, Dong Jin Kim, Min Kyu Maeng, Seong Min Kim

Affiliation (all authors): Computer Engineering Research Laboratory, KAIST, Daejeon, South Korea

ACM Journal on Emerging Technologies in Computing Systems, Volume 12, Issue 1, Article No. 5, pp. 1–25. https://doi.org/10.1145/2701429

Published: 03 August 2015

Abstract

The recent advent of manycore systems increases the need for a larger yet faster memory hierarchy. Emerging next-generation memories such as on-chip DRAM and nonvolatile memory (NVRAM) are promising candidates to replace DRAM-only main memory. Combined with the manycore trend, they offer an opportunity to rethink conventional resource management and the memory hierarchy of a single cloud node. To mitigate the energy and memory problems, we propose MN-MATE, an elastic resource management architecture for a single cloud node with manycores, on-chip DRAM, and large off-chip DRAM and NVRAM. In MN-MATE, the hypervisor places consolidated VMs and balances memory among them. Based on monitored information about the allocated memory, a guest OS co-schedules tasks that access different types of memory with complementary access intensity. Polymorphic management of the DRAM hierarchy accelerates average memory access speed inside each guest OS. A guest OS reduces energy consumption with a small performance loss by using an NVRAM-aware data placement policy and a hybrid page cache. A new lightweight kernel is developed to reduce guest OS overhead for scientific applications. Experimental results show that our techniques on the MN-MATE platform improve system performance and reduce energy consumption.
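
To make the idea of co-scheduling tasks with complementary access intensity more concrete, the following is a minimal sketch of one generic pairing heuristic. It is not the MN-MATE scheduler: the abstract does not specify the algorithm, so the `Task` type, the `intensity` metric, and the `pair_complementary` function are all hypothetical names introduced here for illustration. The sketch assumes each task already has a measured memory-access intensity and simply pairs the most intensive task with the least intensive one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    # Hypothetical metric (an assumption, not from the paper),
    # e.g. memory accesses per 1,000 instructions.
    intensity: float


def pair_complementary(tasks):
    """Pair the most memory-intensive task with the least intensive one.

    Returns (pairs, leftover); leftover is the unpaired middle task
    when the number of tasks is odd, otherwise None.
    """
    ordered = sorted(tasks, key=lambda t: t.intensity)
    pairs = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        # Heavy task first, light task second.
        pairs.append((ordered[hi], ordered[lo]))
        lo += 1
        hi -= 1
    leftover = ordered[lo] if lo == hi else None
    return pairs, leftover
```

A caller would pass the monitored tasks and place each returned pair on resources that share a memory path, so that a heavy and a light task contend less than two heavy tasks would. A real scheduler would also need to account for which memory type each task touches, which this sketch leaves out.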

Index Terms

1. Networks
  1. Network protocols

Published in
ACM Journal on Emerging Technologies in Computing Systems, Volume 12, Issue 1
July 2015
210 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/2810396
Editor: Krishnendu Chakrabarty, Duke University, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
- Published: 3 August 2015
- Accepted: 1 December 2014
- Revised: 1 July 2014
- Received: 1 March 2014

Author Tags
NVRAM
Virtual machine
hybrid main memory
resource management
scheduling
Qualifiers
- research-article
- Research
- Refereed