research-article

NUMA-aware memory manager with dominant-thread-based copying GC

Author:
Takeshi Ogasawara

IBM Research - Tokyo, Yamato, Japan

IBM Research - Tokyo, Yamato, Japan
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 44 Issue 10October 2009pp 377–390https://doi.org/10.1145/1639949.1640117

Published:25 October 2009Publication History

ACM SIGPLAN Notices

Abstract

We propose a novel online method of identifying the preferred NUMA nodes for objects with negligible overhead during the garbage collection time as well as object allocation time. Since the number of CPUs (or NUMA nodes) is increasing recently, it is critical for the memory manager of the runtime environment of an object-oriented language to exploit the low latency of local memory for high performance. To locate the CPU of a thread that frequently accesses an object, prior research uses the runtime information about memory accesses as sampled by the hardware. However, the overhead of this approach is high for a garbage collector.

Our approach uses the information about which thread can exclusively access an object, or the Dominant Thread (DoT). The dominant thread of an object is the thread that often most accesses an object so that we do not require memory access samples. Our NUMA-aware GC performs DoT based object copying, which copies each live object to the CPU where the dominant thread was last dispatched before GC. The dominant thread information is known from the thread stack and from objects that are locked or reserved by threads and is propagated in the object reference graph.

We demonstrate that our approach can improve the performance of benchmark programs such as SPECpower ssj2008, SPECjbb2005, and SPECjvm2008.We prototyped a NUMAaware memory manager on a modified version of IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved performance improvements up to 14.3% and 2.0% on average over a JVM that only used the NUMA-aware allocator. The total improvement using both the NUMA-aware allocator and GC is up to 53.1% and 10.8% on average.

References

Advanced Micro Devices, Inc. Performance guidelines for AMD AthlonTM64 and AMD OpteronTMccNUMA multiprocessor systems, 2006.Google Scholar
David F. Bacon, Ravi Konuru, Chet Murthy, and Mauricio Serrano. Thin locks: featherweight synchronization for Java. In PLDI '98: Proc. ACM SIGPLAN 1998 Conf. on Programming language design and implementation, pages 258--268, 1998. Google ScholarDigital Library
BEA Systems, Inc. BEA JRockit, r27.5, commandline reference. http://e-docs.bea.com/jrockit/jrdocs/pdf/refman.pdf, January 2008.Google Scholar
Martin J. Bligh, Matt Dobson, Darren Hart, and Gerrit Huizenga. Linux on NUMA systems. In Proceedings of the Linux Symposium, pages 295--306, 2004.Google Scholar
B. Brock, G. Carpenter, E. Chiprout, E. Elnozahy, M. Dean, D. Glasco, J. Peterson, R. Rajamony, F. Rawson, R. Rockhold, and A. Zimmerman. Windows NT in a ccNUMA system. In WINSYM'99: Proceedings of the 3rd conference on USENIX Windows NT Symposium, pages 7--7, Berkeley, CA, USA, 1999. USENIX Association. Google ScholarDigital Library
Myra Cohen, Shiu Beng Kooi, and Witawas Srisa-an. Clustering the heap in multi-threaded applications for improved garbage collection. In GECCO '06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 1901--1908, New York, NY, USA, 2006. ACM. ISBN 1-59593-186-4. doi: http://doi.acm.org/10.1145/1143997.1144314. Google ScholarDigital Library
IBM Corp. IBM @server pSeries 650 performance and tuning. http://www-03.ibm.com/systems/resources/systems_p_hardware_whitepapers_p650_perf.pdf, 2002.Google Scholar
IBM Corp. AIX 5L differences guide version 5.3 edition. http://www.redbooks.ibm.com/redpieces/pdfs/sg247463.pdf, 2004.Google Scholar
David Dagastine and Paul Hohensee. High performance Java technology in a multi-core world. http://developers.sun.com/learning/javaoneonline/2007/pdf/TS-2885.pdf, 2007.Google Scholar
Robert Dimpsey, Rajiv Arora, and Kean Kuiper. Java server performance: a case study of building efficient, stable JVMs. IBM Systems Journal, 39(1):151--174, 2000. Google ScholarDigital Library
Tamar Domani, Gal Goldshtein, Elliot K. Kolodner, Ethan Lewis, Erez Petrank, and Dafna Sheinwald. Thread-local heaps for Java. In ISMM '02: Proceedings of the 3rd International Symposium on Memory Management, pages 76--87, New York, NY, USA, 2002. ACM. ISBN 1-58113-539-4. doi: http://doi.acm.org/10.1145/512429.512439. Google ScholarDigital Library
Nikola Grcevski. Effective method for Java lock reservation for Java virtual machines that have cooperative multithreading. In 6th Workshop on Compiler-Driven Performance, Associated with CASCON2007, 2007.Google Scholar
Nikola Grcevski, Allan Kielstra, Kevin Stoodley, Mark G. Stoodley, and Vijay Sundaresan. Java just-intime compiler and virtual machine improvements for server and middleware applications. In Proc. the 3rd Virtual Machine Research and Technology Symposium, pages 151--162, 2004. Google ScholarDigital Library
Martin Hirzel. Data layouts for object-oriented programs. In SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 265--276, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-639-4. doi: http://doi.acm.org/10.1145/1254882.1254915. Google ScholarDigital Library
IBM Corp. IBM developer kit and runtime environment, Java technology edition, version 6 diagnostics guide. http://download.boulder.ibm.com/ibmdl/pub/software/dw/jdk/diagnosis/diag60.pdf.Google Scholar
IBM Corp. WebSphere Application Server Network Deployment, version 6.1, tuning Java virtual machines. http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/tprf_tunejvm_v61.html.Google Scholar
Intel Corp. Intel 64 and IA-32 architectures optimization reference manual, 2009.Google Scholar
Kiyokuni Kawachiya, Akira Koseki, and Tamiya Onodera. Lock reservation: Java locks can mostly do without atomic operations. In Proc. ACM SIGPLAN Conf. on Object-Oriented Programming Systems, Languages and Applications, pages 130--141, 2002. Google ScholarDigital Library
Hung Q. Le,William J. Starke, J. Stephen Fields, Francis P. O'Connell, Dung Q. Nguyen, Bruce J. Ronchetti, Wolfram M. Sauer, Eric M. Schwarz, and Michael T. (Mike) Vaden. IBM POWER6 microarchitecture. IBM J. Res. Dev., 51(6):639--662, 2007. ISSN 0018-8646. Google ScholarDigital Library
Kyungwoo Lee and Samuel P. Midkiff. A two-phase escape analysis for parallel Java programs. In PACT'06: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 53--62, New York, NY, USA, 2006. ACM. ISBN 1-59593-264-X. doi: http://doi.acm.org/10.1145/1152154.1152166. Google ScholarDigital Library
Kyungwoo Lee, Xing Fang, and Samuel P. Midkiff. Practical escape analyses: how good are they? In VEE '07: Proceedings of the 3rd International Conference on Virtual Execution Environments, pages 180--190, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-630-1. doi: http://doi.acm.org/10.1145/1254810.1254836. Google ScholarDigital Library
Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Commun. ACM, 26(6):419--429, 1983. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/358141.358147. Google ScholarDigital Library
Jaydeep Marathe and Frank Mueller. Hardware profileguided automatic page placement for ccnuma systems. In PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 90--99, New York, NY, USA, 2006. ACM. ISBN 1-59593-189-9. doi: http://doi.acm.org/10.1145/1122971.1122987. Google ScholarDigital Library
Takeshi Ogasawara, Hideaki Komatsu, and Toshio Nakatani. TO-lock: Removing lock overhead using the owners' temporal locality. In PACT '04: Proc. 13th International Conf. on Parallel Architectures and Compilation Techniques, pages 255--266, 2004. Google ScholarDigital Library
Mattias Persson. Java technology, IBM style: Garbage collection policies, part 1. http://http://www.ibm.com/developerworks//library/j-ibmjava2/, 2006.Google Scholar
Kenneth Russell and David Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In Proc. ACM SIGPLAN Conf. on Object-Oriented Programming Systems, Languages and Applications, pages 263--272, 2006. Google ScholarDigital Library
Ryan Sciampacone. A production world view of garbage collection for the Java ME and SE spaces. http://www.mm-net.org.uk/school/programme.html.Google Scholar
Yefim Shuf, Manish Gupta, Rajesh Bordawekar, and Jaswinder Pal Singh. Exploiting prolific types for memory management and optimizations. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 295--306, New York, NY, USA, 2002. ACM. ISBN 1-58113-450-9. doi: http://doi.acm.org/10.1145/503272.503300. Google ScholarDigital Library
Bjarne Steensgaard. Thread-specific heaps for multithreaded programs. In ISMM '00: Proceedings of the 2nd International Symposium on Memory Management, pages 18--24, New York, NY, USA, 2000. ACM. ISBN 1-58113-263-8. doi: http://doi.acm.org/10.1145/362422.362432. Google ScholarDigital Library
Sun Microsystems, Inc. Ergonomics in the 5.0 JavaTMvirtual machine. http://java.sun.com/docs/hotspot/gc5.0/ergo5.html.Google Scholar
Sun Microsystems, Inc. SolarisTMmemory placement optimization and Sun FireTMservers. http://www.sun.com/servers/wp/docs/mpo_v7_CUSTOMER.pdf, 2003.Google Scholar
Sun Microsystems, Inc. Java SE 6 HotSpotTMvirtual machine garbage collection tuning. http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html, 2008.Google Scholar
Sun Microsystems, Inc. Sun SPARC R enterprise T5440 server architecture, 2008.Google Scholar
Christian Terboven, Dieter an Mey, Dirk Schmidl, Henry Jin, and Thomas Reichstein. Data and thread affinity in openmp programs. In MAW '08: Proceedings of the 2008 workshop on Memory access on future processors, pages 377--384, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-091-3. doi: http://doi.acm.org/10.1145/1366219.1366222. Google ScholarDigital Library
Mustafa M. Tikir and Jeffery K. Hollingsworth. NUMA-aware Java heaps for server applications. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers, page 108.2, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2312-9. doi: http://dx.doi.org/10.1109/IPDPS.2005.299. Google ScholarDigital Library
Mustafa M. Tikir and Jeffrey K. Hollingsworth. Using hardware counters to automatically improve memory performance. In SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, page 46, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695-2153-3. doi: http://dx.doi.org/10.1109/SC.2004.64. Google ScholarDigital Library
David Ungar. Generation scavenging: A nondisruptive high performance storage reclamation algorithm. In SDE 1: Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, pages 157--167, New York, NY, USA, 1984. ACM. ISBN 0-89791-131-8. doi: http://doi.acm.org/10.1145/800020. 808261. Google ScholarDigital Library

Index Terms

NUMA-aware memory manager with dominant-thread-based copying GC
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Recommendations

NUMA-aware memory manager with dominant-thread-based copying GC
OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications

We propose a novel online method of identifying the preferred NUMA nodes for objects with negligible overhead during the garbage collection time as well as object allocation time. Since the number of CPUs (or NUMA nodes) is increasing recently, it is ...
Read More
The pauseless GC algorithm
VEE '05: Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments

Modern transactional response-time sensitive applications have run into practical limits on the size of garbage collected heaps. The heap can only grow until GC pauses exceed the response-time limits. Sustainable, scalable concurrent collection has ...
Read More
Topology-Aware Parallelism for NUMA Copying Collectors
LCPC 2015: Revised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing - Volume 9519

NUMA-aware parallel algorithms in runtime systems attempt to improve locality by allocating memory from local NUMA nodes. Researchers have suggested that the garbage collector should profile memory access patterns or use object locality heuristics to ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 44, Issue 10
OOPSLA '09
October 2009
554 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1639949
Issue’s Table of Contents
OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
October 2009
590 pages
ISBN:9781605587660
DOI:10.1145/1640089
General Chair:
Shail Arora
Adayana, Inc.
,
Program Chair:
Gary Leavens
University of Central Florida
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 October 2009
Check for updates
Author Tags
cc-NUMA
java
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 45
  Total Citations
  View Citations
- 724
  Total Downloads
- Downloads (Last 12 months)10
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

NUMA-aware memory manager with dominant-thread-based copying GC

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

NUMA-aware memory manager with dominant-thread-based copying GC

The pauseless GC algorithm

Topology-Aware Parallelism for NUMA Copying Collectors