skip to main content
research-article

NUMA-aware memory manager with dominant-thread-based copying GC

Published:25 October 2009Publication History
Skip Abstract Section

Abstract

We propose a novel online method of identifying the preferred NUMA nodes for objects with negligible overhead during the garbage collection time as well as object allocation time. Since the number of CPUs (or NUMA nodes) is increasing recently, it is critical for the memory manager of the runtime environment of an object-oriented language to exploit the low latency of local memory for high performance. To locate the CPU of a thread that frequently accesses an object, prior research uses the runtime information about memory accesses as sampled by the hardware. However, the overhead of this approach is high for a garbage collector.

Our approach uses the information about which thread can exclusively access an object, or the Dominant Thread (DoT). The dominant thread of an object is the thread that often most accesses an object so that we do not require memory access samples. Our NUMA-aware GC performs DoT based object copying, which copies each live object to the CPU where the dominant thread was last dispatched before GC. The dominant thread information is known from the thread stack and from objects that are locked or reserved by threads and is propagated in the object reference graph.

We demonstrate that our approach can improve the performance of benchmark programs such as SPECpower ssj2008, SPECjbb2005, and SPECjvm2008.We prototyped a NUMAaware memory manager on a modified version of IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved performance improvements up to 14.3% and 2.0% on average over a JVM that only used the NUMA-aware allocator. The total improvement using both the NUMA-aware allocator and GC is up to 53.1% and 10.8% on average.

References

  1. Advanced Micro Devices, Inc. Performance guidelines for AMD AthlonTM64 and AMD OpteronTMccNUMA multiprocessor systems, 2006.Google ScholarGoogle Scholar
  2. David F. Bacon, Ravi Konuru, Chet Murthy, and Mauricio Serrano. Thin locks: featherweight synchronization for Java. In PLDI '98: Proc. ACM SIGPLAN 1998 Conf. on Programming language design and implementation, pages 258--268, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. BEA Systems, Inc. BEA JRockit, r27.5, commandline reference. http://e-docs.bea.com/jrockit/jrdocs/pdf/refman.pdf, January 2008.Google ScholarGoogle Scholar
  4. Martin J. Bligh, Matt Dobson, Darren Hart, and Gerrit Huizenga. Linux on NUMA systems. In Proceedings of the Linux Symposium, pages 295--306, 2004.Google ScholarGoogle Scholar
  5. B. Brock, G. Carpenter, E. Chiprout, E. Elnozahy, M. Dean, D. Glasco, J. Peterson, R. Rajamony, F. Rawson, R. Rockhold, and A. Zimmerman. Windows NT in a ccNUMA system. In WINSYM'99: Proceedings of the 3rd conference on USENIX Windows NT Symposium, pages 7--7, Berkeley, CA, USA, 1999. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Myra Cohen, Shiu Beng Kooi, and Witawas Srisa-an. Clustering the heap in multi-threaded applications for improved garbage collection. In GECCO '06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 1901--1908, New York, NY, USA, 2006. ACM. ISBN 1-59593-186-4. doi: http://doi.acm.org/10.1145/1143997.1144314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. IBM Corp. IBM @server pSeries 650 performance and tuning. http://www-03.ibm.com/systems/resources/systems_p_hardware_whitepapers_p650_perf.pdf, 2002.Google ScholarGoogle Scholar
  8. IBM Corp. AIX 5L differences guide version 5.3 edition. http://www.redbooks.ibm.com/redpieces/pdfs/sg247463.pdf, 2004.Google ScholarGoogle Scholar
  9. David Dagastine and Paul Hohensee. High performance Java technology in a multi-core world. http://developers.sun.com/learning/javaoneonline/2007/pdf/TS-2885.pdf, 2007.Google ScholarGoogle Scholar
  10. Robert Dimpsey, Rajiv Arora, and Kean Kuiper. Java server performance: a case study of building efficient, stable JVMs. IBM Systems Journal, 39(1):151--174, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Tamar Domani, Gal Goldshtein, Elliot K. Kolodner, Ethan Lewis, Erez Petrank, and Dafna Sheinwald. Thread-local heaps for Java. In ISMM '02: Proceedings of the 3rd International Symposium on Memory Management, pages 76--87, New York, NY, USA, 2002. ACM. ISBN 1-58113-539-4. doi: http://doi.acm.org/10.1145/512429.512439. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Nikola Grcevski. Effective method for Java lock reservation for Java virtual machines that have cooperative multithreading. In 6th Workshop on Compiler-Driven Performance, Associated with CASCON2007, 2007.Google ScholarGoogle Scholar
  13. Nikola Grcevski, Allan Kielstra, Kevin Stoodley, Mark G. Stoodley, and Vijay Sundaresan. Java just-intime compiler and virtual machine improvements for server and middleware applications. In Proc. the 3rd Virtual Machine Research and Technology Symposium, pages 151--162, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Martin Hirzel. Data layouts for object-oriented programs. In SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 265--276, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-639-4. doi: http://doi.acm.org/10.1145/1254882.1254915. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. IBM Corp. IBM developer kit and runtime environment, Java technology edition, version 6 diagnostics guide. http://download.boulder.ibm.com/ibmdl/pub/software/dw/jdk/diagnosis/diag60.pdf.Google ScholarGoogle Scholar
  16. IBM Corp. WebSphere Application Server Network Deployment, version 6.1, tuning Java virtual machines. http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/tprf_tunejvm_v61.html.Google ScholarGoogle Scholar
  17. Intel Corp. Intel 64 and IA-32 architectures optimization reference manual, 2009.Google ScholarGoogle Scholar
  18. Kiyokuni Kawachiya, Akira Koseki, and Tamiya Onodera. Lock reservation: Java locks can mostly do without atomic operations. In Proc. ACM SIGPLAN Conf. on Object-Oriented Programming Systems, Languages and Applications, pages 130--141, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Hung Q. Le,William J. Starke, J. Stephen Fields, Francis P. O'Connell, Dung Q. Nguyen, Bruce J. Ronchetti, Wolfram M. Sauer, Eric M. Schwarz, and Michael T. (Mike) Vaden. IBM POWER6 microarchitecture. IBM J. Res. Dev., 51(6):639--662, 2007. ISSN 0018-8646. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Kyungwoo Lee and Samuel P. Midkiff. A two-phase escape analysis for parallel Java programs. In PACT'06: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 53--62, New York, NY, USA, 2006. ACM. ISBN 1-59593-264-X. doi: http://doi.acm.org/10.1145/1152154.1152166. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Kyungwoo Lee, Xing Fang, and Samuel P. Midkiff. Practical escape analyses: how good are they? In VEE '07: Proceedings of the 3rd International Conference on Virtual Execution Environments, pages 180--190, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-630-1. doi: http://doi.acm.org/10.1145/1254810.1254836. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Commun. ACM, 26(6):419--429, 1983. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/358141.358147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Jaydeep Marathe and Frank Mueller. Hardware profileguided automatic page placement for ccnuma systems. In PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 90--99, New York, NY, USA, 2006. ACM. ISBN 1-59593-189-9. doi: http://doi.acm.org/10.1145/1122971.1122987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Takeshi Ogasawara, Hideaki Komatsu, and Toshio Nakatani. TO-lock: Removing lock overhead using the owners' temporal locality. In PACT '04: Proc. 13th International Conf. on Parallel Architectures and Compilation Techniques, pages 255--266, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Mattias Persson. Java technology, IBM style: Garbage collection policies, part 1. http://http://www.ibm.com/developerworks//library/j-ibmjava2/, 2006.Google ScholarGoogle Scholar
  26. Kenneth Russell and David Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In Proc. ACM SIGPLAN Conf. on Object-Oriented Programming Systems, Languages and Applications, pages 263--272, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Ryan Sciampacone. A production world view of garbage collection for the Java ME and SE spaces. http://www.mm-net.org.uk/school/programme.html.Google ScholarGoogle Scholar
  28. Yefim Shuf, Manish Gupta, Rajesh Bordawekar, and Jaswinder Pal Singh. Exploiting prolific types for memory management and optimizations. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 295--306, New York, NY, USA, 2002. ACM. ISBN 1-58113-450-9. doi: http://doi.acm.org/10.1145/503272.503300. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Bjarne Steensgaard. Thread-specific heaps for multithreaded programs. In ISMM '00: Proceedings of the 2nd International Symposium on Memory Management, pages 18--24, New York, NY, USA, 2000. ACM. ISBN 1-58113-263-8. doi: http://doi.acm.org/10.1145/362422.362432. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Sun Microsystems, Inc. Ergonomics in the 5.0 JavaTMvirtual machine. http://java.sun.com/docs/hotspot/gc5.0/ergo5.html.Google ScholarGoogle Scholar
  31. Sun Microsystems, Inc. SolarisTMmemory placement optimization and Sun FireTMservers. http://www.sun.com/servers/wp/docs/mpo_v7_CUSTOMER.pdf, 2003.Google ScholarGoogle Scholar
  32. Sun Microsystems, Inc. Java SE 6 HotSpotTMvirtual machine garbage collection tuning. http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html, 2008.Google ScholarGoogle Scholar
  33. Sun Microsystems, Inc. Sun SPARC R enterprise T5440 server architecture, 2008.Google ScholarGoogle Scholar
  34. Christian Terboven, Dieter an Mey, Dirk Schmidl, Henry Jin, and Thomas Reichstein. Data and thread affinity in openmp programs. In MAW '08: Proceedings of the 2008 workshop on Memory access on future processors, pages 377--384, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-091-3. doi: http://doi.acm.org/10.1145/1366219.1366222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Mustafa M. Tikir and Jeffery K. Hollingsworth. NUMA-aware Java heaps for server applications. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers, page 108.2, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2312-9. doi: http://dx.doi.org/10.1109/IPDPS.2005.299. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Mustafa M. Tikir and Jeffrey K. Hollingsworth. Using hardware counters to automatically improve memory performance. In SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, page 46, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695-2153-3. doi: http://dx.doi.org/10.1109/SC.2004.64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. David Ungar. Generation scavenging: A nondisruptive high performance storage reclamation algorithm. In SDE 1: Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, pages 157--167, New York, NY, USA, 1984. ACM. ISBN 0-89791-131-8. doi: http://doi.acm.org/10.1145/800020. 808261. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. NUMA-aware memory manager with dominant-thread-based copying GC

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 44, Issue 10
    OOPSLA '09
    October 2009
    554 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1639949
    Issue’s Table of Contents
    • cover image ACM Conferences
      OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
      October 2009
      590 pages
      ISBN:9781605587660
      DOI:10.1145/1640089

    Copyright © 2009 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 25 October 2009

    Check for updates

    Author Tags

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader