skip to main content
10.1145/2618128.2618132acmconferencesArticle/Chapter ViewAbstractPublication PagesmspConference Proceedingsconference-collections
research-article

A study of connected object locality in NUMA heaps

Published:13 June 2014Publication History

ABSTRACT

Reference locality is vital to the performance of parallel Garbage Collection (GC) running on Non-Uniform Memory Access (NUMA) machines. A GC thread may trace remotely placed objects that descend from the root set or, for load balance, a GC thread may steal non-local objects from other threads' work lists. Processing distant live objects could introduce contention in the interconnect links between memory nodes and it could increase memory access latency. Researchers have proposed various techniques to improve GC tracing locality. However, few studies attempt to optimize the locality of connected objects in NUMA object graph. In this paper, we study the locality of a rooted subgraph, a unit of object connectivity in the object graph. A rooted subgraph is a set of references containing one root reference, and every reference is reachable from the root. We empirically study the locality of rooted subgraphs of DaCapo and SPECjbb2005 benchmark suites. The results show that more than 80% of objects in a rooted subgraph are located in the same memory node as the root object. We then propose a GC locality optimization that uses the root memory node as a heuristic to guide GC threads processing local objects.

References

  1. T. A. Anderson. Optimizations in a Private Nursery-based Garbage Collector. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 21--30, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Auhagen, L. Bergstrom, M. Fluet, and J. Reppy. Garbage Collection for Multicore NUMA Machines. In Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 51--57, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Awasthi, D. W. Nellans, K. Sudan, R. Balasubramonian, and A. Davis. Handling the Problems and Opportunities Posed by Multiple On-chip Memory Controllers. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pages 319--330, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. K. Barabash and E. Petrank. Tracing Garbage Collection on Highly Parallel Platforms. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 1--10, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The Dacapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '06, pages 169--190, New York, NY, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. J. Cheney. A Nonrecursive List Compacting Algorithm. Communications of the ACM, 13(11):677--678, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Y. Chicha and S. M. Watt. A Localized Tracing Scheme Applied to Garbage Collection. Programming Languages and Systems, 4279:323--339, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. Domani, G. Goldshtein, E. K. Kolodner, E. Lewis, E. Petrank, and D. Sheinwald. Thread-local Heaps for Java. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 76--87, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, pages 1--14, San Jose, CA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Eran and E. Petrank. A Study of Data Structures with a Deep Heap Shape. In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 21--28, Seattle, Washington, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium, page 21, Monterey, California, 2001. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A Study of the Scalability of Stop-the-world Garbage Collectors on Multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 229--240, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Hirzel, J. Henkel, A. Diwan, and M. Hind. Understanding the Connectivity of Heap Objects. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 36--49, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng. The Garbage Collection Advantage: Improving Program Locality. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, pages 69--80, New York, NY, USA, 2004. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. Jones, A. Hosking, and E. Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman & Hall/CRC, 1st edition, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. Jones and A. King. A Fast Analysis for Thread-local Garbage Collection with Dynamic Class Loading. In Fifth IEEE International Workshop on Source Code Analysis and Manipulation, pages 129--138, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. S. Lam, P. R. Wilson, and T. G. Moher. Object Type Directed Garbage Collection To Improve Locality. In Proceedings of the International Workshop on Memory Management, IWMM '92, pages 404--425, London, UK, UK, 1992. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Z. Majo and T. R. Gross. Memory Management in NUMA Multicore Systems: Trapped Between Cache Contention and Interconnect Overhead. In Proceedings of the international symposium on Memory management - ISMM '11, page 11, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Marlow and S. Peyton Jones. Multicore Garbage Collection with Local Heaps. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 21--32, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. E. Oancea, A. Mycroft, and S. M. Watt. A New Approach to Parallelising Tracing Algorithms. In Proceedings of the 2009 International Symposium on Memory Management, ISMM '09, pages 10--19, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. T. Ogasawara. Numa-aware Memory Manager with Dominant-thread-based Copying GC. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '09, pages 377--390, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Shuf, M. Gupta, R. Bordawekar, and J. P. Singh. Exploiting Prolific Types for Memory Management and Optimizations. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '02, pages 295--306, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Y. Shuf, M. Gupta, H. Franke, A. Appel, and J. P. Singh. Creating and Preserving Locality of Java Applications at Allocation and Garbage Collection Times. In Proceedings of the 17th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '02, pages 13--25, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. SPECjbb2005. Standard Performance Evaluation Corporation. Available at: http://www.spec.org/jbb2005.Google ScholarGoogle Scholar
  25. B. Steensgaard. Thread-specific Heaps for Multi-threaded Programs. In Proceedings of the 2Nd International Symposium on Memory Management, ISMM '00, pages 18--24, New York, NY, USA, 2000. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. M. Tikir and J. K. Hollingsworth. Numa-aware Java Heaps for Server Applications. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05, pages 108.2--, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. R. Wilson, M. S. Lam, and T. G. Moher. Effective "Static-graph" Reorganization to Improve Locality in Garbage-collected Systems. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, PLDI '91, pages 177--191, New York, NY, USA, 1991. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Zhou and B. Demsky. Memory Management for Many-core Processors with Software Configurable Locality Policies. In Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pages 3--14, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A study of connected object locality in NUMA heaps

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              MSPC '14: Proceedings of the workshop on Memory Systems Performance and Correctness
              June 2014
              61 pages
              ISBN:9781450329170
              DOI:10.1145/2618128

              Copyright © 2014 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 13 June 2014

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              MSPC '14 Paper Acceptance Rate6of20submissions,30%Overall Acceptance Rate6of20submissions,30%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader