ABSTRACT
Reference locality is vital to the performance of parallel Garbage Collection (GC) running on Non-Uniform Memory Access (NUMA) machines. A GC thread may trace remotely placed objects that descend from the root set or, for load balance, a GC thread may steal non-local objects from other threads' work lists. Processing distant live objects could introduce contention in the interconnect links between memory nodes and it could increase memory access latency. Researchers have proposed various techniques to improve GC tracing locality. However, few studies attempt to optimize the locality of connected objects in NUMA object graph. In this paper, we study the locality of a rooted subgraph, a unit of object connectivity in the object graph. A rooted subgraph is a set of references containing one root reference, and every reference is reachable from the root. We empirically study the locality of rooted subgraphs of DaCapo and SPECjbb2005 benchmark suites. The results show that more than 80% of objects in a rooted subgraph are located in the same memory node as the root object. We then propose a GC locality optimization that uses the root memory node as a heuristic to guide GC threads processing local objects.
- T. A. Anderson. Optimizations in a Private Nursery-based Garbage Collector. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 21--30, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
- S. Auhagen, L. Bergstrom, M. Fluet, and J. Reppy. Garbage Collection for Multicore NUMA Machines. In Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 51--57, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- M. Awasthi, D. W. Nellans, K. Sudan, R. Balasubramonian, and A. Davis. Handling the Problems and Opportunities Posed by Multiple On-chip Memory Controllers. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pages 319--330, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
- K. Barabash and E. Petrank. Tracing Garbage Collection on Highly Parallel Platforms. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 1--10, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
- S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The Dacapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '06, pages 169--190, New York, NY, USA, 2006. ACM. Google ScholarDigital Library
- C. J. Cheney. A Nonrecursive List Compacting Algorithm. Communications of the ACM, 13(11):677--678, 1970. Google ScholarDigital Library
- Y. Chicha and S. M. Watt. A Localized Tracing Scheme Applied to Garbage Collection. Programming Languages and Systems, 4279:323--339, 2006. Google ScholarDigital Library
- T. Domani, G. Goldshtein, E. K. Kolodner, E. Lewis, E. Petrank, and D. Sheinwald. Thread-local Heaps for Java. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 76--87, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
- T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, pages 1--14, San Jose, CA, 1997. Google ScholarDigital Library
- H. Eran and E. Petrank. A Study of Data Structures with a Deep Heap Shape. In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 21--28, Seattle, Washington, 2013. ACM. Google ScholarDigital Library
- C. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium, page 21, Monterey, California, 2001. USENIX Association. Google ScholarDigital Library
- L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A Study of the Scalability of Stop-the-world Garbage Collectors on Multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 229--240, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
- M. Hirzel, J. Henkel, A. Diwan, and M. Hind. Understanding the Connectivity of Heap Objects. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 36--49, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
- X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng. The Garbage Collection Advantage: Improving Program Locality. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, pages 69--80, New York, NY, USA, 2004. ACM. Google ScholarDigital Library
- R. Jones, A. Hosking, and E. Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman & Hall/CRC, 1st edition, 2011. Google ScholarDigital Library
- R. Jones and A. King. A Fast Analysis for Thread-local Garbage Collection with Dynamic Class Loading. In Fifth IEEE International Workshop on Source Code Analysis and Manipulation, pages 129--138, 2005. Google ScholarDigital Library
- M. S. Lam, P. R. Wilson, and T. G. Moher. Object Type Directed Garbage Collection To Improve Locality. In Proceedings of the International Workshop on Memory Management, IWMM '92, pages 404--425, London, UK, UK, 1992. Springer-Verlag. Google ScholarDigital Library
- Z. Majo and T. R. Gross. Memory Management in NUMA Multicore Systems: Trapped Between Cache Contention and Interconnect Overhead. In Proceedings of the international symposium on Memory management - ISMM '11, page 11, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- S. Marlow and S. Peyton Jones. Multicore Garbage Collection with Local Heaps. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 21--32, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- C. E. Oancea, A. Mycroft, and S. M. Watt. A New Approach to Parallelising Tracing Algorithms. In Proceedings of the 2009 International Symposium on Memory Management, ISMM '09, pages 10--19, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
- T. Ogasawara. Numa-aware Memory Manager with Dominant-thread-based Copying GC. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '09, pages 377--390, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
- Y. Shuf, M. Gupta, R. Bordawekar, and J. P. Singh. Exploiting Prolific Types for Memory Management and Optimizations. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '02, pages 295--306, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
- Y. Shuf, M. Gupta, H. Franke, A. Appel, and J. P. Singh. Creating and Preserving Locality of Java Applications at Allocation and Garbage Collection Times. In Proceedings of the 17th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '02, pages 13--25, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
- SPECjbb2005. Standard Performance Evaluation Corporation. Available at: http://www.spec.org/jbb2005.Google Scholar
- B. Steensgaard. Thread-specific Heaps for Multi-threaded Programs. In Proceedings of the 2Nd International Symposium on Memory Management, ISMM '00, pages 18--24, New York, NY, USA, 2000. ACM. Google ScholarDigital Library
- M. M. Tikir and J. K. Hollingsworth. Numa-aware Java Heaps for Server Applications. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05, pages 108.2--, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarDigital Library
- P. R. Wilson, M. S. Lam, and T. G. Moher. Effective "Static-graph" Reorganization to Improve Locality in Garbage-collected Systems. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, PLDI '91, pages 177--191, New York, NY, USA, 1991. ACM. Google ScholarDigital Library
- J. Zhou and B. Demsky. Memory Management for Many-core Processors with Software Configurable Locality Policies. In Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pages 3--14, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
Index Terms
- A study of connected object locality in NUMA heaps
Recommendations
Creating and preserving locality of java applications at allocation and garbage collection times
OOPSLA '02: Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applicationsThe growing gap between processor and memory speeds is motivating the need for optimization strategies that improve data locality. A major challenge is to devise techniques suitable for pointer-intensive applications. This paper presents two techniques ...
Thread-local heaps for Java
ISMM '02: Proceedings of the 3rd international symposium on Memory managementWe present a memory management scheme for Java based on thread-local heaps. Assuming most objects are created and used by a single thread, it is desirable to free the memory manager from redundant synchronization for thread-local objects. Therefore, in ...
Creating and preserving locality of java applications at allocation and garbage collection times
The growing gap between processor and memory speeds is motivating the need for optimization strategies that improve data locality. A major challenge is to devise techniques suitable for pointer-intensive applications. This paper presents two techniques ...
Comments