ABSTRACT
Many modern processors support more than one page size. In the early 1990s, the larger pages, called superpages, were identified as one means of reducing the time spent servicing translation lookaside buffer (TLB) misses by increasing TLB reach. Widespread use of superpages has been limited by the requirement that a superpage consist of physically contiguous and naturally aligned small pages. This makes external fragmentation a serious problem for an operating system, one that is almost non-existent when processes use only one page size. Hardware solutions to mitigate this limitation, such as sub-blocking, shadow page tables and a variety of hybrid solutions, have not seen widespread adoption. This has curtailed automatic superpage support, as it is known that superpage availability will decrease during the system's lifetime as external fragmentation grows.
This paper presents a placement policy for an operating system's physical page allocator that mitigates external fragmentation by grouping pages based on the system's ability to relocate the data. It then describes the changes needed to make the page reclamation algorithm contiguity-aware while minimising the impact on the algorithm's normal decisions. The performance impact on different machine types is illustrated, and it is shown that the superpage allocation success rate is improved. These mechanisms are complementary to any of the hardware solutions proposed in the past.
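The idea of grouping pages by relocatability can be illustrated with a toy model. The sketch below is not the paper's allocator: the two mobility classes, the class `GroupedAllocator`, the function `naive_reclaimable_blocks`, and the block sizes are all assumptions made for illustration. It treats memory as superpage-sized blocks and counts how many blocks contain no unmovable page, on the assumption that only such blocks could later be emptied to form a superpage.

```python
from enum import Enum


class Mobility(Enum):
    # Hypothetical two-way classification, assumed for this sketch.
    MOVABLE = "movable"      # data the OS could migrate or reclaim
    UNMOVABLE = "unmovable"  # data the OS cannot relocate


class GroupedAllocator:
    """Toy allocator that keeps each superpage-sized block to one mobility type."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        # block_type[i] stays None while block i is entirely unused.
        self.block_type = [None] * num_blocks
        self.used = [0] * num_blocks  # small pages handed out per block

    def alloc(self, mobility):
        # Prefer a partially used block that already holds this mobility type.
        for i, kind in enumerate(self.block_type):
            if kind is mobility and self.used[i] < self.pages_per_block:
                self.used[i] += 1
                return i
        # Otherwise claim an entirely unused block for this type.
        for i, kind in enumerate(self.block_type):
            if kind is None:
                self.block_type[i] = mobility
                self.used[i] = 1
                return i
        # Falling back to a block of another type is not modelled here.
        raise MemoryError("no suitable block")

    def reclaimable_blocks(self):
        # Blocks that are unused or hold only movable pages.
        return sum(1 for k in self.block_type if k is not Mobility.UNMOVABLE)


def naive_reclaimable_blocks(requests, num_blocks, pages_per_block):
    """Baseline: hand out pages in address order, ignoring mobility."""
    if len(requests) > num_blocks * pages_per_block:
        raise MemoryError("out of pages")
    pinned = set()  # blocks containing at least one unmovable page
    for page_index, mobility in enumerate(requests):
        if mobility is Mobility.UNMOVABLE:
            pinned.add(page_index // pages_per_block)
    return num_blocks - len(pinned)


# Every fourth request is unmovable; 8 blocks of 4 small pages each.
requests = [
    Mobility.UNMOVABLE if i % 4 == 3 else Mobility.MOVABLE for i in range(16)
]
grouped = GroupedAllocator(num_blocks=8, pages_per_block=4)
for request in requests:
    grouped.alloc(request)

print(grouped.reclaimable_blocks())              # 7
print(naive_reclaimable_blocks(requests, 8, 4))  # 4
```

In this toy workload the grouped placement confines the four unmovable pages to a single block, whereas address-order placement scatters them across four blocks. A real allocator would also need fallback rules for when no block of the requested type is available, which this sketch deliberately leaves out.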
Index Terms
- Supporting superpage allocation without additional hardware support