research-article

Supporting superpage allocation without additional hardware support

Authors:
Mel Gorman, IBM/University of Limerick, Limerick, Ireland
Patrick Healy, University of Limerick, Limerick, Ireland

ISMM '08: Proceedings of the 7th international symposium on Memory management, June 2008, Pages 41–50. https://doi.org/10.1145/1375634.1375641

Published: 07 June 2008

ABSTRACT

Many modern processors support more than one page size. The larger pages, called superpages, were identified in the early 1990s as one means of reducing the time spent servicing translation lookaside buffer (TLB) misses by increasing TLB reach. Widespread use of superpages has been limited by the requirement that a superpage consist of physically contiguous, naturally aligned small pages. This makes external fragmentation a serious problem for an operating system, one that is almost non-existent when processes use only one page size. Hardware solutions that mitigate this limitation, such as sub-blocking, shadow page tables and a variety of hybrid schemes, have not seen widespread adoption. This has curtailed automatic superpage support, because superpage availability is known to decrease during the system's lifetime as external fragmentation grows.
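
The two constraints in this paragraph, TLB reach and the need for contiguous, naturally aligned frames, can be illustrated with a small sketch. The 4 KiB base page, 2 MiB superpage and 64-entry TLB are assumed values chosen for illustration, and the function names are hypothetical; none of them are taken from the paper.

```python
# Illustrative sketch only: sizes and names are assumptions, not from the paper.
BASE_PAGE = 4 * 1024                 # assumed base page size (4 KiB)
SUPERPAGE = 2 * 1024 * 1024          # assumed superpage size (2 MiB)
FRAMES_PER_SUPERPAGE = SUPERPAGE // BASE_PAGE  # 512 base frames per superpage


def tlb_reach(entries, page_size):
    """Bytes of memory a TLB can map at once: entries times page size."""
    return entries * page_size


def superpage_candidates(free_frames, total_frames):
    """Return the base frame numbers where a superpage could be placed.

    A candidate must start on a multiple of FRAMES_PER_SUPERPAGE
    (natural alignment) and every frame in the run must be free
    (physical contiguity).
    """
    free = set(free_frames)
    bases = []
    last_base = total_frames - FRAMES_PER_SUPERPAGE
    for base in range(0, last_base + 1, FRAMES_PER_SUPERPAGE):
        if all(f in free for f in range(base, base + FRAMES_PER_SUPERPAGE)):
            bases.append(base)
    return bases


if __name__ == "__main__":
    # Same number of TLB entries, very different reach.
    print(tlb_reach(64, BASE_PAGE))   # reach with base pages, in bytes
    print(tlb_reach(64, SUPERPAGE))   # reach with superpages, in bytes

    # A single allocated frame is enough to rule out a whole aligned run.
    free = set(range(1024)) - {700}
    print(superpage_candidates(free, 1024))
```

Under these assumptions, a run of 512 free frames starting at frame 256 yields no candidate, because it straddles two aligned regions. This is the sense in which external fragmentation, rather than a shortage of free memory, limits superpage availability.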

This paper presents a placement policy for an operating system's physical page allocator that mitigates external fragmentation by grouping pages based on the system's ability to relocate their data. It then describes the changes needed to make the page reclamation algorithm contiguity-aware while minimising the impact on the algorithm's normal decisions. The performance impact on different machine types is illustrated, and the superpage allocation success rate is shown to improve. These mechanisms are complementary to any of the hardware solutions proposed in the past.
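
The idea of grouping pages by how easily their data can be relocated can be sketched as a toy allocator. This is a minimal illustration under stated assumptions, not the paper's implementation: the three mobility classes, the `GroupedAllocator` class and its fallback order are all hypothetical, and the sketch tracks only per-block usage counts.

```python
# Toy sketch of mobility-based placement. All names and the fallback
# order are assumptions made for illustration, not the paper's design.
from enum import Enum


class Mobility(Enum):
    MOVABLE = "movable"          # data the system could relocate
    RECLAIMABLE = "reclaimable"  # data the system could discard and rebuild
    UNMOVABLE = "unmovable"      # data that must stay where it is


class GroupedAllocator:
    """Memory is split into superpage-sized blocks, each tagged with a type."""

    def __init__(self, num_blocks, frames_per_block):
        self.frames_per_block = frames_per_block
        self.block_type = [None] * num_blocks  # None means untouched block
        self.used = [0] * num_blocks           # frames in use per block

    def alloc(self, mobility):
        """Allocate one base page and return the index of the block used."""
        # 1. Prefer a block already holding pages of the same mobility.
        for b, t in enumerate(self.block_type):
            if t is mobility and self.used[b] < self.frames_per_block:
                self.used[b] += 1
                return b
        # 2. Otherwise claim an untouched block for this mobility type.
        for b, t in enumerate(self.block_type):
            if t is None:
                self.block_type[b] = mobility
                self.used[b] = 1
                return b
        # 3. Last resort: mix types in any block with space left.
        for b in range(len(self.used)):
            if self.used[b] < self.frames_per_block:
                self.used[b] += 1
                return b
        raise MemoryError("no free frames")


if __name__ == "__main__":
    a = GroupedAllocator(num_blocks=3, frames_per_block=2)
    print(a.alloc(Mobility.UNMOVABLE), a.alloc(Mobility.MOVABLE))
```

The point of the grouping is that unmovable pages stay concentrated in a few blocks, so blocks holding only movable or reclaimable pages remain candidates for being emptied and reused as superpages. Step 3 is where fragmentation would creep back in once memory is tight.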

Index Terms

Supporting superpage allocation without additional hardware support
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Virtual memory

Published in
ISMM '08: Proceedings of the 7th international symposium on Memory management
June 2008
170 pages
ISBN:9781605581347
DOI:10.1145/1375634
General Chair: Richard Jones, University of Kent
Program Chair: Steve Blackburn, Australian National University
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
fragmentation
replacement policy
superpage
tlb
Acceptance Rates
Overall Acceptance Rate: 72 of 156 submissions, 46%